planarnetwork / dtd2mysql

MySQL / MariaDB import for DTD feeds (fares, timetable and routeing)
30 stars 10 forks source link

fix calendar generating start_date > end_date #78

Open miklcct opened 11 months ago

miklcct commented 11 months ago

This prevents any calendar entries with start_date > end_date from leaking out by preventing them from being returned at the first instance.

fixes #76

linusnorton commented 11 months ago

Thank you for taking the time to write a patch. I don't have the latest data to hand - do you know why the end date before the start date? Given it's on the associated schedule (W27693_W28273) and not the cloned journeys I think the suspect logic would be:

calendar.clone(
  moment.max(this.calendar.runsFrom, calendar.runsFrom),
  moment.min(this.calendar.runsTo, calendar.runsTo)
),

I can't remember why we would use the calendar from the assoc train service (calendar) rather than the association itself. Logically I would say that the new service will only run for the duration of the association.

Does changing that code in mergeSchedules to this.calendar fix the issue?

miklcct commented 11 months ago

The original clone function accidentally mutates the parameters passed in (hence I have added two lines of .clone()) causing an invalid schedule, which I spent hours of debugging as the bugged schedule suddenly appeared and it didn't exist at the generation source.

After I added the two lines of .clone(), the original bug disappeared but a few pairs of [date+1, date] invalid ranges appeared.

The following is the raw data:

AANW27693W282732311132312011111100VVSCRDFCEN  TP                               P
AANW27693W282732311212311240111100   CRDFCEN  T                                C
AANW27693W282732311272312011111100   CRDFCEN  T                                C
BSNW276932311132312011111100 POO2M04    125442000 DMUS   075      S            P
BSNW282732311132312011111100 POO2V02    125447000 DMUS   075      S            P
BSNW282732311142311240111100 POO2V02    125447000 DMUS   075      S            O
BSNW276932311212311240111100 POO2M04    125442000 DMUS   075      S            O
BSNW276932311272312011111100 POO2M04    125442000 DMUS   075      S            O
BSNW282732311272312011111100 POO2V02    125447000 DMUS   075      S            O
BSNW276932312042312081111100 POO2M04    125442000 DMUS   075      S            P
BSNW282732312042312081111100 POO2V02    125447000 DMUS   075      S            P

and the CSV exported from the database

"id","train_uid","runs_from","runs_to","monday","tuesday","wednesday","thursday","friday","saturday","sunday","bank_holiday_running","train_status","train_category","train_identity","headcode","course_indicator","profit_center","business_sector","power_type","timing_load","speed","operating_chars","train_class","sleepers","reservations","connect_indicator","catering_code","service_branding","stp_indicator"
"60145","W27693","2023-11-13","2023-12-01","1","1","1","1","1","0","0","1","P","OO","2M04",NULL,"1","25442000",NULL,"DMU","S","075",NULL,"S",NULL,NULL,NULL,NULL,NULL,"P"
"63140","W27693","2023-11-21","2023-11-24","0","1","1","1","1","0","0","1","P","OO","2M04",NULL,"1","25442000",NULL,"DMU","S","075",NULL,"S",NULL,NULL,NULL,NULL,NULL,"O"
"84553","W27693","2023-11-27","2023-12-01","1","1","1","1","1","0","0","1","P","OO","2M04",NULL,"1","25442000",NULL,"DMU","S","075",NULL,"S",NULL,NULL,NULL,NULL,NULL,"O"
"101522","W27693","2023-12-04","2023-12-08","1","1","1","1","1","0","0","1","P","OO","2M04",NULL,"1","25442000",NULL,"DMU","S","075",NULL,"S",NULL,NULL,NULL,NULL,NULL,"P"
"60146","W28273","2023-11-13","2023-12-01","1","1","1","1","1","0","0","1","P","OO","2V02",NULL,"1","25447000",NULL,"DMU","S","075",NULL,"S",NULL,NULL,NULL,NULL,NULL,"P"
"60162","W28273","2023-11-14","2023-11-24","0","1","1","1","1","0","0","1","P","OO","2V02",NULL,"1","25447000",NULL,"DMU","S","075",NULL,"S",NULL,NULL,NULL,NULL,NULL,"O"
"84582","W28273","2023-11-27","2023-12-01","1","1","1","1","1","0","0","1","P","OO","2V02",NULL,"1","25447000",NULL,"DMU","S","075",NULL,"S",NULL,NULL,NULL,NULL,NULL,"O"
"101523","W28273","2023-12-04","2023-12-08","1","1","1","1","1","0","0","1","P","OO","2V02",NULL,"1","25447000",NULL,"DMU","S","075",NULL,"S",NULL,NULL,NULL,NULL,NULL,"P"
"id","base_uid","assoc_uid","start_date","end_date","monday","tuesday","wednesday","thursday","friday","saturday","sunday","assoc_cat","assoc_date_ind","assoc_location","base_location_suffix","assoc_location_suffix","association_type","stp_indicator"
"428","W27693","W28273","2023-11-13","2023-12-01","1","1","1","1","1","0","0","VV","S","CRDFCEN",,,"P","P"
"441","W27693","W28273","2023-11-21","2023-11-24","0","1","1","1","1","0","0",NULL,NULL,"CRDFCEN",,,NULL,"C"
"694","W27693","W28273","2023-11-27","2023-12-01","1","1","1","1","1","0","0",NULL,NULL,"CRDFCEN",,,NULL,"C"
linusnorton commented 11 months ago

Thanks, I'm just wondering if it's possible to prevent the null returns in order to simplify the code

linusnorton commented 11 months ago

Matching up the associations and services with the overlays makes it look like the start date for W28273 is wrong:

AAN W27693 W28273 231113 231201 1111100VVSCRDFCEN  TP                               P
BSN W27693        231113 231201 1111100 POO2M04    125442000 DMUS   075      S            P
BSN W28273        231113 231201 1111100 POO2V02    125447000 DMUS   075      S            P

AAN W27693 W28273 231127 231201 1111100   CRDFCEN  T                                C
BSN W27693        231127 231201 1111100 POO2M04    125442000 DMUS   075      S            O
BSN W28273        231127 231201 1111100 POO2V02    125447000 DMUS   075      S            O

AAN W27693 W28273 231121 231124 0111100   CRDFCEN  T                                C
BSN W27693        231121 231124 0111100 POO2M04    125442000 DMUS   075      S            O
BSN W28273        **231114** 231124 0111100 POO2V02    125447000 DMUS   075      S            O

At that point the min/max should kick in and the max of the start date should use the AA record start date and the start and end should come out as 231124. That would suggest the merged record is fine but then we will create another service for before the association:

    if (assoc.calendar.runsFrom.isBefore(assocCalendar.runsFrom)) {
      const before = assoc.calendar.clone(assoc.calendar.runsFrom, assocCalendar.runsFrom.clone().subtract(1, "days"));

      schedules.push(assoc.clone(before, idGenerator.next().value));
    }

I regret the naming of assoc and assocCalendar. assoc.calendar is the the assoc service and assocCalendar is assoc record itself.

The new before service should run from 231114 to 231121 - 1 day so I'm not quite sure how it still goes wrong.

Sorry for thinking out loud. I haven't touched this code in years and the "correct" behaviour is not always obvious.

linusnorton commented 11 months ago

@miklcct I think they have corrected the data this morning so that the runs_from field for:

BSN W28273        **231114** 231124 0111100 POO2V02    125447000 DMUS   075      S            O

Now matches the association record.

I'm inclined to merge the fix for the mutation bug you found but not the changes that allow null schedules.

miklcct commented 11 months ago

If I only fix the mutation bug it generates a GTFS with a few calendar ranges between Christmas and New Year with start_date being 1 day after the end_date, so I believe returning null from clone() is necessary, or disable the calendar tightening altogether.

linusnorton commented 11 months ago

I did a quick check with today's data and I couldn't find any cases where the start date was after the end date. What records are you seeing?

miklcct commented 11 months ago

I have just run the generator on the latest master branch with version 930 data, with only the two lines of .clone() added to fix the mutation issue.

Errors are generated as follows: filename (?)The name of the faulty file. csvRowNumber (?)The row number of the faulty record. entityId (?)The faulty service id. startFieldName (?)The start value's field name. startValue (?)The start value. endFieldName (?)The end value's field name. endValue (?)The end value. "calendar.txt" 156 "155" "start_date" "20231203" "end_date" "20231126" "calendar.txt" 493 "492" "start_date" "20231208" "end_date" "20231207" "calendar.txt" 539 "538" "start_date" "20231127" "end_date" "20231124" "calendar.txt" 596 "595" "start_date" "20231204" "end_date" "20231201" "calendar.txt" 886 "885" "start_date" "20231130" "end_date" "20231129" "calendar.txt" 887 "886" "start_date" "20231130" "end_date" "20231124" "calendar.txt" 888 "887" "start_date" "20231129" "end_date" "20231128" "calendar.txt" 1037 "1036" "start_date" "20231206" "end_date" "20231129" "calendar.txt" 1038 "1037" "start_date" "20231205" "end_date" "20231128" "calendar.txt" 1090 "1089" "start_date" "20231209" "end_date" "20231202" "calendar.txt" 1131 "1130" "start_date" "20231202" "end_date" "20231125" "calendar.txt" 3230 "3229" "start_date" "20231214" "end_date" "20231213" "calendar.txt" 3520 "3519" "start_date" "20240105" "end_date" "20231229" "calendar.txt" 4531 "4530" "start_date" "20240126" "end_date" "20240119" "calendar.txt" 4769 "4768" "start_date" "20240127" "end_date" "20240120" "calendar.txt" 4801 "4800" "start_date" "20240120" "end_date" "20240113" "calendar.txt" 4802 "4801" "start_date" "20240106" "end_date" "20231230"

The full report can be viewed here.

linusnorton commented 11 months ago

Ah it looks like your on the weekly feed. I'm on the daily which is 932

linusnorton commented 11 months ago

RJTTC931.ZIP RJTTC932.ZIP

Try processing these after the 930

miklcct commented 11 months ago

Error still exist on version 932: https://gtfs-validator-results.mobilitydata.org/9f2b4ecb-81c5-4ac6-9ee4-974989ed4507/report.html

filename (?)The name of the faulty file. csvRowNumber (?)The row number of the faulty record. entityId (?)The faulty service id. startFieldName (?)The start value's field name. startValue (?)The start value. endFieldName (?)The end value's field name. endValue (?)The end value. "calendar.txt" 163 "162" "start_date" "20231203" "end_date" "20231126" "calendar.txt" 240 "239" "start_date" "20231207" "end_date" "20231206" "calendar.txt" 632 "631" "start_date" "20231127" "end_date" "20231124" "calendar.txt" 640 "639" "start_date" "20231205" "end_date" "20231204" "calendar.txt" 690 "689" "start_date" "20231204" "end_date" "20231201" "calendar.txt" 696 "695" "start_date" "20231208" "end_date" "20231201" "calendar.txt" 849 "848" "start_date" "20231208" "end_date" "20231207" "calendar.txt" 1015 "1014" "start_date" "20231204" "end_date" "20231130" "calendar.txt" 1034 "1033" "start_date" "20231130" "end_date" "20231129" "calendar.txt" 1035 "1034" "start_date" "20231130" "end_date" "20231124"

linusnorton commented 11 months ago

I get a completely different set of data gb-rail.zip

linusnorton commented 11 months ago

This is my 930: https://file.io/yK4ODhtZDv6b - is it the same as yours?

miklcct commented 11 months ago

Yes the files are the same. Have you added the two lines of .clone() in ScheduleCalendar.ts lines 113 and 114?

linusnorton commented 11 months ago

No, just running from master. I'll try again with the clone methods

linusnorton commented 11 months ago

That's so weird! Adding the clones, which I would expect to be fine then gives me the same results as you. I'll debug a bit further. Sorry this is taking so long.