planarnetwork / dtd2mysql

MySQL / MariaDB import for DTD feeds (fares, timetable and routeing)

Skipping stops with N ("stop not advertised") activity flag #23

Closed: mk-fg closed this issue 7 years ago

mk-fg commented 7 years ago

Mentioned this one earlier here: https://github.com/open-track/dtd2mysql/issues/14#issuecomment-325153034

gtfs-webcheck occasionally finds that these stops do not exist for the corresponding trips in serw API data.

Checking the data for all stops that have a public time and the "N" activity flag turns up schedules for ~200 train_uids. Full list for ttis653.zip: https://gist.github.com/mk-fg/5fd942d0ca0a880e06cbb239e7a7eae4
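A check like the one described above could be sketched roughly as follows. This is not the query actually used for the gist. It assumes the CIF activity field is a run of 2-character codes and that "N" is the "stop not advertised" code. The Stop record, the function names, the train_uids, locations and times are all made up for illustration.

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class Stop(NamedTuple):
    """Hypothetical, already-parsed CIF location record (not dtd2mysql's schema)."""
    train_uid: str
    location: str
    public_time: Optional[str]  # "HHMM", or None when the stop has no public time
    activity: str               # raw activity field, assumed to hold 2-character codes


def activity_codes(activity: str) -> Set[str]:
    """Split the activity field into its 2-character codes, dropping blank slots."""
    chunks = (activity[i:i + 2].strip() for i in range(0, len(activity), 2))
    return {code for code in chunks if code}


def unadvertised_public_stops(stops: Iterable[Stop]) -> List[Stop]:
    """Return stops that have a public time yet carry the "N" activity code."""
    return [s for s in stops if s.public_time and "N" in activity_codes(s.activity)]


# Made-up sample data: placeholder train_uids, locations and times.
sample = [
    Stop("X00001", "AAA", "0512", "TBN         "),  # origin, flagged "N"
    Stop("X00001", "BBB", "0528", "T           "),  # ordinary public stop
    Stop("X00002", "CCC", "0741", "TFN         "),  # terminus, flagged "N"
    Stop("X00002", "DDD", None, "N           "),    # flagged, but no public time
]
flagged = unadvertised_public_stops(sample)
print(sorted({s.train_uid for s in flagged}))
```

Splitting the field into 2-character slots, rather than searching for the letter "N" anywhere in it, avoids false matches on other codes that merely contain that letter.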

All but a few outliers at the very start/end are very similar: they start or end at HWV (HEATHROW TERM 5), which is marked with "N" for some dates. Some of these also have AML with the "N" flag for some dates (e.g. P25417). With these stops included, the trips don't match in gtfs-webcheck, while adding e.g. --test-skip-stops 1-0 to skip the stops marked this way returns a matching trip there.

The few remaining trips that can be checked in a similar way (some are too far in the future), like L02139 (whose starting BTN stop has that activity flag), also match in a similar fashion.

The discussion thread linked in the comment above also suggests that these stops are not actually public, from the initial comment (which gets confirmed) to Peter Hicks' suggestion of what these might mean:

Another examples of unadvertised stops are on the 0415 Northampton - London Euston at Queens Park (Main Line). My guess is the unadvertised call there is for staff purposes, maybe a 'grandfather rights' thing - but it's unadvertised because the train might not call there if it has to run on the fast lines.

https://groups.google.com/forum/#!topic/openraildata-talk/eBTiB1BxrRw

All of which suggests to me that these should not be in GTFS data as public stops.

When implementing processing for these flags, you seem to have omitted this one, despite it being mentioned in the issue there, so I wanted to bring it up again separately in case it wasn't skipped intentionally.

The few strange outlier CIF schedules that have the flag are kinda like this one: https://gist.github.com/mk-fg/371a73b484cedf9efda5fa365f2a3886#file-gistfile1-txt-L28-L32 Given how weird it looks compared to the others for the same train_uid, it probably shouldn't make it into GTFS, and is maybe some kind of staff-only run, as Peter suggested.

mk-fg commented 7 years ago

Also, when checking these stops on dates where they have the "N" activity flag, the southeasternrailway API returns "bulletins", like this:

[cmCx] API [serw] data mismatch for gtfs trip: GWCTestFailNoJourney
[cmCx] Trip: <Trip L02139 [- -] [BTN - LWS - PLG - HMD - EBN]>
[cmCx] Date/time: 2017-10-17 01:21:00
[cmCx] Diff details:
[cmCx]   Found non-matching journeys:
[cmCx]     <Jn fxtccuUY [05:12 05:44] [L01647(05:12+5)]>
[cmCx]   Associated bulletins:
[cmCx]     [ { 'category': 'Disruption',
[cmCx]         'description': 'will be affected by engineering work',
[cmCx]         'id': 768688,
[cmCx]         'journeys': ['fxtccuUY'],
[cmCx]         'severity': 'P2',
[cmCx]         'title': 'It is not yet known how Southern',
[cmCx]         'url': ''}]

Or this:

[ilvh] API [serw] data mismatch for gtfs trip: GWCTestFailNoJourney
[ilvh] Trip: <Trip P25417 [- -] [AML - EAL - WEA - HAN - STL - HAY - HXX - HWV]>
[ilvh] Date/time: 2017-11-11 06:35:00
[ilvh] Diff details:
[ilvh]   Found non-matching journeys:
[ilvh]     <Jn f61TW-bY [06:51 07:41] [C23525(06:51+6) - P25421(07:13+8)]>
[ilvh]     <Jn f61LwS-Y [07:21 08:11] [C23527(07:21+6) - P25428(07:43+8)]>
[ilvh]     <Jn f61Lw9UY [07:51 08:41] [C23529(07:51+6) - P25432(08:14+8)]>
[ilvh]   Associated bulletins:
[ilvh]     [ { 'category': 'Disruption',
[ilvh]         'description': 'will be affected by engineering work',
[ilvh]         'id': 770785,
[ilvh]         'journeys': ['f61TW-bY', 'f61LwS-Y', 'f61Lw9UY'],
[ilvh]         'severity': 'P2',
[ilvh]         'title': 'It is not yet known how Heathrow Connect',
[ilvh]         'url': ''},
[ilvh]       { 'category': 'Disruption',
[ilvh]         'description': 'Please note only Heathrow Express and Heathrow Connect tickets can be '
[ilvh]                        'purchased at this station',
[ilvh]         'id': 742878,
[ilvh]         'journeys': ['f61TW-bY', 'f61LwS-Y', 'f61Lw9UY'],
[ilvh]         'severity': 'P2',
[ilvh]         'title': 'Purchasing tickets at Heathrow Airport',
[ilvh]         'url': ''},
[ilvh]       { 'category': 'Disruption',
[ilvh]         'description': 'will be affected by engineering work',
[ilvh]         'id': 772363,
[ilvh]         'journeys': ['f61TW-bY', 'f61LwS-Y', 'f61Lw9UY'],
[ilvh]         'severity': 'P2',
[ilvh]         'title': 'It is not yet known how Heathrow Connect',
[ilvh]         'url': ''}]

This suggests that the flag is some kind of "this stop is out of order" marker. Such bulletins are very rare otherwise: I'm pretty sure I've only seen them a few times (2-5) out of hundreds of mismatches.

linusnorton commented 7 years ago

Weird, "It is not yet known how xxx", "will be affected by engineering work".

Sounds like I've misunderstood not advertised and they should be included but with some kind of warning.

linusnorton commented 7 years ago

Maybe using the GTFS "coordinated activity" would be a good idea, to at least suggest that people need to check something.
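For reference, GTFS has no field literally named "coordinated activity". The closest match is presumably the value 3 ("must coordinate with driver") of pickup_type and drop_off_type in stop_times.txt. A minimal sketch of the two options discussed in this thread, skipping the stop or keeping it with that value, could look like this. The Policy enum and stop_time_row function are hypothetical names for illustration, not dtd2mysql code.

```python
from enum import Enum
from typing import Dict, Optional, Union


class Policy(Enum):
    SKIP = "skip"              # leave the stop out of stop_times.txt entirely
    COORDINATE = "coordinate"  # keep it, flagged as needing coordination


# Values defined by the GTFS reference for pickup_type / drop_off_type.
REGULAR = 0
COORDINATE_WITH_DRIVER = 3


def stop_time_row(
    trip_id: str,
    stop_id: str,
    departure: str,
    not_advertised: bool,
    policy: Policy,
) -> Optional[Dict[str, Union[str, int]]]:
    """Build one stop_times.txt row as a dict, or return None to omit the stop."""
    if not_advertised and policy is Policy.SKIP:
        return None
    kind = COORDINATE_WITH_DRIVER if not_advertised else REGULAR
    return {
        "trip_id": trip_id,
        "stop_id": stop_id,
        "departure_time": departure,
        "pickup_type": kind,
        "drop_off_type": kind,
    }
```

With Policy.SKIP an "N"-flagged stop never reaches the feed. With Policy.COORDINATE it stays visible to journey planners, but consumers that honour pickup_type and drop_off_type can warn the passenger.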

mk-fg commented 7 years ago

Sounds like I've misunderstood not advertised and they should be included but with some kind of warning.

Going by the info above, I'd think these should be "not advertised" as in "should not be advertised as actual stops on any site", but it's your call, ofc.