peter-mount / nre-feeds

Go library and application for handling the NRE DarwinD3 feeds
Apache License 2.0

Calling points and schedule locations duplicated #22

Open peter-mount opened 4 years ago

peter-mount commented 4 years ago

This is an upstream issue in Darwin where it's possible for new entries to be put into a schedule during disruption.

In this case we have RID 201908018960827, which is the 1824 MAN to Buxton but is terminating at Hazel Grove.

At Woodsmoor we show it calling at Hazel Grove twice, because the schedule contains two calling points for Hazel Grove (screenshot: Screenshot_2019-08-01_17-49-52).

To fix that, we might be able to filter out the duplicate by looking at the forecast.date field of each entry:

      {
        "type": "DT",
        "tiploc": "HAZL",
        "displaytime": "18:46:00",
        "timetable": {
          "time": "18:46:00",
          "pta": "18:46",
          "wta": "18:46:00"
        },
        "planned": {},
        "forecast": {
          "time": "18:46:00",
          "arr": {
            "et": "18:46:00",
            "etMin": "18:46:00",
            "src": "Tyrell"
          },
          "dep": null,
          "pass": null,
          "plat": {
            "plat": "1",
            "source": "M"
          },
          "date": "2019-08-01T17:24:04.4009615+01:00"
        },
        "length": 4,
        "delay": 0,
        "loading": null
      },
      {
        "type": "DT",
        "tiploc": "HAZL",
        "displaytime": "18:47:00",
        "timetable": {
          "time": "18:47:00",
          "pta": "18:46",
          "ptd": "18:47",
          "wta": "18:46:00",
          "wtd": "18:47:00"
        },
        "planned": {
          "activity": "TF",
          "plannedActivity": "T "
        },
        "forecast": {
          "time": "18:47:00",
          "arr": {
            "et": "18:46:00",
            "wet": "18:45:00",
            "src": "Darwin"
          },
          "dep": {
            "et": "18:47:00",
            "src": "Darwin"
          },
          "pass": null,
          "plat": {
            "plat": "1",
            "source": "M"
          },
          "date": "2019-08-01T03:23:56.022911+01:00"
        },
        "delay": 0,
        "loading": null
      },

Here, when there are multiple entries for the same location, we should use the one with the most recent date.
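This is a minimal Go sketch of "use the most recent date". It assumes only the tiploc and forecast.date fields from the JSON above matter. The Location type and the newer function are hypothetical and are not part of nre-feeds. The date strings carry 7 fractional digits, and Go's time.Parse accepts these with the plain time.RFC3339 layout.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Location is a hypothetical, cut-down schedule entry holding only the
// fields from the JSON above that this comparison needs.
type Location struct {
	Tiploc   string `json:"tiploc"`
	Forecast struct {
		Date string `json:"date"`
	} `json:"forecast"`
}

// updated parses forecast.date; time.Parse accepts the optional
// fractional seconds even though time.RFC3339 does not spell them out.
func (l Location) updated() (time.Time, error) {
	return time.Parse(time.RFC3339, l.Forecast.Date)
}

// newer returns whichever of a and b has the more recent forecast.date.
func newer(a, b Location) (Location, error) {
	ta, err := a.updated()
	if err != nil {
		return Location{}, err
	}
	tb, err := b.updated()
	if err != nil {
		return Location{}, err
	}
	if tb.After(ta) {
		return b, nil
	}
	return a, nil
}

func main() {
	// The two HAZL entries from RID 201908018960827, reduced to the
	// fields used here.
	data := `[
	  {"tiploc": "HAZL", "forecast": {"date": "2019-08-01T17:24:04.4009615+01:00"}},
	  {"tiploc": "HAZL", "forecast": {"date": "2019-08-01T03:23:56.022911+01:00"}}
	]`
	var locs []Location
	if err := json.Unmarshal([]byte(data), &locs); err != nil {
		panic(err)
	}
	best, err := newer(locs[0], locs[1])
	if err != nil {
		panic(err)
	}
	fmt.Println(best.Forecast.Date) // 2019-08-01T17:24:04.4009615+01:00
}
```

On its own, this would wrongly merge the two visits of a circular service. The sketch only shows how to compare two entries.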

However, we must be careful not to break circular services, where it is valid for the same location to appear twice in the schedule.

The same applies to the output schedules. Filtering out the duplicates there would fix a long-standing issue during disruption, but again we must be careful with circular services.

peter-mount commented 4 years ago

Related to this is RID 201908078783418, which, due to a bridge bash at West Malling, is showing as cancelled throughout, except for two entries at MDE:

(screenshot: 201908078783418)

Here, it looks to me like the train should be running, as the second entry at MDE isn't cancelled and has a later update time.

As above, there are two entries because the timetable has changed. The apparently running entry has only a ptd, whilst the cancelled entry is the original with both a pta and a ptd, which makes the two entries distinct.

Right now I'm thinking this should go along the lines of:

This should not break circular routes, although it could still cause an issue if another entry gets in between the duplicates. Even so, it's better than showing a train as cancelled when it isn't.