mfdz / gtfs-hub

Collecting, shape-enhancing, validating, fixing and (partially) merging GTFS feeds
GNU Affero General Public License v3.0

Stops close to origin for VGC feed #11

Closed hbruch closed 3 years ago

hbruch commented 3 years ago

The VGC feed contains 5 stops with coordinates (0,0) (see e.g. https://gtfs.mfdz.de/gtfsvtor_VGC.html):

stops.txt
1837:de:08235:10441:0:1,"Nagold, Stadtbahnhof Mitte",0.000000,0.000000
2018:de:08235:4399:0:3,"Nagold, Rötenhöhehof",0.000000,0.000000
2377:gen:8235:10336:3,"Rotfelden, Kindergarten",0.000000,0.000000
2379:gen:8235:3040:0:3,Heumaden (Calw); Waldenser Str,0.000000,0.000000
2380:gen:8235:3045:0:3,Heumaden (Calw) ;Hagebuttenw.,0.000000,0.000000

I found them neither in DELFI nor in the NVBW Haltestellen.csv. Are these actual stops with stop_times? If so, they should be patched with roughly matching coordinates via a rule file (assuming we don't switch to a DELFI-based GTFS feed soon).
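A minimal sketch of how such stops could be found automatically, assuming a stops.txt with the standard GTFS columns `stop_id`, `stop_name`, `stop_lat`, `stop_lon`. The function name `stops_near_origin`, the tolerance value, and the `example:ok` row are hypothetical and only serve as illustration; this is not how GTFSVTOR performs the check.

```python
import csv
import io


def stops_near_origin(stops_csv, tolerance=0.001):
    """Return (stop_id, stop_name) for stops within `tolerance` degrees of (0, 0).

    Rows with missing or non-numeric coordinates are reported as well.
    """
    suspicious = []
    for row in csv.DictReader(io.StringIO(stops_csv)):
        try:
            lat = float(row["stop_lat"])
            lon = float(row["stop_lon"])
        except (KeyError, TypeError, ValueError):
            # Coordinates are absent or cannot be parsed as numbers.
            suspicious.append((row.get("stop_id"), row.get("stop_name")))
            continue
        if abs(lat) <= tolerance and abs(lon) <= tolerance:
            suspicious.append((row["stop_id"], row["stop_name"]))
    return suspicious


# Two rows from the report above plus one hypothetical, well-located stop.
SAMPLE = '''stop_id,stop_name,stop_lat,stop_lon
de:08235:10441:0:1,"Nagold, Stadtbahnhof Mitte",0.000000,0.000000
gen:8235:3040:0:3,Heumaden (Calw); Waldenser Str,0.000000,0.000000
example:ok,Example stop with coordinates,48.550000,8.720000
'''

for stop_id, stop_name in stops_near_origin(SAMPLE):
    print(stop_id, stop_name)
```

Only the two stops at (0,0) are reported; the hypothetical third row is skipped.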

derhuerst commented 3 years ago
stops.txt
1837:de:08235:10441:0:1,"Nagold, Stadtbahnhof Mitte",0.000000,0.000000
2018:de:08235:4399:0:3,"Nagold, Rötenhöhehof",0.000000,0.000000
2377:gen:8235:10336:3,"Rotfelden, Kindergarten",0.000000,0.000000
2379:gen:8235:3040:0:3,Heumaden (Calw); Waldenser Str,0.000000,0.000000
2380:gen:8235:3045:0:3,Heumaden (Calw) ;Hagebuttenw.,0.000000,0.000000

AFAICT these stop_ids are not included in the current VGC feed. Could it be that the leading number is a prefix added by GTFSVTOR?

I get these results:

de:08235:10441:0:1,"Nagold, Stadtbahnhof Mitte",0.000000,0.000000
de:08235:4399:0:3,"Nagold, Rötenhöhehof",0.000000,0.000000
gen:8235:10336:3,"Rotfelden, Kindergarten",0.000000,0.000000
gen:8235:3040:0:3,Heumaden (Calw); Waldenser Str,0.000000,0.000000
gen:8235:3045:0:3,Heumaden (Calw) ;Hagebuttenw.,0.000000,0.000000

For these stops, there are ~15k arrivals/departures in the feed:

select
    stop_id,
    count(t_arrival) as nr_of_arrivals
from arrivals_departures
where stop_id = ANY(ARRAY['de:08235:10441:0:1', 'de:08235:4399:0:3', 'gen:8235:10336:3', 'gen:8235:3040:0:3', 'gen:8235:3045:0:3'])
group by stop_id;

"stop_id","nr_of_arrivals"
"gen:8235:3040:0:3",4857
"de:08235:10441:0:1",4826
"gen:8235:3045:0:3",4857
"de:08235:4399:0:3",228
"gen:8235:10336:3",456

derhuerst commented 3 years ago

There are no other stops with similar names, so we can't simply filter these faulty stops out; we have to correct their locations instead.

hbruch commented 3 years ago

Yes, the numbers before the stop_id were line numbers.

The correct coordinates (the IFOPT IDs in the VGC feed are apparently wrong/outdated, too) are:

Nagold Stadtmitte, Bf;Gleis 1;de:08235:10446:90:1;8.728962732267801;48.550905897340634
Nagold, Rötenhöhehof;;de:08235:10139:0:3;8.72904310339791;48.56888243389193
Rotfelden, Kirche;;de:08235:10225;de:08235:10225:0:4;8.698210852009076;48.60646709648375

Wild guesses:

Heumaden (Calw) Waldenser Str;gen:8235:3040:0:3;8.76361;48.71554
Heumaden (Calw) Hagenbuttenweg;8.75913;48.71714
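As a rough sketch of what patching could look like, the snippet below overwrites `stop_lat`/`stop_lon` for selected stop_ids in a stops.txt string. It is not the rule-file mechanism itself, and the function name `patch_stops` is hypothetical. Two things are assumptions: the listed values are read as longitude;latitude (so they are swapped into lat, lon here), and the two Nagold entries are paired with the feed's stop_ids by name. Stops for which no stop_id was given with the coordinates are left untouched.

```python
import csv
import io

# stop_id as used in the VGC feed -> (stop_lat, stop_lon).
# Assumption: the thread lists lon;lat, so the values are swapped here.
CORRECTIONS = {
    "de:08235:10441:0:1": (48.550905897340634, 8.728962732267801),  # matched by name
    "de:08235:4399:0:3": (48.56888243389193, 8.72904310339791),     # matched by name
    "gen:8235:3040:0:3": (48.71554, 8.76361),                       # "wild guess"
}


def patch_stops(stops_csv, corrections):
    """Return stops_csv with coordinates replaced for every stop_id in corrections."""
    reader = csv.DictReader(io.StringIO(stops_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        fix = corrections.get(row["stop_id"])
        if fix is not None:
            # Format with six decimals, as in the original feed.
            row["stop_lat"], row["stop_lon"] = (f"{value:.6f}" for value in fix)
        writer.writerow(row)
    return out.getvalue()


SAMPLE = '''stop_id,stop_name,stop_lat,stop_lon
de:08235:10441:0:1,"Nagold, Stadtbahnhof Mitte",0.000000,0.000000
gen:8235:3045:0:3,Heumaden (Calw) ;Hagebuttenw.,0.000000,0.000000
'''

print(patch_stops(SAMPLE, CORRECTIONS))
```

Keeping the corrections in a plain mapping keyed by stop_id makes it easy to review each override and to drop it again once the upstream feed is fixed.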

derhuerst commented 3 years ago

What about "Rotfelden, Kindergarten"? Shall I remove it, or is "Rotfelden, Kirche" the proper name?

hbruch commented 3 years ago

I suppose they are the same, perhaps renamed. I did not find any current info. Hope NVBW/VGC will respond to https://github.com/mfdz/GTFS-Issues/issues/60.