ipeaGIT / gtfs2gps

Convert GTFS data into a data.table with GPS-like records in R
https://ipeagit.github.io/gtfs2gps/

'speed' values are NA, but not really #260

Open rafapereirabr opened 2 years ago

rafapereirabr commented 2 years ago

I'm seeing strange behavior in gtfs2gps(). In the reprex below, I filter a single trip and convert it to a GPS-like table. The problem is that gtfs2gps() prints a message saying "'speed' values are NA for shape_id '60-1-b12-1.1.O'". This message seems to suggest that all speed values in this trip are NA, but they are not. This looks like a problem in the code. The message should not be printed in this case, right?

The function is able to calculate the speed correctly, as seen in the output. There is only one NA, in the last segment (as expected). So the function also prints the message "Some 'speed' values are NA in the returned data.". As a rule, there will always be one NA in the last trip segment, right? So perhaps this message is unnecessary. What do you guys think?

reprex

library(gtfs2gps)
library(ggplot2)
library(gtfstools)
library(data.table)

# path to GTFS.zip file
gtfs_file <- system.file("extdata/irl_dub/irl_dub_gtfs.zip", package = "gtfs2emis")

# read GTFS
gtfs <- gtfstools::read_gtfs(gtfs_file)

# Drop services that run on Saturday or Sunday (keep weekday-only services)
gtfs <- gtfstools::filter_by_weekday(gtfs, 
                                     weekday = c('saturday', 'sunday'), 
                                     keep = FALSE)
# filter trip
id <- '6343.2.60-1-b12-1.1.O'
gtfs <- gtfstools::filter_by_trip_id(gtfs, trip_id =  id )

head(gtfs$trips)
head(gtfs$stop_times)
head(gtfs$stops)
head(gtfs$shapes)

# convert to gps
gps <- gtfs2gps(gtfs)
gps_sf <- gtfs2gps::gps_as_sflinestring(gps)

# plot
stops_df <- gtfstools::convert_stops_to_sf(gtfs)

ggplot() +
  geom_sf(data=gps_sf, aes(color=as.numeric(speed))) +
  geom_sf(data=stops_df, color='red') 

Joaobazzo commented 2 years ago

Behavior of function

The gtfs2gps function creates new coordinates for each stop_sequence, and adds extra points in order to match the last coordinate of gtfs$shapes. In the printout below, you can see that the last coordinate of gtfs$shapes does not match the last coordinate of gtfs$stops.

Running the example you mentioned

[Screenshot from 2022-07-12 11-47-38]

In the map, you can see that the two points are very close, but not the same.

[Screenshot from 2022-07-12 11-43-36]
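
The mismatch described above can be checked directly by comparing the last row of the shapes table with the last row of the stops table. This is only a minimal sketch: `shapes_toy` and `stops_toy` are hypothetical stand-ins with made-up coordinates, not values taken from the Dublin feed.

```r
library(data.table)

# Toy tables with made-up coordinates (NOT from the Dublin feed).
shapes_toy <- data.table(
  shape_pt_sequence = 1:3,
  shape_pt_lat = c(53.34900, 53.34950, 53.34980),
  shape_pt_lon = c(-6.26100, -6.26060, -6.26030)
)
stops_toy <- data.table(
  stop_id  = c("s1", "s2"),
  stop_lat = c(53.34900, 53.34983),
  stop_lon = c(-6.26100, -6.26035)
)

# Last shape point and last stop, with harmonized column names.
last_shape <- shapes_toy[.N, .(lat = shape_pt_lat, lon = shape_pt_lon)]
last_stop  <- stops_toy[.N, .(lat = stop_lat, lon = stop_lon)]

# An exact comparison fails even though the points nearly coincide.
same_point <- last_shape$lat == last_stop$lat &&
  last_shape$lon == last_stop$lon

# Offset in decimal degrees between the two points.
diff_deg <- c(lat = last_stop$lat - last_shape$lat,
              lon = last_stop$lon - last_shape$lon)
```

Here `same_point` is FALSE while `diff_deg` is tiny, which is the "very close, but not the same" situation.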

Problems

However, even if tail(gps,1) == tail(gtfs$shapes,1), NAs will still be added, because gtfs2gps::gtfs2gps creates new points for the stops. In the reprex below, I changed the coordinates of the last stop_id to match the last line of gtfs$shapes. The results are similar, because the stop_sequence coordinates in the output are no longer the same as in gtfs$stops.

REPREX

library(gtfs2gps)
library(data.table)

gtfs_file <- system.file("extdata/irl_dub/irl_dub_gtfs.zip", package = "gtfs2emis")

# read GTFS
gtfs <- gtfstools::read_gtfs(gtfs_file)

# Drop services that run on Saturday or Sunday (keep weekday-only services)
gtfs <- gtfstools::filter_by_weekday(gtfs, 
                                     weekday = c('saturday', 'sunday'), 
                                     keep = FALSE)
# filter trip
id <- '6343.2.60-1-b12-1.1.O'
gtfs <- gtfstools::filter_by_trip_id(gtfs, trip_id =  id )
# set the last row of stops to the last shape point (lat, lon);
# this assumes the last row of gtfs$stops is the trip's final stop
gtfs$stops[.N,stop_lat := gtfs$shapes[.N,shape_pt_lat]]
gtfs$stops[.N,stop_lon := gtfs$shapes[.N,shape_pt_lon]]
# convert to gps
gps <- gtfs2gps(gtfs)
tail(gps)
gps_sf <- gtfs2gps::gps_as_sflinestring(gps)

[Screenshot from 2022-07-12 12-08-22]

Approaches

I can think of a few strategies to solve this problem:

1) Not replacing the coordinates of the input stop_ids: by doing this we will no longer have this last NA (if, and only if, tail(gps,1) == tail(gtfs$shapes,1)).
2) Using some sort of tolerance in gtfs2gps: for instance, if the snapped point is within a certain distance (say 5 meters) of the input coordinates, we will not change the input value.

However, I don't know exactly how difficult it would be to implement such solutions.
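
The tolerance idea could look roughly like the sketch below. It is not how gtfs2gps works internally: `haversine_m()` and `snap_with_tolerance()` are hypothetical helpers written for illustration, the coordinates are made up, and the 5 meter default only mirrors the figure mentioned above.

```r
# Great-circle (haversine) distance in meters between lon/lat points.
haversine_m <- function(lon1, lat1, lon2, lat2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: keep the input stop coordinate when the snapped
# point lies within `tol_m` meters of it; otherwise use the snapped point.
snap_with_tolerance <- function(stop_lon, stop_lat,
                                snap_lon, snap_lat, tol_m = 5) {
  d <- haversine_m(stop_lon, stop_lat, snap_lon, snap_lat)
  keep_input <- d <= tol_m
  data.frame(
    lon = ifelse(keep_input, stop_lon, snap_lon),
    lat = ifelse(keep_input, stop_lat, snap_lat),
    dist_m = d,
    kept_input = keep_input
  )
}

# Made-up example: the first stop is a few meters from its snapped point,
# the second is tens of meters away.
res <- snap_with_tolerance(
  stop_lon = c(-6.26035, -6.26100),
  stop_lat = c(53.34983, 53.34900),
  snap_lon = c(-6.26030, -6.26000),
  snap_lat = c(53.34980, 53.34900)
)
```

In this toy case the first stop keeps its input coordinates and the second takes the snapped ones. Whether keeping the input point also removes the trailing NA would still depend on how the speed is computed for the final segment.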

pedro-andrade-inpe commented 2 years ago

Maybe we could only improve the message. Currently it says that:

paste0(na_values, " 'speed' values are NA for shape_id '", shapeid, "'.")

Possibly we could also say whether such values are (i) at the beginning, (ii) at the end, or (iii) in different parts of the shape. We could also remove the message "Some 'speed' values are NA in the returned data.", as the previous messages are more informative.
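
One possible way to build such a message is sketched below. `describe_na_position()` is a hypothetical helper, not part of gtfs2gps, and the "(position: ...)" suffix is only an illustrative wording; the `paste0()` prefix follows the current message quoted above.

```r
# Hypothetical helper: describe where NA speeds fall within one shape.
describe_na_position <- function(speed) {
  is_na <- is.na(speed)
  if (!any(is_na)) return("none")
  if (all(is_na)) return("all")
  runs <- rle(is_na)               # consecutive runs of NA / non-NA
  n_runs <- length(runs$values)
  na_runs <- which(runs$values)    # indices of the runs that are NA
  where <- character(0)
  if (1 %in% na_runs) where <- c(where, "beginning")
  if (any(na_runs > 1 & na_runs < n_runs)) where <- c(where, "middle")
  if (n_runs %in% na_runs) where <- c(where, "end")
  paste(where, collapse = ", ")
}

# Toy speeds for one shape: a single NA in the last segment.
shapeid <- "60-1-b12-1.1.O"
speed <- c(21.4, 19.8, NA)
na_values <- sum(is.na(speed))

msg <- paste0(na_values, " 'speed' values are NA for shape_id '", shapeid,
              "' (position: ", describe_na_position(speed), ").")
message(msg)
```

With this, a lone NA at the end of a shape is reported as such, which is easier to tell apart from NAs scattered through the shape.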

rafapereirabr commented 2 years ago

Perhaps this message should only be printed if there are two or more NAs in the output of the shape.
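
A minimal sketch of that rule, assuming a GPS-like table with `shape_id` and `speed` columns. `gps_toy` is a made-up stand-in for the gtfs2gps() output, and the threshold of 2 is simply the value suggested above.

```r
# Toy stand-in for a GPS-like table: shape A has only the expected
# trailing NA, shape B has two NAs.
gps_toy <- data.frame(
  shape_id = c("A", "A", "A", "B", "B", "B"),
  speed    = c(20, 22, NA, NA, 18, NA)
)

# Count NA speeds per shape.
na_per_shape <- vapply(
  split(gps_toy$speed, gps_toy$shape_id),
  function(s) sum(is.na(s)),
  integer(1)
)

# Report only the shapes with two or more NA speeds.
min_na <- 2L
to_report <- names(na_per_shape)[na_per_shape >= min_na]

for (id in to_report) {
  message(na_per_shape[[id]], " 'speed' values are NA for shape_id '", id, "'.")
}
```

Only shape B triggers a message here; shape A, with its single NA in the last segment, stays silent.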