r-transit / gtfsio

Read and Write General Transit Feed Specification (GTFS)
https://r-transit.github.io/gtfsio/

Prevent large round numbers from being saved in scientific notation #34

Closed dhersz closed 8 months ago

dhersz commented 8 months ago

An issue has been filed in {gtfstools} noting that round shape_dist_traveled entries were saved in scientific notation: https://github.com/ipeaGIT/gtfstools/issues/73

Note that saving in scientific notation does not prevent the entries from being read correctly with import_gtfs() (they are still read as numbers, and their precision does not seem to be affected). However, the issue creator notes that it does affect his workflow, because pfaedle can't handle the format.
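The claim that the notation alone does not change the parsed values can be checked with a small sketch. It assumes {data.table} is installed and reads from inline text rather than a real feed. The two-column layout is only illustrative.

```r
# Sketch: the same values written in plain and in scientific notation
# parse to the same numbers.
plain <- data.table::fread(
  text = c("shape_id,shape_dist_traveled", "b,10000000", "c,10000001")
)
scientific <- data.table::fread(
  text = c("shape_id,shape_dist_traveled", "b,1e+07", "c,10000001")
)

# Compare as doubles, since the plain column may be read as integer.
same_values <- all(
  as.numeric(plain$shape_dist_traveled) ==
    as.numeric(scientific$shape_dist_traveled)
)
same_values
```

So the problem is limited to downstream tools that parse the text themselves and do not accept the exponent form.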

Here's a reproducible example:

mock_shapes <- data.frame(
  shape_id = c("a", "b", "c"),
  shape_pt_sequence = 1:3,
  shape_pt_lat = 40:42,
  shape_pt_lon = 40:42,
  shape_dist_traveled = c(1, 10000000, 10000001)
)

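# Write the input feed in plain notation (scipen = 999), so that any
# scientific notation seen later is introduced by export_gtfs().
tmpdir <- tempfile()
dir.create(tmpdir)
shapes_path <- file.path(tmpdir, "shapes.txt")
data.table::fwrite(mock_shapes, shapes_path, scipen = 999)
zip_path <- zip::zipr(tempfile(fileext = ".zip"), shapes_path)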

readLines(shapes_path)
#> [1] "shape_id,shape_pt_sequence,shape_pt_lat,shape_pt_lon,shape_dist_traveled"
#> [2] "a,1,40,40,1"                                                             
#> [3] "b,2,41,41,10000000"                                                      
#> [4] "c,3,42,42,10000001"

gtfs <- gtfsio::import_gtfs(zip_path)
gtfs$shapes
#>    shape_id shape_pt_sequence shape_pt_lat shape_pt_lon shape_dist_traveled
#> 1:        a                 1           40           40               1e+00
#> 2:        b                 2           41           41               1e+07
#> 3:        c                 3           42           42               1e+07

exported_gtfs_dir <- tempfile()
gtfsio::export_gtfs(gtfs, exported_gtfs_dir, as_dir = TRUE)

readLines(file.path(exported_gtfs_dir, "shapes.txt"))
#> [1] "shape_id,shape_pt_sequence,shape_pt_lat,shape_pt_lon,shape_dist_traveled"
#> [2] "a,1,40,40,1"                                                             
#> [3] "b,2,41,41,1e+07"                                                         
#> [4] "c,3,42,42,10000001"

new_zip_path <- zip::zipr(
  tempfile(fileext = ".zip"),
  file.path(exported_gtfs_dir, "shapes.txt")
)
reimported_gtfs <- gtfsio::import_gtfs(new_zip_path)

reimported_gtfs$shapes$shape_dist_traveled
#> [1] 1e+00 1e+07 1e+07
format(reimported_gtfs$shapes$shape_dist_traveled, scientific = FALSE)
#> [1] "       1" "10000000" "10000001"

An easy solution to this problem seems to be adding scipen = 999 to the data.table::fwrite() call in export_gtfs().
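A minimal sketch of the effect of that argument, outside of export_gtfs(). It is not the actual patch: the table and file paths are made up for illustration, and scipen = 0 is passed explicitly in the first call to stand in for R's default setting.

```r
# Sketch: compare fwrite() output with and without a large scipen.
shapes <- data.table::data.table(
  shape_id = c("a", "b", "c"),
  shape_dist_traveled = c(1, 10000000, 10000001)
)

default_path <- tempfile(fileext = ".txt")
plain_path <- tempfile(fileext = ".txt")

# scipen = 0: the shorter representation wins, so 10000000 becomes 1e+07.
data.table::fwrite(shapes, default_path, scipen = 0)

# scipen = 999: strongly penalises scientific notation when writing.
data.table::fwrite(shapes, plain_path, scipen = 999)

default_lines <- readLines(default_path)
plain_lines <- readLines(plain_path)

# TRUE if any written line contains an exponent such as "e+07".
has_exponent <- function(lines) any(grepl("e+", lines, fixed = TRUE))

has_exponent(default_lines)
has_exponent(plain_lines)
```

Passing the argument in the call keeps the change local to the written file and leaves the user's global options(scipen = ...) untouched.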

dhersz commented 8 months ago

Done in https://github.com/r-transit/gtfsio/commit/c84887566dbd5d845373f5844d7ac820885129c7.