Closed polettif closed 1 year ago
Merging #203 (d6924d2) into master (18cf9b4) will increase coverage by 0.00%. The diff coverage is 100.00%.
:exclamation: Current head d6924d2 differs from pull request most recent head 40719ac. Consider uploading reports for the commit 40719ac to get more accurate results.
@@ Coverage Diff @@
## master #203 +/- ##
=======================================
Coverage 99.91% 99.91%
=======================================
Files 16 16
Lines 1119 1120 +1
=======================================
+ Hits 1118 1119 +1
Misses 1 1
Impacted Files | Coverage Δ | |
---|---|---|
R/validate_gtfs.R | 100.00% <100.00%> (ø) |
As it turns out, the implemented check for duplicated primary keys is very slow on larger feeds. This PR improves the runtime by about 10x (!) by using `data.table::anyDuplicated`. Example benchmarks:
With the NYC dataset:
With the Swiss dataset:
Maybe there are faster ways to check for duplicated keys, but I can't think of a better one currently. Checking for unique keys in large datasets is bound to come with some cost.
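For reference, a minimal sketch of the kind of check this relies on. It is not the exact code in `R/validate_gtfs.R`: the helper name `has_duplicated_keys` and the toy `stop_times` table are made up for illustration. It assumes the `data.table` method for `anyDuplicated()` with its `by` argument, which returns the index of the first duplicated row, or 0 if there is none.

```r
library(data.table)

# Hypothetical helper (not part of the package): returns TRUE if any
# combination of values in the key columns occurs more than once.
has_duplicated_keys <- function(tbl, key_cols) {
  dt <- as.data.table(tbl)
  # anyDuplicated() returns the index of the first duplicated row, or 0
  anyDuplicated(dt, by = key_cols) > 0L
}

# Toy table: rows 3 and 4 share the same (trip_id, stop_sequence) pair
stop_times <- data.frame(
  trip_id       = c("t1", "t1", "t2", "t2"),
  stop_sequence = c(1L, 2L, 1L, 1L),
  stop_id       = c("A", "B", "A", "C")
)

has_duplicated_keys(stop_times, c("trip_id", "stop_sequence"))        # TRUE
has_duplicated_keys(stop_times[1:3, ], c("trip_id", "stop_sequence")) # FALSE
```

Since only the existence of a duplicate matters for validation, there's no need to materialise the full logical vector that `duplicated()` would return.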