staeco / gtfs-stream

Streaming GTFS and GTFS-RT parser for node
MIT License

Number parsing causing issues #5

Closed - linusnorton closed this issue 5 years ago

linusnorton commented 5 years ago

Hi,

I'm currently still using version 1.x of this library, as upgrading to the latest seems to come with a fairly hefty performance penalty. Using the GB rail data set, version 1.x loads in about 24 seconds on my PC; with version 2.x it's about 35 seconds.

On top of that, it actually causes an issue: I expect everything to be a string, but the library now seems to try to parse every single field as a number. I'm not sure how the library you use for parseNumber behaves, but parseInt certainly has some odd behaviour - for instance, parseInt("123abc") returns 123, which is not what we'd want in many cases.
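For example (a generic sketch of the behaviour rather than this library's code; parseNumberStrict below is just a hypothetical helper):

    // parseInt silently discards trailing non-numeric characters, which is
    // risky for GTFS fields such as IDs that merely start with digits.
    parseInt('123abc', 10) // => 123
    Number('123abc')       // => NaN

    // A more conservative approach: only coerce when the whole string
    // round-trips cleanly, otherwise keep the original string.
    const parseNumberStrict = (value) => {
      const n = Number(value)
      return Number.isFinite(n) && String(n) === value.trim() ? n : value
    }

    parseNumberStrict('123abc') // => '123abc'
    parseNumberStrict('42')     // => 42
    parseNumberStrict('0042')   // => '0042' (leading zeros kept as a string)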

I understand if you don't want to resolve these issues as it seems other people have different use cases. Let me know and I'll maintain a separate fork.

Cheers

yocontra commented 5 years ago

@linusnorton Can you post your code? The number parsing shouldn't add any performance penalty.

We can make the number parser more conservative + add a flag to turn it off.

yocontra commented 5 years ago

Updated ^ Parsing should not be an issue anymore, and if you still want to disable it, { raw: true } will do it.
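For anyone else landing here, a minimal usage sketch - the zip path is a placeholder and the gtfs({ raw: true }) call assumes the factory-style entry point shown in the README:

    const fs = require('fs')
    const gtfs = require('gtfs-stream')

    // { raw: true } turns off number parsing so every field stays a string.
    fs.createReadStream('./gb-rail-gtfs.zip')
      .pipe(gtfs({ raw: true }))
      .on('data', (entity) => {
        console.log(entity)
      })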

linusnorton commented 5 years ago

That seems to result in an error:

TypeError: this.mapValues is not a function
    at mapValue (/home/linus/Work/planar/raptor/node_modules/csv-parser/index.js:152:19)
    at CsvParser._online (/home/linus/Work/planar/raptor/node_modules/csv-parser/index.js:170:17)
    at CsvParser._transform (/home/linus/Work/planar/raptor/node_modules/csv-parser/index.js:258:16)
    at CsvParser.Transform._read (_stream_transform.js:187:10)
    at CsvParser.Transform._write (_stream_transform.js:175:12)
    at doWrite (_stream_writable.js:415:12)
    at writeOrBuffer (_stream_writable.js:399:5)
    at CsvParser.Writable.write (_stream_writable.js:299:11)
    at DestroyableTransform.ondata (/home/linus/Work/planar/raptor/node_modules/readable-stream/lib/_stream_readable.js:619:20)
    at DestroyableTransform.emit (events.js:193:13)
    at DestroyableTransform.EventEmitter.emit (domain.js:481:20)

I'm not sure why - I thought setting an object key to undefined was the same as not setting it at all.
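For what it's worth, the two aren't always interchangeable; a generic sketch of the JavaScript semantics (not necessarily what csv-parser does internally):

    // A key explicitly set to undefined still overrides defaults merged with
    // Object.assign (or spread), whereas an omitted key leaves the default alone.
    const defaults = { mapValues: (row) => row }

    Object.assign({}, defaults, { mapValues: undefined }).mapValues // => undefined
    Object.assign({}, defaults, {}).mapValues                       // => [Function]

    // The explicit key is also visible to the 'in' operator and enumeration.
    'mapValues' in { mapValues: undefined } // => true
    'mapValues' in {}                       // => false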

yocontra commented 5 years ago

@linusnorton Yeah, didn't write a test for that option - 2.0.8 should fix it.

linusnorton commented 5 years ago

That's got it. 18 seconds now - thanks