monterail / zip-codes

Identify city and state for given zip code
MIT License
109 stars, 57 forks

Load data with Marshal instead of YAML #17

Open markprzepiora opened 5 years ago

markprzepiora commented 5 years ago

Loading 4 MB of YAML data takes about 1 second on my development machine. Replacing the YAML-dumped data with Marshal-dumped data brings this down to about 0.08 seconds. This makes a big difference in feedback speed, especially when running individual tests that rely on ZipCodes.identify.

                      user     system      total        real
YAML.load         0.859559   0.029983   0.889542 (  0.889713)
Marshal.load      0.043685   0.030232   0.073917 (  0.073922)
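
A comparison like the one above could be reproduced with a sketch along these lines. It is not the script used in this PR: the helper names (`build_sample_data`, `time_it`, `compare_loaders`) and the generated sample data are assumptions made for illustration, and the real gem data will give different absolute numbers.

```ruby
require "yaml"

# Hypothetical stand-in for the gem's data set: zip code => city/state hash.
# String keys and values keep the YAML round trip simple.
def build_sample_data(count)
  (0...count).each_with_object({}) do |i, data|
    data[format("%05d", i)] = { "city" => "City #{i}", "state_code" => "ST" }
  end
end

# Returns the elapsed wall-clock seconds for the given block.
def time_it
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Serializes the same data both ways, then times only the load step.
def compare_loaders(data)
  yaml_blob = YAML.dump(data)
  marshal_blob = Marshal.dump(data)

  {
    yaml: time_it { YAML.load(yaml_blob) },
    marshal: time_it { Marshal.load(marshal_blob) }
  }
end

if $PROGRAM_NAME == __FILE__
  timings = compare_loaders(build_sample_data(40_000))
  timings.each { |name, seconds| puts format("%-8s %.4fs", name, seconds) }
end
```

Timing only the load step, with both blobs already in memory, keeps disk I/O out of the measurement so the difference reflects parsing cost alone.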

Hengjie commented 5 years ago

This looks awesome. Thank you for writing this so that loading can be sped up. I just wish the gem author would merge it.

brodyhoskins commented 1 year ago

@markprzepiora, I'd like to explore options for storing the data.

What I ended up doing in a fork was converting the YAML to CSV and using the FastCSV gem to process the data more quickly and to avoid loading it all into memory at once; however, I'm not convinced that was the best approach.
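
To make the streaming idea concrete, here is a minimal sketch. It uses Ruby's standard `csv` library rather than FastCSV, and both the method name `identify_from_csv` and the column layout (`zip`, `city`, `state_code`) are assumptions, not the fork's actual code.

```ruby
require "csv"

# Hypothetical lookup that reads the CSV row by row and stops at the first
# match, so the full data set is never held in memory.
# Assumed header row: zip,city,state_code
def identify_from_csv(path, zip)
  CSV.foreach(path, headers: true) do |row|
    next unless row["zip"] == zip

    return { city: row["city"], state_code: row["state_code"] }
  end

  nil # no row matched
end
```

The trade-off is that every lookup rescans the file from the top, so memory stays flat but repeated lookups cost more than a hash loaded once.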

Another idea is to keep the YAML around for development purposes but bundle a SQLite database generated by a rake task.

Since you've opened this PR I wanted to get your feedback.

lostapathy commented 6 months ago

Library data like this really shouldn't be distributed via Marshal: the Marshal format is not guaranteed to be stable between Ruby versions (and indeed, in practice it is not). I'd suggest either pursuing fast CSV options or just querying a SQLite file.
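
For anyone weighing this concern, a Marshal dump carries its format version in its first two bytes, and Ruby exposes its own version as `Marshal::MAJOR_VERSION` and `Marshal::MINOR_VERSION`. The sketch below uses hypothetical helper names and checks only that header. It says nothing about whether the dumped objects themselves deserialize identically on another Ruby.

```ruby
# Returns the [major, minor] format version stored at the start of a dump.
def marshal_format_version(blob)
  blob.unpack("C2")
end

# Applies the rule from the Marshal documentation: a dump is readable when it
# has the same major version and a minor version no newer than this Ruby's.
def marshal_loadable?(blob)
  major, minor = marshal_format_version(blob)
  major == Marshal::MAJOR_VERSION && minor <= Marshal::MINOR_VERSION
end

if $PROGRAM_NAME == __FILE__
  blob = Marshal.dump({ "00501" => { "city" => "Holtsville", "state_code" => "NY" } })
  puts "dump format: #{marshal_format_version(blob).join('.')}"
  puts "loadable here: #{marshal_loadable?(blob)}"
end
```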