joereynolds / what-to-code

Ideas for things to program
MIT License

file splitting #11

Closed (garrettm closed this 7 years ago)

garrettm commented 7 years ago

A program that can split one file into many. For example, I had a CSV file that was 300 MB and could not do any processing with it because it was so big. All the alternatives that I checked out were either slow or not very user-friendly. It should work on any file type, not just CSVs. There is a good one here, but I feel it could be made faster in C or Rust.

I made one in Rust, though I am a Rust novice. I don't know whether it is faster than the one you linked, but my solution does differ a bit: mine splits into N chunks rather than into chunks of N lines. It's also worth noting that mine does nothing CSV-specific, just raw line-wise splitting.

https://github.com/garrettm/splitters
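For readers who want to see the split-into-N-chunks idea spelled out, here is a minimal sketch. It is not the code from the repository linked above; the function names (`split_into_n`, `count_lines`) and the `<input>.partK` output naming are assumptions made for illustration. It reads raw bytes up to each newline, so it does nothing CSV-specific and does not require the file to be valid UTF-8.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, BufWriter, Write};
use std::path::{Path, PathBuf};

/// Split `input` line-wise into `n` files with near-equal line counts.
/// Output names (`<input>.part0`, `<input>.part1`, ...) are a hypothetical
/// convention chosen for this sketch.
pub fn split_into_n(input: &Path, n: usize) -> io::Result<Vec<PathBuf>> {
    if n == 0 {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "n must be at least 1",
        ));
    }

    // Pass 1: count lines without loading the whole file into memory.
    let total = count_lines(input)?;
    let base = total / n; // every chunk gets at least this many lines
    let extra = total % n; // the first `extra` chunks get one more line

    // Pass 2: stream the lines into the output files in order.
    let mut reader = BufReader::new(File::open(input)?);
    let mut paths = Vec::with_capacity(n);
    let mut buf = Vec::new();
    for i in 0..n {
        let path = PathBuf::from(format!("{}.part{}", input.display(), i));
        let mut out = BufWriter::new(File::create(&path)?);
        let quota = base + if i < extra { 1 } else { 0 };
        for _ in 0..quota {
            buf.clear();
            // Copies bytes up to and including b'\n' (or up to EOF).
            reader.read_until(b'\n', &mut buf)?;
            out.write_all(&buf)?;
        }
        out.flush()?;
        paths.push(path);
    }
    Ok(paths)
}

/// Count lines as newline-terminated byte runs; a final line without a
/// trailing newline still counts as a line.
fn count_lines(path: &Path) -> io::Result<usize> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut buf = Vec::new();
    let mut count = 0;
    loop {
        buf.clear();
        if reader.read_until(b'\n', &mut buf)? == 0 {
            break;
        }
        count += 1;
    }
    Ok(count)
}
```

Calling `split_into_n(Path::new("big.csv"), 4)` would write `big.csv.part0` through `big.csv.part3` next to the input. The two-pass design trades an extra read of the file for constant memory use; a split-into-chunks-of-N-lines tool needs only one pass, since it never has to know the total line count up front.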

It was a good learning experience! Thanks for this list of ideas.

garrettm commented 7 years ago

Okay, I just tested the Ruby script, and the Rust version is way faster. But it doesn't have the same CSV-specific properties that the Ruby script does.

joereynolds commented 7 years ago

Hey, thanks for doing this! If you like, you can open a PR that adds this to the readme?

Cheers

garrettm commented 7 years ago

done!

joereynolds commented 7 years ago

https://github.com/joereynolds/what-to-code/commit/1ff14bf7070c9de797c87e32a7b82cf6ff406f9a