queryverse / TextParse.jl

A bunch of fast text parsing tools

Reading the first N rows #61

Open · iscalprog opened this issue 6 years ago

iscalprog commented 6 years ago

Sometimes it is useful to read just the first N rows before reading everything. Is there a recommended way to achieve this?

davidanthoff commented 6 years ago

I would actually have expected the nrows argument to control that, but apparently it doesn't.

Being able to return only the first n rows would be really, really useful. I think the API should simply be that nrows controls this.
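The requested nrows semantics (parse the header, then stop after n data rows without touching the rest of the file) can be illustrated with a small, language-neutral sketch. This is Python, not TextParse.jl code, and `read_first_n_rows` is a hypothetical helper; it only shows the intended behavior, including nrows=0 returning just the header.

```python
import csv
import io
from itertools import islice

def read_first_n_rows(text, nrows):
    # Hypothetical helper (not TextParse.jl's API): parse the header and at
    # most `nrows` data rows. csv.reader is lazy, so rows past `nrows` are
    # never parsed.
    reader = csv.reader(io.StringIO(text))
    header = next(reader)                # first record holds the column names
    rows = list(islice(reader, nrows))   # stop after nrows data rows
    return header, rows
```

For example, `read_first_n_rows("a,b\n1,2\n3,4\n5,6\n", 2)` gives the header `["a", "b"]` and the two rows `["1", "2"]` and `["3", "4"]`.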

jpsamaroo commented 4 years ago

I'd like this feature to support https://github.com/JuliaComputing/JuliaDB.jl/pull/288, where we want to parse only the header and then pass it to workers, which will parse the rest of the file in chunks.
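The header-first, chunked workflow described here can be sketched roughly as follows. This is an illustrative Python sketch, not JuliaDB.jl or TextParse.jl code; `parse_header`, `split_body_into_chunks`, and `parse_chunk` are hypothetical names. It works on an in-memory string and assumes no quoted field contains a newline, whereas a real implementation would more likely hand each worker a byte range of the file.

```python
import csv
import io

def parse_header(text):
    # Hypothetical helper: parse only the first record (the column names).
    return next(csv.reader(io.StringIO(text)))

def split_body_into_chunks(text, nchunks):
    # Hypothetical helper: split the data lines after the header into at most
    # `nchunks` contiguous chunks. Assumes no quoted field contains a newline,
    # so every line break is a record boundary.
    body_lines = [line for line in text.split("\n")[1:] if line]
    if not body_lines:
        return []
    size = -(-len(body_lines) // nchunks)  # ceiling division
    return [body_lines[i:i + size] for i in range(0, len(body_lines), size)]

def parse_chunk(header, lines):
    # What each worker would do: parse its own lines using the shared header.
    return [dict(zip(header, row)) for row in csv.reader(lines)]
```

The header is parsed once on the coordinating process and shipped to every worker, so each worker can name its columns without rereading the start of the file.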

davidanthoff commented 4 years ago

https://github.com/queryverse/TextParse.jl/pull/145 was the first step: it freed up the name nrows for this purpose. The next step is to add the actual functionality.

I was also thinking of breaking out a separate function that just returns the header, so that one doesn't have to use nrows=0 for that. I also have a use case for it (a purely streaming mode that doesn't allocate any result vectors inside TextParse.jl). Would that also be useful for what you are working on?
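To make the two ideas concrete, here is a rough Python sketch of a header-only function plus a streaming mode that yields rows one at a time instead of filling per-column result vectors. It is not TextParse.jl's design; `read_header`, `stream_rows`, and `column_sum` are hypothetical names. It relies on `csv.reader` consuming only the lines of the record it returns, so a second reader on the same stream continues after the header.

```python
import csv
import io

def read_header(stream):
    # Hypothetical header-only helper: consumes just the first record.
    return next(csv.reader(stream))

def stream_rows(stream, header):
    # Hypothetical streaming mode: yields one row at a time as a dict;
    # nothing is accumulated into per-column result vectors.
    for row in csv.reader(stream):
        yield dict(zip(header, row))

def column_sum(stream, name):
    # Example consumer: reduce over one column while streaming.
    header = read_header(stream)
    return sum(float(row[name]) for row in stream_rows(stream, header))
```

For instance, `column_sum(io.StringIO("a,b\n1,2\n3,4\n"), "b")` sums the b column while holding only one row in memory at a time.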

jpsamaroo commented 4 years ago

The separate header-parsing function would be super useful, and would allow me to finish the current PR I'm working on!

A streaming interface would be interesting, although I'm not yet sure what I'd use it for :smile:

davidanthoff commented 4 years ago

Ok, cool. No promise on timing, but I'll try to give it a bit more priority than it has right now ;)