xiaodaigh / JDF.jl

Julia DataFrames serialization format
MIT License

Combining files in the folder into one file? #41

Closed y1my1 closed 4 years ago

y1my1 commented 4 years ago

Hi,

Thanks for writing this package. I have just tried it, and it is pretty efficient at both writing and reading. I have just one question, which may not be so important. Right now it stores all the columns in a folder, with each file corresponding to one column. Would you consider combining all the files into one, so that, say, xxx.jdf is just one binary file? This is basically for storage reasons. Forgive me if this sounds stupid. Thanks.

xiaodaigh commented 4 years ago

Thank you for raising the issue.

"Forgive me if this sounds stupid."

There is nothing to forgive AND it doesn't sound stupid.

I think other formats like Parquet and Jay are single-file formats. In fact, Jay was designed to move away from a multi-file approach. I will investigate this for the next release.

Currently, I am working on a Parquet reader/writer, a disk.frame update for dplyr, and a JLBoost update; then I can come to this. This is my open-source priority list atm.

BTW, if you can help me secure some funding, e.g. from your company, then I might be able to prioritise this. See https://github.com/sponsors/xiaodaigh for funding options :)

y1my1 commented 4 years ago

Thanks for all your work on these projects. I wish I could help in some way, but I am only a graduate student right now. My request is of very low importance, so please just stick to your priority list.

Best regards,