kaneplusplus / bigmemory

read from gzfile #70

Open mvaudel opened 7 years ago

mvaudel commented 7 years ago

Hi,

Thank you for this useful package. I use it to read my matrices from text files with read.big.matrix. I was wondering whether it would be possible to support input from gzfiles?

Best regards,

Marc

cdeterman commented 7 years ago

@mvaudel I can imagine a quick-and-dirty solution: uncompress the file with gunzip, then read the resulting file in as a big.matrix. Beyond that, I'm not aware of any R interface that reads directly from gzfiles. If such an interface exists, we could certainly explore it; otherwise I think we will likely advise users to uncompress the file themselves (assuming the other authors feel the same).
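The uncompress-then-read workaround described above could look roughly like the sketch below. The helper name `gunzip_to_temp` and the tiny demo file are illustrative assumptions, not part of bigmemory; only base R connections are used for the decompression, and the final `read.big.matrix` call is guarded in case the package is not installed.

```r
# Hypothetical helper (not part of bigmemory): decompress a .gz file to a
# temporary plain-text file in fixed-size chunks, so the whole file never
# has to fit in memory at once.
gunzip_to_temp <- function(gz_path, chunk_bytes = 1e6) {
  out_path <- tempfile(fileext = ".txt")
  src <- gzfile(gz_path, open = "rb")   # reading through gzfile() decompresses
  on.exit(close(src), add = TRUE)
  dst <- file(out_path, open = "wb")
  on.exit(close(dst), add = TRUE)
  repeat {
    chunk <- readBin(src, what = "raw", n = chunk_bytes)
    if (length(chunk) == 0L) break      # end of the compressed stream
    writeBin(chunk, dst)
  }
  out_path
}

# Small demo file standing in for a large gzipped matrix.
gz_path <- tempfile(fileext = ".csv.gz")
con <- gzfile(gz_path, open = "wt")
writeLines(c("1,2,3", "4,5,6"), con)
close(con)

plain_path <- gunzip_to_temp(gz_path)

# Read the uncompressed copy as usual (only if bigmemory is available).
if (requireNamespace("bigmemory", quietly = TRUE)) {
  x <- bigmemory::read.big.matrix(plain_path, sep = ",", type = "double")
}
# Call unlink(plain_path) once the big.matrix has been built.
```

The cost of this approach is the temporary disk space for the uncompressed copy, which is exactly what the original request hopes to avoid.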

mvaudel commented 7 years ago

Thank you for your answer. It would be really convenient to read directly from the gzipped files: our files are very large, so reading from them directly and decompressing on the fly would save substantial time and disk space. Are you working on the files themselves or through a connection? If the latter, letting us pass the connection directly instead of the file name should do the trick (https://stat.ethz.ch/R-manual/R-devel/library/base/html/connections.html).
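To illustrate what reading through a connection means here, the following is a minimal base-R sketch, not how read.big.matrix works internally. The function `read_gz_matrix` and its arguments are hypothetical, and a plain numeric matrix stands in for the big.matrix; the data are assumed to be purely numeric with no header and with known dimensions.

```r
# Hypothetical sketch: fill a preallocated numeric matrix from a gzipped
# delimited file, decompressing on the fly through a gzfile() connection.
read_gz_matrix <- function(gz_path, n_rows, n_cols, sep = ",",
                           chunk_lines = 1000L) {
  out <- matrix(NA_real_, nrow = n_rows, ncol = n_cols)
  con <- gzfile(gz_path, open = "rt")
  on.exit(close(con), add = TRUE)
  row <- 0L
  repeat {
    lines <- readLines(con, n = chunk_lines)   # next chunk of text lines
    if (length(lines) == 0L) break             # nothing left to read
    values <- as.numeric(unlist(strsplit(lines, sep, fixed = TRUE)))
    block <- matrix(values, ncol = n_cols, byrow = TRUE)
    out[(row + 1L):(row + nrow(block)), ] <- block
    row <- row + nrow(block)
  }
  out
}

# Small demo file; chunk_lines = 2 forces more than one pass of the loop.
gz_path <- tempfile(fileext = ".csv.gz")
con <- gzfile(gz_path, open = "wt")
writeLines(c("1,2,3", "4,5,6", "7,8,9"), con)
close(con)

m <- read_gz_matrix(gz_path, n_rows = 3L, n_cols = 3L, chunk_lines = 2L)
```

Only one chunk of text is held in memory at a time, and no uncompressed copy is written to disk. The same block-wise assignment pattern should carry over to a preallocated big.matrix, since it also supports `[<-` indexing.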

privefl commented 7 years ago

Hey, check this function. You can find a vignette with more information. It may not be super fast, but it is quite flexible. Check all the arguments you need to specify, especially file.nline, which you have to provide explicitly because the function can't compute it on a compressed file.
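If the line count is not known in advance, one way to obtain it (assuming an extra pass over the file is acceptable) is to stream the compressed file through a connection and count lines chunk by chunk. The helper `count_gz_lines` below is a hypothetical base-R sketch, not part of the function referred to above.

```r
# Hypothetical helper: count the lines of a gzipped text file without
# loading it fully into memory.
count_gz_lines <- function(gz_path, chunk_lines = 100000L) {
  con <- gzfile(gz_path, open = "rt")
  on.exit(close(con), add = TRUE)
  n <- 0   # kept as a double so very large counts do not overflow
  repeat {
    k <- length(readLines(con, n = chunk_lines))
    if (k == 0L) break
    n <- n + k
  }
  n
}

# Small demo file; chunk_lines = 3 forces more than one pass of the loop.
gz_path <- tempfile(fileext = ".txt.gz")
con <- gzfile(gz_path, open = "wt")
writeLines(c("a b", "c d", "e f", "g h"), con)
close(con)

n_lines <- count_gz_lines(gz_path, chunk_lines = 3L)
```

The resulting count could then be supplied wherever the number of lines has to be given explicitly.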

jarbet commented 1 year ago

Any updates on this? I am trying to read a large .txt.gz file that contains character/string data. I know fread can read .txt.gz files, but the file is larger than my available RAM. I can't use bigstatsr::big_read because it does not support character type data.

Would it be possible to combine read.big.matrix with fread in some way, to support reading .gz files?

privefl commented 1 year ago

Maybe this?

jarbet commented 1 year ago

Cool, I see they have a workaround for reading .gz files, so this should work. Thanks!