Closed pablodegrande closed 4 years ago
No, it is as you say, but compressing increases the write time, and there is no general agreement on which compression to use: you like gzip, but others prefer bzip2, and so on.
The other reason is that the writer is meant for quick and dirty data exchange with R, so the files will be deleted after use and their size is not very relevant. If you need to store the files, then doing it as rdata or rds sounds like a very bad idea... Better to use arrow.
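To make the time-versus-size trade-off concrete, here is a minimal sketch that compresses the same bytes with the two codecs mentioned above (gzip, bzip2) plus xz, using only the Python standard library. The function name `compare_codecs` and the sample payload are made up for illustration; real numbers will depend on the data and the machine.

```python
import bz2
import gzip
import lzma
import time


def compare_codecs(payload: bytes) -> dict:
    """Return {codec_name: (compressed_size_in_bytes, seconds_taken)}."""
    codecs = {
        "gzip": gzip.compress,
        "bzip2": bz2.compress,
        "xz": lzma.compress,
    }
    results = {}
    for name, compress in codecs.items():
        start = time.perf_counter()
        compressed = compress(payload)
        elapsed = time.perf_counter() - start
        results[name] = (len(compressed), elapsed)
    return results


if __name__ == "__main__":
    # Hypothetical payload standing in for an uncompressed data file.
    sample = b"id,value\n" + b"".join(
        b"%d,%d\n" % (i, i % 7) for i in range(50_000)
    )
    for name, (size, seconds) in compare_codecs(sample).items():
        print(f"{name}: {size} bytes in {seconds:.4f} s (raw: {len(sample)} bytes)")
```

Running it on a representative file is probably the quickest way to decide whether the extra write time is worth it for a given use case.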
Great! I will use your library in this project: https://github.com/poblaciones/poblaciones (which renders a collaborative, data-oriented map, https://poblaciones.org). Users will be OK downloading an rdata file, and I will gzip it for them before retrieval... Thanks a lot!!
Yeah, I see, for your case you need it compressed. Maybe I will add it as an option in the future (the default will be no compression so that it doesn't break existing code).
Just as a piece of advice, the interoperability of R files is terrible. Only R can read and write them correctly, because the format is undocumented and changes all the time. For that reason it would be better to provide files in an interoperable, documented format. But of course, if you have a lot of R users they won't like it (and if you have users from other systems they won't like R formats).
OK, gzip compression is implemented as an option in pyreadr 0.3.2:
pyreadr.write_rdata("test.RData", df, df_name="dataset", compress="gzip")
I also remembered why this was not implemented before: partly because it was not high priority, as explained above, but also because a bug on Windows that prevented deleting the created files (https://github.com/Roche/pyreadstat/issues/49) was blocking this.
Hope it helps
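For anyone who wants to confirm that the written file really is compressed, a small check on the gzip magic bytes should be enough. This is only a sketch: `is_gzip_file` is a hypothetical helper, not part of pyreadr, and the commented usage assumes pyreadr 0.3.2 or later and pandas are installed.

```python
def is_gzip_file(path: str) -> bool:
    """Return True if the file starts with the gzip magic bytes 0x1f 0x8b."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"


# Usage with pyreadr (requires pyreadr >= 0.3.2 and pandas):
# import pandas as pd
# import pyreadr
# df = pd.DataFrame({"x": [1, 2, 3]})
# pyreadr.write_rdata("test.RData", df, df_name="dataset", compress="gzip")
# print(is_gzip_file("test.RData"))
```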
Nice! Thanks!
Pablo De Grande - IDICSO (USAL) / CONICET http://www.aacademica.org/pablo.de.grande
On Tue, Sep 1, 2020 at 7:46 AM Otto Fajardo notifications@github.com wrote:
OK, gzip compression is implemented as an option:
pyreadr.write_rdata("test.RData", df, df_name="dataset", compress="gzip")
Now I also remembered that the reason why this was not implemented before was partially because not high priority as explained before, but also because there was a bug on Windows that did not allow to delete the created files, that was blocking this.
I investigated a bit, and I believe that creating compressed rdata files is no more than calling:
import sys
import gzip
import shutil

with open('uncompressedfile.rdata', 'rb') as f_in:
    with gzip.open('compressedfile.gz.rdata', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)
I was wondering why your library wouldn't do that while saving... Is there any other format issue I am not aware of?
Thanks a lot, Pablo.
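For reference, the post-hoc gzip step from the snippet above could be wrapped in a small reusable helper. This is a sketch using only the standard library; the name `gzip_copy` and the file names in the usage comment are illustrative, and it assumes the uncompressed file has already been written.

```python
import gzip
import shutil


def gzip_copy(src_path: str, dst_path: str) -> None:
    """Stream src_path into a gzip-compressed copy at dst_path."""
    # copyfileobj copies in chunks, so large files are not loaded into memory.
    with open(src_path, "rb") as f_in, gzip.open(dst_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)


# Usage (file names are placeholders):
# gzip_copy("uncompressedfile.rdata", "compressedfile.gz.rdata")
```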