Open hleumas opened 1 year ago
`sink_ipc` doesn't even have the option to set compression to `uncompressed`. I wonder what's the reason?
I hope I can `sink_ipc` uncompressed so that I can later `scan_ipc` memory-mapped.
I also struggle to sink IPC uncompressed for later mmap use. I have a large amount of data that does not fit in RAM.
The only option seems to be `lazy_df.collect().write_ipc()`, but my data is too large for that. This undermines the whole concept of the Lazy API.
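For context on why the compression setting matters for mmap use, here is a minimal standard-library sketch. It is not Polars code: the file names and the raw int64 layout are assumptions made purely for illustration, and `zlib` stands in for whatever codec an IPC writer would use. With uncompressed data the bytes on disk are the values, so a single element can be read through a memory map; with compressed data the payload has to be decompressed into RAM first.

```python
import array
import mmap
import os
import struct
import tempfile
import zlib

# 1000 native-endian int64 values; a stand-in for one uncompressed column buffer.
raw = array.array("q", range(1000)).tobytes()

with tempfile.TemporaryDirectory() as tmp:
    plain_path = os.path.join(tmp, "plain.bin")    # hypothetical file names
    packed_path = os.path.join(tmp, "packed.bin")
    with open(plain_path, "wb") as f:
        f.write(raw)
    with open(packed_path, "wb") as f:
        f.write(zlib.compress(raw))

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)

    # Uncompressed: map the file and read element 500 straight from its offset.
    # Only the pages that are touched need to be brought into memory.
    with open(plain_path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            mapped_value = struct.unpack_from("=q", mm, 500 * 8)[0]

    # Compressed: offsets in the file no longer correspond to values, so the
    # whole payload is decompressed into RAM before element 500 can be read.
    with open(packed_path, "rb") as f:
        restored = zlib.decompress(f.read())
    decompressed_value = struct.unpack_from("=q", restored, 500 * 8)[0]
```

Both reads return the same value, but only the first one avoids materialising the full buffer, which is the property the comments above are after.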
Checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
Log output
Issue description
Reading the documentation, one learns that:
- `write_ipc` is `uncompressed` by default
- `sink_ipc` is `zstd` by default
- `scan_ipc` assumes it's reading an uncompressed IPC file
- `uncompressed` isn't among the allowed compression algorithms for `sink_ipc`

This leads to super inconsistent behaviour where stuff suddenly breaks when one replaces `write_ipc` with its lazy version. Moreover, the fact that the default behaviour of `scan_ipc` isn't compatible with the default behaviour of `sink_ipc` is confusing as well.
Expected behavior
At minimum, `sink_ipc` followed by `scan_ipc` should not emit warnings. This can be achieved either by disabling default memory mapping in `scan_ipc` or by changing the default compression to `uncompressed`.
Ideally, sync and lazy versions should follow the same defaults.
Installed versions