Rogdham closed this issue 3 months ago.
`.peek()` might be doable. But reverse seeking scares me because of the hidden performance implications.
In order to implement reverse seeking, you effectively need to seek to the start of the file and then decompress until you reach the desired offset. Seek is intended to be a constant-time operation; seek in the presence of decompression is definitely not constant time. That's one of the reasons I didn't implement it.
IMO if someone wants to seek backwards, they can obtain a new file handle and seek forwards. This reinforces that backwards seeks are a performance footgun.
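The reopen-and-seek-forward approach can also be hidden behind an ordinary `seek()`, which is exactly what makes its cost easy to miss. A minimal sketch, assuming a hypothetical `open_stream` factory that returns a fresh forward-only stream of decompressed bytes each time it is called; `RewindingReader` is a hypothetical name, not part of python-zstandard or pyzstd. A backward seek restarts from offset 0, so its cost grows linearly with the target offset instead of being constant.

```python
import io
from typing import BinaryIO, Callable


class RewindingReader(io.RawIOBase):
    """Hypothetical wrapper giving a forward-only decompressed stream a seek().

    Forward seeks decompress and discard data; backward seeks reopen the
    stream via `open_stream` and decompress forward again from offset 0.
    Neither is constant time.
    """

    def __init__(self, open_stream: Callable[[], BinaryIO], chunk_size: int = 64 * 1024):
        self._open_stream = open_stream  # must return a fresh stream positioned at 0
        self._chunk_size = chunk_size
        self._stream = open_stream()
        self._pos = 0  # current offset in the *decompressed* data

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def readinto(self, buf) -> int:
        data = self._stream.read(len(buf))
        n = len(data)
        buf[:n] = data
        self._pos += n
        return n

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_CUR:
            offset += self._pos
        elif whence != io.SEEK_SET:
            # SEEK_END would require decompressing everything to learn the size.
            raise io.UnsupportedOperation("only SEEK_SET and SEEK_CUR are supported")
        if offset < 0:
            raise ValueError("negative seek position")
        if offset < self._pos:
            # Backward seek: discard the old stream and start over from offset 0.
            self._stream.close()
            self._stream = self._open_stream()
            self._pos = 0
        while self._pos < offset:
            # Forward seek: decompress and discard until the target is reached.
            data = self._stream.read(min(self._chunk_size, offset - self._pos))
            if not data:
                break  # target is past the end; stay at end of stream
            self._pos += len(data)
        return self._pos

    def close(self) -> None:
        if not self.closed:
            self._stream.close()
        super().close()
```

In practice `open_stream` would wrap something like python-zstandard's `ZstdDecompressor.stream_reader()` around a newly opened file. CPython's `bz2.BZ2File` and `lzma.LZMAFile` take the same approach, and their documentation warns that seeking is emulated and may be extremely slow.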
Or am I missing a use case necessitating backwards seeks?
One use case would be a compressed tail (ztail), i.e. grabbing just the last N lines of a file.
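For the ztail case specifically, the last N lines can also be collected in one forward pass with a bounded deque, so no backward seek is needed. A minimal sketch; `tail_lines` is a hypothetical helper, and `stream` is assumed to be any object whose `read(size)` returns decompressed bytes (and `b""` at the end). It still decompresses the whole file, so it illustrates the cost of tailing a compressed stream rather than avoiding it.

```python
import collections
import io
from typing import BinaryIO, List


def tail_lines(stream: BinaryIO, n: int, chunk_size: int = io.DEFAULT_BUFFER_SIZE) -> List[bytes]:
    """Return the last n lines (newline included) of a forward-only stream."""
    if n <= 0:
        return []
    last = collections.deque(maxlen=n)  # keeps only the n most recent lines
    pending = b""  # partial line carried over between chunks
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        # Every element except the last is a complete line.
        *complete, pending = pending.split(b"\n")
        last.extend(line + b"\n" for line in complete)
    if pending:
        last.append(pending)  # final line without a trailing newline
    return list(last)
```

For example, `tail_lines(io.BytesIO(b"a\nb\nc\nd\n"), 2)` returns `[b"c\n", b"d\n"]`; memory stays bounded by the N retained lines, but time is proportional to the full decompressed size.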
My simplest indexer for huge bioinformatic tables is based on pyzstd's `SeekableZstdFile`. Will similar functionality be available in python-zstandard?
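pyzstd's `SeekableZstdFile` targets the Zstandard seekable format, in which data is split into independently decompressible frames and a seek table records each frame's compressed and decompressed size. That table turns a backward seek into "decompress from the start of one frame" instead of "decompress from the start of the file". A minimal sketch of the lookup only, assuming the per-frame sizes have already been parsed from the seek table; `SeekTable` and `locate` are hypothetical names, not pyzstd's API.

```python
import bisect
import itertools
from typing import Sequence, Tuple


class SeekTable:
    """Hypothetical in-memory seek table: one (compressed, decompressed) size pair per frame."""

    def __init__(self, frames: Sequence[Tuple[int, int]]):
        # Cumulative start offsets of each frame in the compressed and decompressed data.
        self._c_starts = [0, *itertools.accumulate(c for c, _ in frames)]
        self._d_starts = [0, *itertools.accumulate(d for _, d in frames)]

    def locate(self, offset: int) -> Tuple[int, int, int]:
        """Map a decompressed offset to (frame index, compressed start, bytes to skip)."""
        if not 0 <= offset < self._d_starts[-1]:
            raise ValueError("offset outside decompressed data")
        # Last frame whose decompressed start is <= offset.
        i = bisect.bisect_right(self._d_starts, offset) - 1
        return i, self._c_starts[i], offset - self._d_starts[i]
```

For frames `[(10, 100), (20, 100), (5, 50)]`, `locate(150)` returns `(1, 10, 50)`: seek the compressed file to byte 10, start a fresh decompressor there, and discard 50 decompressed bytes. The work per seek is bounded by one frame's size rather than by the offset.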
> IMO if someone wants to seek backwards, they can obtain a new file handle and seek forwards. This reinforces that backwards seeks are a performance footgun.
No, you are right: for backward seeks we have no choice but to decompress the previous data again (at least from the beginning of the closest preceding frame). This is what pyzstd was doing (and what I did in python-xz as well).
I agree that performance-wise it is far from ideal, but from a usability perspective it's really useful. For example, sometimes you just want to open a few files from a .tar.zst archive, and not being able to seek makes simple operations hard (e.g. getting the list of files in the archive and then decompressing them in a second pass, reading files out of order, etc.).
My take on the matter is that a disclaimer about performance in the documentation is the way to go about it.
Hello all, I have been in contact with Ma Lin, the author of the pyzstd library.
The project has been fully transferred to me, and its new home is at https://github.com/Rogdham/pyzstd.
I have just released a new version shipping some (previously unreleased) changes from Ma Lin and updating the URLs.
As a result, this issue can be closed, because the pyzstd library is not dead anymore :tada:
Hello, as you may know, the author of the pyzstd library has deleted their GitHub profile (btw the docs may need updating as a result). Users of that library will probably fall back to python-zstandard. It may be worth helping them with the migration, for example by listing the main uses of pyzstd and how to migrate each of them to python-zstandard.

The main pain point I have identified is that pyzstd provides a `ZstdFile` class, for which migration is not straightforward. Maybe this could be ported to python-zstandard, though. What do you think?