thammegowda / mtdata

A tool that locates, downloads, and extracts machine translation corpora
https://pypi.org/project/mtdata/
Apache License 2.0

Reading from tarfiles without extracting them is slow #78

Closed: thammegowda closed this issue 3 years ago

thammegowda commented 3 years ago

Tar files are decompressed three times on each access.

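Unlike zip, (compressed) tar files aren't suitable for random access. A .tar.gz is a single compressed stream with no index, so reaching any member means decompressing the archive from the beginning. A zip, by contrast, compresses each member separately and keeps a central directory of their offsets.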
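One way to avoid paying the decompression cost repeatedly is to read every needed member in a single forward pass over the archive. The sketch below uses Python's standard `tarfile` module in streaming mode (`"r|*"`), which decompresses gzip, bz2, or xz transparently and never seeks backwards. It is only an illustration under the assumption that the caller knows the member names up front. `read_members_single_pass` is a hypothetical helper, not mtdata's API, and not necessarily how this issue was resolved.

```python
import tarfile


def read_members_single_pass(tar_path, wanted):
    """Read the named members of a (possibly compressed) tar file in one
    sequential pass, so the compressed stream is decompressed only once.

    Hypothetical helper for illustration; not part of mtdata.
    """
    wanted = set(wanted)
    contents = {}
    # "r|*" opens the archive as a forward-only stream and auto-detects
    # gzip/bz2/xz compression; members must be read in archive order.
    with tarfile.open(tar_path, mode="r|*") as tar:
        for member in tar:
            if member.name in wanted and member.isfile():
                # In stream mode, the current member must be read before
                # advancing to the next one.
                contents[member.name] = tar.extractfile(member).read()
                if len(contents) == len(wanted):
                    break  # everything requested has been read; stop early
    return contents
```

The trade-off is that members come back in archive order and each is visited at most once. That is acceptable for corpora where parallel files (e.g. `train.en` / `train.de`) are needed together. If random access is unavoidable, extracting once to disk, or using a format with an index such as zip, sidesteps the problem.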