usc-isi-i2 / table-linker

Table Linker
MIT License

feature request: adding compress flag to compress auxiliary files / candidate files #97

Open binh-vu opened 3 years ago

binh-vu commented 3 years ago

Currently, running table linker over thousands of files generates a huge amount of data (e.g., ~80 GB per 1,000 tables in my case). However, it is possible (maybe even common) for users to have a dataset of tens of thousands of tables to link, and they may not have enough disk space to store the results.

One possible solution is to add a compress flag (--compress) to the table linker command, indicating that the input/output should be compressed. To support this feature, when compression is enabled we only need to change from open to gzip.open, and everything else stays the same.
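
A minimal sketch of that swap, to make the idea concrete. The helper name `open_file` and its `compress` parameter are hypothetical and not part of table linker. The one detail to watch is that `gzip.open` defaults to binary mode while `open` defaults to text mode, so the text flag has to be passed explicitly for the rest of the code to stay unchanged:

```python
import gzip


def open_file(path, mode="r", compress=False, encoding="utf-8"):
    """Hypothetical helper: open `path` with gzip when `compress` is set.

    Intended as a drop-in replacement for the built-in open() in code
    that reads or writes candidate / auxiliary files.
    """
    binary = "b" in mode
    if compress:
        # gzip.open treats "r"/"w" as binary, so request text mode explicitly
        # to match the behavior of the built-in open().
        if not binary and "t" not in mode:
            mode += "t"
        if binary:
            return gzip.open(path, mode)
        return gzip.open(path, mode, encoding=encoding)
    if binary:
        return open(path, mode)
    return open(path, mode, encoding=encoding)
```

With a helper like this, a call site would only need to pass the value of the proposed flag, for example `open_file(path, "w", compress=args.compress)`, assuming the parsed arguments are available as `args`.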