turicas / rows

A common, beautiful interface to tabular data, no matter the format
GNU Lesser General Public License v3.0

Add support for reading archives #236

Open turicas opened 7 years ago

turicas commented 7 years ago

#108 will add support for compressed files; we also need to support archive files, such as .tar, .zip and .rar.

turicas commented 6 years ago

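We can use magic numbers to identify the archive type from the file contents instead of trusting the extension (the second file below is named .zip but is actually a RAR archive):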

In [14]: open('projects/eleicoes-brasil/data/download/candidatura-2018.zip', mode='rb').read(4).startswith(b'PK')
Out[14]: True

In [15]: open('Downloads/prestacao-contas-2002.zip', mode='rb').read(4).startswith(b'Rar!')
Out[15]: True
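
The two checks above could be folded into one helper. This is only a sketch: `detect_archive_type` and the `SIGNATURES` table are hypothetical names, not part of rows, and the table covers just the three formats mentioned in this issue (the tar entry assumes a ustar-style header, which carries its magic at byte offset 257).

```python
# Hypothetical helper, not part of rows: guess the archive type from magic numbers.
# Each entry is (offset, magic bytes, label).
SIGNATURES = (
    (0, b"Rar!\x1a\x07", "rar"),   # RAR 4.x and 5.x both start with these bytes
    (0, b"PK\x03\x04", "zip"),     # regular ZIP (local file header)
    (0, b"PK\x05\x06", "zip"),     # empty ZIP (end of central directory only)
    (257, b"ustar", "tar"),        # POSIX/GNU tar header magic
)


def detect_archive_type(path):
    """Return "zip", "rar", "tar" or None, looking only at the file contents."""
    with open(path, mode="rb") as fobj:
        header = fobj.read(512)  # one tar block is enough for every signature
    for offset, magic, label in SIGNATURES:
        if header[offset:offset + len(magic)] == magic:
            return label
    return None
```

Because the extension is never consulted, a RAR file saved with a .zip name (like the second one above) would still be reported as "rar".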

For RAR archives we can use the rarfile library. It needs an external tool to extract files; on Debian systems: apt install libarchive-tools or apt install unrar (the latter is not free/libre).
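
A rough sketch of how the three formats could be opened behind one function. `list_archive_members` is a hypothetical name, not an existing rows API. ZIP and TAR use the standard library; RAR uses the third-party rarfile package, imported lazily so the function still works for the other formats when it is not installed. Only member names are listed here, so no extraction is performed.

```python
import tarfile
import zipfile


def list_archive_members(path):
    """Return the member names of a ZIP, TAR or RAR archive, chosen by content."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            return archive.namelist()
    if tarfile.is_tarfile(path):
        with tarfile.open(path) as archive:
            return archive.getnames()
    try:
        import rarfile  # third-party; optional dependency in this sketch
    except ImportError:
        rarfile = None
    if rarfile is not None and rarfile.is_rarfile(path):
        with rarfile.RarFile(path) as archive:
            return archive.namelist()
    raise ValueError("unsupported or unrecognised archive: {}".format(path))
```

Each `is_*file` check inspects the file contents rather than the extension, so a misnamed archive is still routed to the right reader.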