ploomber / projects

Sample projects using Ploomber.
Apache License 2.0

PatoolError when running the ETL example #35

Open neelasha23 opened 2 years ago

neelasha23 commented 2 years ago

When running this pipeline (https://github.com/ploomber/projects/tree/master/templates/etl), I got the following error:

PatoolError: patool can not unpack
patool error: error extracting ../ploomber/templates/etl/output/data.7z: could not find an executable program to extract format 7z; candidates are (7z,7za,7zr),

I fixed it by replacing extractall in preprocess/download.py with:

import shutil
from py7zr import unpack_7zarchive  # requires the py7zr package
shutil.register_unpack_format('7zip', ['.7z'], unpack_7zarchive)
shutil.unpack_archive(product['zipped'], product['extracted'])

idomic commented 2 years ago

Great catch! Want to submit a PR? Just make sure we avoid adding a new package dependency, since the dependency graph is already quite heavy.

neelasha23 commented 2 years ago

Sure! I'll look into whether there's another way. The change above requires py7zr as a dependency.

edublancas commented 2 years ago

Hi, thanks for reporting this! This is an issue with conda: sometimes it fails to find the appropriate package. If you can find a simple way to replace the package with something that's easier to install, that'd be great. Alternatively, rewriting the example might be better.

When I wrote the initial example, I made the mistake of using a dataset that has this weird 7z compression. But we could really use any dataset, as long as the example stays the same: download data from the internet, upload it to a db, process it with SQL, and then have some Python for visualization.
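
For illustration, here is a minimal sketch of that pipeline shape, assuming a zip-compressed dataset instead of 7z. It is not the example in the repository. The function name `run_pipeline`, the file names, and the sample rows are all hypothetical; the download step is replaced by writing a small archive locally so the snippet runs offline; and the visualization step is left out. It uses only the standard library, so it would add no package dependency.

```python
import csv
import shutil
import sqlite3
import tempfile
import zipfile
from pathlib import Path


def run_pipeline(workdir):
    # Stand-in for the download step (hypothetical data): write a small zip
    # archive locally. A real pipeline would fetch it from the internet.
    archive = workdir / "data.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("sales.csv", "region,amount\nnorth,10\nsouth,5\nnorth,7\n")

    # zip is one of the formats shutil registers by default, so unpacking
    # needs neither a 7z executable nor an extra package.
    extracted = workdir / "extracted"
    shutil.unpack_archive(archive, extracted)

    # Upload the rows to a database (in-memory SQLite in this sketch).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    with open(extracted / "sales.csv", newline="") as f:
        rows = [(r["region"], float(r["amount"])) for r in csv.DictReader(f)]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

    # Process the data with SQL.
    totals = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return totals


with tempfile.TemporaryDirectory() as tmp:
    totals = run_pipeline(Path(tmp))
    print(totals)
```

The returned totals could then be handed to a plotting step in Python, which is the last stage described above.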