openaustralia / morph

Take the hassle out of web scraping
https://morph.io
GNU Affero General Public License v3.0

Missing libraries #1035

Open psychemedia opened 8 years ago

psychemedia commented 8 years ago

Over the last few days, I've been getting errors about missing libraries that are stopping my Python scrapers from running.

Traceback (most recent call last):
  File "scraper.py", line 1, in <module>
    import scraperwiki
ImportError: No module named scraperwiki

Has the default setup changed?

wfdd commented 8 years ago

The Python buildpack should probably raise if it fails at installing the dependencies. A number of Python images had to be manually (?) invalidated earlier today 'cause of this, following an unrelated failure.

html5lib has introduced a requirement for setuptools >= 18.5 (html5lib/html5lib-python#263) but Morph is using an earlier version. If you don't wanna wait for the Morph folk to upgrade setuptools, you could theoretically invoke pip from inside your scraper:

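try:
    import html5lib
except ImportError:
    import subprocess
    # Upgrade setuptools first: html5lib needs setuptools >= 18.5
    subprocess.call(('pip', 'install', '--upgrade', 'setuptools'))
    # Then install the scraper's dependencies again
    subprocess.call(('pip', 'install', '--requirement', 'requirements.txt'))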
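A variant of the workaround above that stops the scraper when pip itself fails, rather than carrying on silently, might look like the following. This is only a sketch, assuming Python 3; `ensure_module` and `INSTALL_COMMANDS` are hypothetical names made up for illustration, not part of Morph or the buildpack.

```python
import importlib
import subprocess
import sys


def ensure_module(name, commands, run=subprocess.check_call):
    """Import `name`; if that fails, run each install command, then retry.

    `run` is expected to raise on a non-zero exit status, as
    subprocess.check_call does, so a failed install stops the scraper
    instead of being silently ignored.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        for command in commands:
            run(command)
        # Let the import system notice packages installed a moment ago.
        importlib.invalidate_caches()
        return importlib.import_module(name)


# Hypothetical command list mirroring the two pip calls above. Running
# pip via sys.executable targets the interpreter running the scraper.
PIP = [sys.executable, "-m", "pip", "install"]
INSTALL_COMMANDS = [
    PIP + ["--upgrade", "setuptools"],
    PIP + ["--requirement", "requirements.txt"],
]
```

Calling `ensure_module("html5lib", INSTALL_COMMANDS)` at the top of scraper.py would then either return the module or raise `subprocess.CalledProcessError` at the first pip command that fails.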

psychemedia commented 8 years ago

Yes - I did a quick test installing libraries using pip, which works. But I'm using pandas as a convenience, which means installing a whole raft of dependencies, and that's a real faff.

wfdd commented 8 years ago

You could swap html5lib with lxml; either works with pandas. I don't think you need to worry about the number of pandas' dependencies - this is a bit of an odd case.

mlandauer commented 8 years ago

@wfdd @psychemedia The problem of Python compile failures not being passed on to morph has unfortunately been a known issue for a long time: https://github.com/openaustralia/morph/issues/589

As far as I know it only happens with Python and not with any of the other languages. It would be great to figure out what's happening there as it does cause a bunch of confusion and has caught me out a few times.

I suspect that it might just be a case of updating the python buildpack stuff.

wfdd commented 8 years ago

> I suspect that it might just be a case of updating the python buildpack stuff.

Yep, see https://github.com/heroku/heroku-buildpack-python/commits/master/bin/steps/pip-install.