Closed mlandauer closed 9 years ago
I'm actually still having a problem with submodules now that I'm no longer in experimental buildpack mode.
Thought it might be sort of a caching issue, so I made a trivial change to bump the git revision number. No luck; still no submodules.
Yeah this isn't right - we use submodules on PlanningAlerts scrapers.
They don't store stuff in subdirectories though so maybe try flattening your submodule repository.
Can you please point me to a scraper that uses submodules and works correctly? (There are lots of PlanningAlerts scrapers.)
https://github.com/planningalerts-scrapers/uralla
They all run with buildpacks too.
Oh duh, that builds a Rubygem and doesn't use submodules. Ignore me!
I think it might be the symlink in your repo. Why not just put the submodule there instead of the symlink?
Mostly because the subdir makes it easier to pack up the submodule as a package. Worth a try though; doing this now.
Actually, I think the problem is that I was using the `git@...` URL for my submodule rather than the `https://` one.
Now my scraper appears stalled, but that might have to do with complications related to changing the submodule URL (you don't do `git submodule sync`). Will try deleting the submodule and re-adding it.
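For anyone hitting the same snag, here's a self-contained sketch of the sync dance after changing a submodule's URL (throwaway local repos stand in for GitHub; names like `srs` are just illustrative):

```shell
# Self-contained demo: editing the URL in .gitmodules alone is not enough --
# `git submodule sync` copies it into the clone's .git/config.
set -e
demo=$(mktemp -d) && cd "$demo"
git init -q srs-upstream
git -C srs-upstream -c user.email=me@example.com -c user.name=me \
    commit -q --allow-empty -m 'initial commit'
git init -q scraper && cd scraper
git -c protocol.file.allow=always submodule add -q "$demo/srs-upstream" srs

# Point the submodule at a new URL, then sync it into local config.
git config --file .gitmodules submodule.srs.url https://example.com/srs.git
git submodule sync -q
git config submodule.srs.url   # prints https://example.com/srs.git
```

After the sync, a `git submodule update --init` would re-fetch from the new URL.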
Oh, nah, seems to be an operational issue.
Oh well, going to bed (in California). Hopefully this will be working in the morning, and then maybe we can re-enable buildpacks? Thanks for all your help.
I'm @davidmarin, by the way (I created a separate account because I balked at giving morph.io write permission to all my repos).
> Oh, nah, seems to be an operational issue.

:(

> Oh well, going to bed (in California). Hopefully this will be working in the morning, and then maybe we can re-enable buildpacks? Thanks for all your help.

Just ping me if you want buildpacks enabled.

> I'm @davidmarin, by the way (I created a separate account because I balked at giving morph.io write permission to all my repos).

Nice to meet you David :)
Okay, @henare, my campaigns and companies scrapers are now working again! I ultimately had to delete and re-create the scrapers; changing the URL of a git submodule isn't something that morph.io could handle (maybe running `git submodule sync` would help?).
Thanks for all your help! Would love it if you could re-enable buildpacks. :)
> Thanks for all your help! Would love it if you could re-enable buildpacks. :)
Done!
Sorry, still having problems that I don't have the access to figure out on my own. See #458.
Looks like submodules indeed are a problem with buildpacks. I created a test scraper with a trivial script; it ran fine on its own, but when I added submodules (https://github.com/spendright-scrapers/test/tree/60d71b4910f337e0930b767a1313c93485b8dca5), it stalled forever.
Also, deleting the submodule (in git) doesn't fix the problem; you have to delete the scraper and start over.
Doesn't matter if the submodule is inside a subdirectory (`submodules/`) or in the top-level directory.
Sorry I haven't been able to help @spendright-scrapers, we're about to launch a new project so we're all flat out.
Let me know if you want me to disable buildpacks for you.
That's okay, it looks like you can work around the submodule issue by using a git URL in `requirements.txt`; you just have to make your submodule work as a package (i.e. add `setup.py`).
See https://github.com/spendright-scrapers/test/tree/d20a2b6543d02d78d616ac34f6505070cd54b545 for an example.
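For reference, a minimal sketch of what that `setup.py` might look like (the name is taken from the thread, but the version and empty dependency list are assumptions, not copied from the actual repo):

```shell
# Write a minimal setup.py so pip can install the repo from a git URL.
cat > setup.py <<'EOF'
from setuptools import setup, find_packages

setup(
    name="srs",
    version="0.1.0",        # hypothetical version number
    packages=find_packages(),
    install_requires=[],    # runtime dependencies belong here; pip ignores
                            # a requirements.txt inside a package it installs
)
EOF
```

With that in place, a `git+https://...#egg=srs` line in the scraper's `requirements.txt` lets pip install the code without any submodule.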
Also confirmed that I can use `runtime.txt` to use Python 3.4.1 (though `dumptruck` isn't compatible with Python 3).
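(For anyone else trying this: the Heroku-style Python buildpack expects `runtime.txt` to be a single line naming the interpreter, e.g.)

```
python-3.4.1
```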
Actually, @henare, yes, please disable experimental buildpack support. My workaround doesn't seem to work in practice (tried it on my companies scraper and it still hangs), and without console support (#427) I can't even tell what's going wrong.
I'll make a note to check back in a couple months to see how buildpacks are doing. Would be nice to be able to write this in Python 3, but at the moment it's not practical.
Thanks!
Done.
Thank you!
@henare, any progress on this? SpendRight is using Python 3 internally, and it would be nice to update our scrapers as well.
@davidmarin this issue is not something I'm actively working on, sorry.
No worries! Thanks for your honesty.
@davidmarin I've been doing some major work on the buildpack support recently including fixing a long-standing permissions issue which was probably at the root of some of the problems you experienced before when you switched to buildpacks.
This is all with the aim of getting buildpacks to the stage where we can switch everyone over to it seamlessly.
I'd be very keen on you trying buildpacks again with the incentive for you being that you can use Python 3, if you're up for it?
Let me know if you'd like me to reenable buildpacks for @spendright-scrapers
I'm going to close this issue. Please reopen it, or create a new issue if you experience this problem again
@mlandauer, sure, let's give it a try! Would love to get my entire codebase on Python 3 eventually (though that will mean porting `dumptruck` as well).
Also, I've been wondering how you set up scrapers for an organization (e.g. `openaustralia`). Ideally, I'd like the main repo for these scrapers to be under the `spendright` organization on GitHub.
@spendright-scrapers I've enabled buildpacks on https://morph.io/spendright-scrapers so now all scrapers running under https://morph.io/spendright-scrapers will use buildpacks automatically.
To set up an organization, go to GitHub, set up an organization there, and make yourself a public member of it.
The only hassle right now is that for morph.io to pick that up you might need to log out and log back in on morph. Then you should see your membership of that organization on your user page on morph https://morph.io/spendright-scrapers.
Then, you can create scrapers under your organization in exactly the same way. Anyone that you make a public member of the organization will be able to do the same.
Hope that helps.
@mlandauer Thanks!
Unfortunately, it looks like my scrapers still totally don't work with buildpacks (the symptom is that they can't import code from the `srs` submodule, I think same as last time). Can you please switch me back off buildpacks until the submodule issue is resolved?
The console does work a lot better with buildpacks now; nice!
@spendright-scrapers I'll switch off buildpacks for you right away. Could you possibly write some minimal test code that shows the problem you're seeing so I can reproduce it at my end?
@spendright-scrapers buildpack stuff switched off for you now
@spendright-scrapers sorry, don't worry about the minimal test code for the moment. I'll start by forking your scraper `spendright-scrapers/companies` and checking that I can reproduce the problem.
@spendright-scrapers I can reproduce the problem locally so that's most of the battle over. Now to fix it!
Part of the issue here is that only certain files from your repo are injected into the scraper container during the "compile" phase, where it parses `requirements.txt` and `runtime.txt` to install the libraries and the right version of Python.
So at this stage it doesn't have access to the submodule, just your `requirements.txt` file in the root of the directory.
This is what I tried.
First, installing `srs` directly from git by putting this in `requirements.txt`:

```
git+http://github.com/spendright-scrapers/srs.git@e3b09f6#egg=srs
```
This avoids the need for submodules.
But this ends up not installing the libraries in the `requirements.txt` file inside `srs`. My understanding of `pip` is a bit limited so I'm not sure why that's happening.
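A likely explanation (my assumption, not confirmed in the thread): when pip installs a package from a git URL, it takes dependencies from the `install_requires` metadata in `setup.py`; a `requirements.txt` file sitting inside the installed repo is never consulted. You can inspect what a package actually declared after install (using pip itself as the example package here):

```shell
# pip resolves dependencies from declared metadata (install_requires),
# not from any requirements.txt inside the repo. "Requires:" in the
# output below lists exactly what the package declared.
python3 -m pip show pip | grep '^Requires:'
```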
So then, just to see what would happen, I added the contents of the `srs` requirements file to the main requirements file to give:
```
git+http://github.com/spendright-scrapers/srs.git@e3b09f6#egg=srs
Unidecode==0.04.9
beautifulsoup4==4.1.3
dumptruck==0.1.6
html5lib==0.90 # used by beautifulsoup4
requests==1.0.4 # used by srs.vendor.reppy
```
This now gives the following error when you run the scraper:
```
Traceback (most recent call last):
  File "scraper.py", line 26, in <module>
    from srs.db import use_decimal_type_in_sqlite
  File "/app/.heroku/python/lib/python2.7/site-packages/srs/db.py", line 12, in <module>
    from .scrape import download
  File "/app/.heroku/python/lib/python2.7/site-packages/srs/scrape.py", line 16, in <module>
    from .vendor.reppy.cache import RobotsCache
ImportError: No module named vendor.reppy.cache
```
@spendright-scrapers @davidmarin What shall I try next?
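One possible cause of that ImportError (an assumption on my part, not confirmed in the thread): `find_packages()` only collects directories that contain an `__init__.py`, so a vendored tree like `srs/vendor/reppy/` is silently left out of the install unless every level has one. A quick demonstration:

```shell
# find_packages() skips directories without __init__.py, so vendored
# subpackages can be silently dropped from an install.
python3 - <<'EOF'
import os, tempfile
from setuptools import find_packages

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "srs", "vendor", "reppy"))
open(os.path.join(root, "srs", "__init__.py"), "w").close()
# note: no __init__.py under vendor/ or vendor/reppy/

print(find_packages(root))   # only ['srs'] -- the vendored tree is missing
EOF
```

If that's the cause, adding an `__init__.py` to each vendored directory (or listing the subpackages explicitly in `packages=`) would get them installed.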
I've discovered that another issue is that symbolic links are not getting injected into the container when it injects the source of the repo. This issue is in #562
Awesome, thanks, go for it!
I have my own poor-man's version of morph.io in production, so it's not the end of the world if my scrapers are broken on morph.io. I just don't have the spare cycles to poke at it and try to make it work right now.
I'm reopening until we know that this is definitely fixed
After you merged the PR I've switched buildpacks back on for you @spendright-scrapers @davidmarin. Hopefully your scrapers now work with buildpacks. If you do have another issue don't hesitate to open another ticket here. Thanks!
https://github.com/spendright-scrapers/companies
```
Errno::ENOENT - No such file or directory - /var/www/releases/20140920073127/db/scrapers/repos/spendright-scrapers/companies/srs
```

View full backtrace and more info at honeybadger.io