theoevans1 closed this issue 3 years ago.
Access being denied is common: usually the server suspects you are a bot and temporarily blocks you from scraping. There isn't really a fix for this on the scraper's side, though you could try a different header.
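One general way to cope with a server that temporarily refuses requests is to wait longer after each refusal (exponential backoff with a little random jitter). The sketch below is generic Python, not part of ao3_get_fanfics.py. `fetch` is a hypothetical callable you would supply: it returns the page text, or `None` when the request was refused. The 5-second base delay is an assumed starting point, not a documented AO3 limit.

```python
import random
import time
from typing import Callable, Optional


def fetch_with_backoff(
    fetch: Callable[[str], Optional[str]],
    url: str,
    max_retries: int = 5,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch(url) until it returns a page body, waiting longer after each refusal.

    fetch is a hypothetical callable: it should return the page text, or None
    when the request was refused (for example, a blocked or empty response).
    Returns None if every attempt is refused.
    """
    for attempt in range(max_retries):
        body = fetch(url)
        if body is not None:
            return body
        if attempt < max_retries - 1:
            # Backoff: roughly base_delay, 2x, 4x, ... plus up to 1s of jitter
            # so repeated retries don't hit the server at a fixed rhythm.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            sleep(delay)
    return None
```

The `sleep` parameter exists only so the waiting can be swapped out (for example, in tests); in normal use the default `time.sleep` applies.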
"Access Denied" is the scraper's error message for private fics. The errors file will contain the work IDs for the Access Denied fics; if you navigate directly to that work (e.g. https://archiveofourown.org/works/ID), you can check if that is the issue (or a related access restriction). We cannot (and should not) scrape private fics.
@ssterman Hm, they don't seem to be private or otherwise restricted. It's generally been alternating between several in a row scraped successfully and several in a row Access Denied.
In that case @jack-debug may be correct; the scraper outputs "Access Denied" if there was an error or if it can't find the body text, which might happen if you're being blocked. Try increasing the delay between page requests. You can also extract the failed IDs from the errors file and retry only those in a separate batch.
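Pulling the failed IDs out of the errors file could look like the sketch below. It assumes a hypothetical layout in which each row of the errors CSV is `work_id, error message`; check your actual file before relying on it. It writes one ID per row to a new CSV. Whether that file can be passed straight back to ao3_get_fanfics.py depends on your version of the script accepting a CSV of work IDs, so check its `--help` first.

```python
import csv
from typing import Iterable, List


def failed_ids(errors_csv: str, marker: str = "Access Denied") -> List[str]:
    """Return unique work IDs whose error message contains `marker`.

    Assumes (hypothetically) that each row is: work_id, error message.
    Rows that are too short or don't start with a numeric ID are skipped.
    """
    ids: List[str] = []
    seen = set()
    with open(errors_csv, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if len(row) < 2:
                continue
            work_id, message = row[0].strip(), row[1]
            if work_id.isdigit() and marker in message and work_id not in seen:
                seen.add(work_id)
                ids.append(work_id)
    return ids


def write_retry_file(ids: Iterable[str], out_csv: str) -> None:
    """Write one work ID per row so the batch can be retried separately."""
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for work_id in ids:
            writer.writerow([work_id])
```

Deduplicating keeps a work that failed more than once from being requested repeatedly in the retry batch.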
Makes sense, I'll try running the failed IDs again. Thank you for your help!
Hi! I've tried a few times to run `python ao3_get_fanfics.py`, and it successfully scrapes around half of the stories, but the rest come back "Access Denied." I tried adding this HTTP header flag, but it didn't seem to help: `--header 'Chrome/88.0.4324.146 (Macintosh; Intel Mac OS X 10.15.7); Theo Evans/University of Chicago/theoevans@uchicago.edu'`
Any ideas of what might be going wrong?
Thank you!