stashapp / CommunityScrapers

This is a public repository containing scrapers created by the Stash Community.
https://stashapp.github.io/CommunityScrapers/
GNU Affero General Public License v3.0

Imdb scraper not working: "http error 403:Forbidden" #1353

Status: Closed (ilgiaco closed this issue 9 months ago)

ilgiaco commented 1 year ago

The IMDb scraper is not working. I get "http error 403:Forbidden" with both query scraping and URL scraping.

Stash version: v0.20.2
CommunityScrapers version: latest as of today (047fe0c)

Other scrapers are working well. Can someone confirm the issue?

ilgiaco commented 1 year ago

OK, it seems they are filtering by User-Agent, so setting it to a different value works for me. I added this config at the end of the IMDB.yml file:

driver:
  headers:
    - Key: User-Agent
      Value: PostmanRuntime/7.32.3
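
For anyone wondering what "filtering by User-Agent" means in practice, here is a minimal local sketch, not IMDb's actual logic (which isn't public). It starts a throwaway HTTP server on localhost that answers 403 when the User-Agent starts with one of a few library defaults and 200 otherwise. The blocklist `BLOCKED_PREFIXES`, the handler `UAFilterHandler`, and the helpers `fetch_status` and `run_demo` are all hypothetical names made up for this illustration.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical blocklist for the demo; the real site's rules are unknown.
BLOCKED_PREFIXES = ("Python-urllib", "Go-http-client")


class UAFilterHandler(BaseHTTPRequestHandler):
    """Reply 403 to blocked User-Agent values and 200 to everything else."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        status = 403 if agent.startswith(BLOCKED_PREFIXES) else 200
        self.send_response(status)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_status(url, user_agent=None):
    """Return the HTTP status for url, optionally overriding the User-Agent."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    request = urllib.request.Request(url, headers=headers)
    # An empty ProxyHandler makes sure the request goes straight to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def run_demo():
    """Query the local server with the default and with a custom User-Agent."""
    server = HTTPServer(("127.0.0.1", 0), UAFilterHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/"
    try:
        default_status = fetch_status(url)
        custom_status = fetch_status(url, "PostmanRuntime/7.32.3")
    finally:
        server.shutdown()
        server.server_close()
    return default_status, custom_status


if __name__ == "__main__":
    print(run_demo())
```

The first request goes out with urllib's default User-Agent and is rejected by the demo server; the second sends the same value as the config above and gets through. That is the same effect the `driver.headers` entry is meant to have on the scraper's requests.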

swoop124 commented 1 year ago

Hello,

I just tried the IMDB scraper and got the same error as @ilgiaco. As you suggested, I added your code to IMDB.xml,

but now I get this error: scrapeSinglePerformer[0]: input: scrapeSinglePerformer[0] must not be null

Any ideas?

ilgiaco commented 1 year ago

Hi @swoop124, I don't know why you got this error. Here is my entire IMDB.yml file (maybe it was just a typo on your part, but note that it's a .yml file, not .xml). You have to unzip it because GitHub doesn't let me upload .yml files. If you still get the same problem, please post the link you're trying to scrape. IMDB.zip

PARKYUNSU commented 7 months ago

Hi @swoop124, here is the solution.

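import requests

# Send the same User-Agent value used in the IMDB.yml workaround above.
headers = {"User-Agent": "PostmanRuntime/7.32.3"}
# Fetch an IMDb page and print the status code to see whether the 403 is gone.
res = requests.get("https://www.imdb.com/chart/top/", headers=headers, timeout=10)
print(res.status_code)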
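
To tie this back to the scraper config: the `driver.headers` block in IMDB.yml posted earlier in the thread is a list of `Key`/`Value` pairs, not a plain dict like the one `requests` takes. A rough sketch of the mapping, assuming simple header values that need no YAML quoting (`to_driver_headers` and `render_driver_block` are hypothetical helpers, not part of Stash or CommunityScrapers):

```python
def to_driver_headers(headers):
    """Turn {"Name": "value"} into [{"Key": "Name", "Value": "value"}]."""
    return [{"Key": name, "Value": value} for name, value in headers.items()]


def render_driver_block(headers):
    """Render the headers as the YAML snippet to append to the scraper file.

    Assumes plain values; anything containing ':' or '#' would need quoting.
    """
    lines = ["driver:", "  headers:"]
    for entry in to_driver_headers(headers):
        lines.append(f"    - Key: {entry['Key']}")
        lines.append(f"      Value: {entry['Value']}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_driver_block({"User-Agent": "PostmanRuntime/7.32.3"}))
```

The printed snippet should match the four config lines from the earlier comment, which is what goes at the end of IMDB.yml.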