scrapfly / scrapfly-scrapers

Web scrapers for popular targets powered by Scrapfly.io
https://scrapfly.io

The result of the searches does not match that of the downloaded records. #1

Closed MatteoSid closed 1 year ago

MatteoSid commented 1 year ago

If I manually search for python in Texas I find 5772 job offers, but the script downloads only 2.

Granitosaurus commented 1 year ago

Hey @MatteoSid, I couldn't replicate this issue, which leads me to believe that you might be misunderstanding the run.py example file.

If you take a look at the code:

    url = "https://www.indeed.com/jobs?q=python&l=Texas"
    result_search = await indeed.scrape_search(url, max_results=100)
    output.joinpath("search.json").write_text(json.dumps(result_search, indent=2, ensure_ascii=False))

    jobs = ["4c1e2988b22fa223", "483d39cbe1b6c1fe"]
    result_jobs = await indeed.scrape_jobs(jobs)
    output.joinpath("jobs.json").write_text(json.dumps(result_jobs, indent=2, ensure_ascii=False))

This example scrapes the search for the first 100 job listing previews (see /results/search.json for example output) and then scrapes 2 individual jobs, the ones hardcoded in the jobs list, for full job listing details (see /results/jobs.json for example output). That is why only 2 full records are downloaded.

You can modify this to your needs to scrape more search results and turn job previews into full job datasets (see the jobKey field in the job preview item). Hopefully that solves your issue!
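As a rough sketch of that last step, something like the helper below could pull the job keys out of the search results before handing them to scrape_jobs. It assumes each preview item is a dict with a jobKey field; collect_job_keys and sample_previews are hypothetical names used for illustration and are not part of the repository.

```python
from typing import Iterable, List


def collect_job_keys(previews: Iterable[dict]) -> List[str]:
    """Return unique jobKey values in first-seen order, skipping items without one."""
    seen = set()
    keys = []
    for preview in previews:
        key = preview.get("jobKey")
        if key and key not in seen:
            seen.add(key)
            keys.append(key)
    return keys


# Hypothetical stand-in for the list returned by indeed.scrape_search();
# only the jobKey field matters for this sketch.
sample_previews = [
    {"jobKey": "4c1e2988b22fa223"},
    {"jobKey": "483d39cbe1b6c1fe"},
    {"jobKey": "4c1e2988b22fa223"},  # duplicate preview of the first job
    {"title": "preview without a key"},
]

job_keys = collect_job_keys(sample_previews)
print(job_keys)

# In run.py the hardcoded jobs list could then be replaced with something like:
#   result_search = await indeed.scrape_search(url, max_results=100)
#   result_jobs = await indeed.scrape_jobs(collect_job_keys(result_search))
```

Raising max_results controls how many previews are collected, and therefore how many full job pages get scraped afterwards.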