Scraping job data - Githubissues

mendableai / firecrawl

🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

https://firecrawl.dev

GNU Affero General Public License v3.0

14.25k stars 1.03k forks

Scraping job data #602

Open Thaslim42 opened 2 weeks ago

Thaslim42 commented 2 weeks ago

When I used Firecrawl to scrape data from a job site, it only scraped data from the initial page. The actual data is on the page behind each job title link, and I want to extract that data too. How can I achieve this? Here is a sample screenshot of the page info.

nickscamara commented 2 weeks ago

Hey @Thaslim42, could you try running with the allowBackwardLinks option? It lets the crawler navigate from a specific URL to previously linked pages, or to pages that are not children of the URL you started the crawl from.
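For reference, here is a minimal sketch of what a crawl request with this option might look like. It assumes the hosted v1 REST endpoint (`https://api.firecrawl.dev/v1/crawl`) and a top-level `allowBackwardLinks` field in the request body; the listing URL, the `limit` value, and the helper names `build_crawl_payload` and `start_crawl` are hypothetical and only for illustration. Check the API reference for your Firecrawl version before relying on any of these names.

```python
import json
import urllib.request

# Assumed endpoint for the hosted v1 crawl API; a self-hosted instance would differ.
CRAWL_ENDPOINT = "https://api.firecrawl.dev/v1/crawl"


def build_crawl_payload(start_url: str, limit: int = 50) -> dict:
    """Build the JSON body for a crawl request with backward links enabled."""
    return {
        "url": start_url,
        # Allow the crawler to follow links that are not children of start_url.
        "allowBackwardLinks": True,
        # Cap the number of pages so a trial run stays small.
        "limit": limit,
        # Ask for markdown output for each crawled page (assumed field name).
        "scrapeOptions": {"formats": ["markdown"]},
    }


def start_crawl(payload: dict, api_key: str) -> dict:
    """POST the payload and return the decoded JSON response (makes a network call)."""
    request = urllib.request.Request(
        CRAWL_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical listing URL; this only prints the payload and sends nothing.
    payload = build_crawl_payload("https://example.com/jobs")
    print(json.dumps(payload, indent=2))
```

Calling `start_crawl(payload, api_key)` with a real key would submit the job. The sketch keeps payload construction separate so the options can be inspected before anything is sent.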

Thaslim42 commented 2 weeks ago

It still didn't work. Any other options?

nickscamara commented 2 weeks ago

@Thaslim42

ccing @tomkosm

tomkosm commented 1 week ago

@Thaslim42 are you using scrape or crawl? You should use crawl for this. Please share all of the options you are using, the URL, and the result you are getting. Also, are you self-hosting or using the API?

Thaslim42 commented 5 days ago

I used the Firecrawl playground to crawl this URL, and it only scraped links to unwanted content such as the newsletter and gallery. The content behind the job title links was not scraped. Here is a screenshot of the parameters I provided: Screenshot 2024-09-13 120912