prashantghimire / sofifa-web-scraper

It has detailed info and stats for over 18k EA FC 24 players, scraped from SoFIFA.com.

How did you collect all this data from SoFIFA? #1

matheusmazeto closed this issue 1 year ago

matheusmazeto commented 1 year ago

How did you collect all this data from SoFIFA?

I tried web scraping with Python, but I'm running into a Cloudflare security error.

Could you teach me (any repo would help) or share how you did it?

prashantghimire commented 1 year ago

@matheusmazeto I believe I ran into a similar error. I had to add a delay of about a second (or a few hundred ms) between scraping calls. I will release the source code at some point; I need to do some cleanup first.
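The delay approach can be sketched in Python (the language matheusmazeto was using). This is a minimal sketch, not the repository's actual code: `Throttle` is a hypothetical helper, and the 0.3 s interval simply stands in for "a few hundred ms".

```python
import time


class Throttle:
    """Hypothetical helper: enforce a minimum gap between consecutive requests."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval  # seconds, e.g. 0.3 for 300 ms
        self._last = None  # monotonic timestamp of the previous call, if any

    def wait(self) -> float:
        """Sleep just long enough to respect min_interval; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            elapsed = time.monotonic() - self._last
            remaining = self.min_interval - elapsed
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last = time.monotonic()
        return slept


if __name__ == "__main__":
    throttle = Throttle(0.3)
    start = time.monotonic()
    for page in range(3):
        throttle.wait()
        # A real scraper would send its HTTP request for `page` here.
        print(f"page {page} at {time.monotonic() - start:.2f}s")
```

Enforcing a minimum interval, rather than sleeping a fixed amount after every call, means time already spent waiting on a slow response counts toward the gap.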

matheusmazeto commented 1 year ago

Thanks! Please share the source code as soon as you can; it will be very helpful. Nice job, Prashant!

prashantghimire commented 1 year ago

Updated with the source code. You're welcome.

matheusmazeto commented 1 year ago

Thanks a lot, Prashant, this will help me a lot!

What fixes the Cloudflare error is sending headers with the request, like you did; I read that in a post on Stack Overflow.

You did a great job, and thanks for uploading the data to Kaggle. This will help many people.

prashantghimire commented 1 year ago

I did not do anything crazy for the Cloudflare issue. I just added a 300 ms delay and a User-Agent header. You can see that here.
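A 300 ms delay plus a User-Agent header can be sketched with Python's standard library. This is an illustration under assumptions, not the repository's source: `build_request`, `fetch_all`, and the exact User-Agent string are hypothetical, and neither the header nor the delay is guaranteed to get past Cloudflare's checks.

```python
import time
import urllib.request
from typing import Iterable, Iterator, Tuple

# Assumption: any common desktop-browser User-Agent string; this exact value is illustrative.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)
DELAY_SECONDS = 0.3  # 300 ms pause between requests


def build_request(url: str) -> urllib.request.Request:
    """Create a GET request carrying a browser-like User-Agent header.

    Setting User-Agent here replaces urllib's default "Python-urllib/3.x" value.
    """
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch_all(urls: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Fetch each URL in turn, sleeping DELAY_SECONDS between requests."""
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(DELAY_SECONDS)
        with urllib.request.urlopen(build_request(url), timeout=30) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            yield url, resp.read().decode(charset, errors="replace")
```

Iterating `fetch_all(urls)` yields `(url, html)` pairs; parsing the HTML into player stats is a separate step not shown here.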

matheusmazeto commented 1 year ago

Before I found your repository, I was writing a Python script to scrape data from SoFIFA. The first version didn't send headers with its requests and hit the Cloudflare error I mentioned; once I added the headers, the error disappeared. So I think sending the headers with each request is enough to avoid the Cloudflare error.

Anyway, thanks a lot for sharing the source code. It will help a lot!