whchien / funda-scraper

FundaScraper scrapes data from Funda, the Dutch housing website. You can find listings from the house-buying or rental market, as well as historical data. 🏡
GNU General Public License v3.0

Scraper not retrieving any data #6

Closed. BTuyn closed this issue 1 year ago

BTuyn commented 1 year ago

image

light-bit commented 1 year ago

I noticed this as well. Unfortunately, Funda changed the layout of its website, so the scraper has to be updated to work with the new layout.

The good news is that bs4 still returns the page content. Let's hope someone has time to update the scripts accordingly.
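
A quick way to confirm this kind of breakage is to check that the HTML is non-empty while the old selector matches nothing. The sketch below is only an illustration: the HTML snippets and the class names `search-result` and `listing-card` are made up and are not Funda's real markup, and it uses the standard library's `html.parser` in place of bs4 so that it has no dependencies.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


def count_matches(html, css_class):
    """Return how many elements in `html` have the class `css_class`."""
    parser = ClassCounter(css_class)
    parser.feed(html)
    return parser.count


def diagnose(html, css_class):
    """Distinguish an empty response from a selector that no longer matches."""
    if not html.strip():
        return "empty response"
    if count_matches(html, css_class) == 0:
        return "content returned, selector matches nothing"
    return "ok"


# Hypothetical markup before and after a site redesign.
OLD_PAGE = '<div class="search-result">House A</div><div class="search-result">House B</div>'
NEW_PAGE = '<div class="listing-card">House A</div><div class="listing-card">House B</div>'

print(diagnose(OLD_PAGE, "search-result"))
print(diagnose(NEW_PAGE, "search-result"))
```

If the second case is reported, the request itself is fine and only the parsing rules need updating.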

PieterK123 commented 1 year ago

Same here. It would indeed be great if we could still get the script working.

PieterK123 commented 1 year ago

Did you manage to get it working, @BTuyn?

dadadima commented 1 year ago

Same here!

whchien commented 1 year ago

Hi @BTuyn @light-bit @PieterK123 @dadadima94

I just released a new version of funda_scraper. If you install the latest version (v1.0.0), scraping should work without problems now. Please let me know if you encounter anything unusual. All feedback is appreciated!

BTuyn commented 1 year ago

Sorry for the late response. The problem does seem to be fixed, but after checking the update I noticed some other bugs. The raw dataframe seems fine; however, the clean dataframe is missing quite a lot of rows and isn't processing data properly for a few columns, such as 'has_balcony' and 'has_garden'. This is a different issue from the original one, though, and I see somebody else has already opened an issue for it.
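
For anyone who wants to quantify this kind of gap, a comparison along the following lines may help. It is a minimal sketch on toy data, not output from funda_scraper: the column names `url` and `price` and the sample values are assumptions chosen for illustration, and the real clean dataframe may rename or derive columns differently.

```python
import pandas as pd


def compare_frames(raw, clean, key):
    """Summarise rows and columns present in `raw` but missing from `clean`."""
    # Keys that appear in the raw frame but were dropped during cleaning.
    missing_keys = raw.loc[~raw[key].isin(clean[key]), key].tolist()
    # Columns that exist in the raw frame but not in the clean one.
    missing_columns = [c for c in raw.columns if c not in clean.columns]
    return {
        "raw_rows": len(raw),
        "clean_rows": len(clean),
        "missing_keys": missing_keys,
        "missing_columns": missing_columns,
    }


# Toy data with hypothetical column names.
raw = pd.DataFrame(
    {
        "url": ["a", "b", "c"],
        "price": ["€ 300.000", "€ 450.000", None],
        "has_garden": ["ja", "nee", "ja"],
    }
)
clean = pd.DataFrame({"url": ["a", "b"], "price": [300000, 450000]})

report = compare_frames(raw, clean, key="url")
print(report)
```

Listing the dropped keys makes it easier to inspect the corresponding raw rows and see which cleaning step rejected them.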

whchien commented 1 year ago

Hi @BTuyn

Yes, you are correct. I removed the columns 'has_balcony' and 'has_garden' from the clean data frame. Funda changed how it describes these exterior features, so the original preprocessing script needs to be revised. I will include the new preprocessing logic in the next release.
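
Until then, one possible workaround is to derive the two flags from a free-text field with simple keyword matching. This is a sketch under stated assumptions, not the preprocessing logic planned for the release: the function `exterior_flags` and the keyword lists are hypothetical, and the wording Funda actually uses may differ.

```python
# Hypothetical keyword lists; the real wording on Funda may differ.
KEYWORDS = {
    "has_balcony": ["balkon", "balcony"],
    "has_garden": ["tuin", "garden"],
}


def exterior_flags(description):
    """Return boolean flags based on keywords found in a text description."""
    # Treat missing descriptions as empty text and ignore case.
    text = (description or "").lower()
    # Substring matching so that compounds such as "achtertuin" also count.
    return {
        column: any(word in text for word in words)
        for column, words in KEYWORDS.items()
    }


print(exterior_flags("Achtertuin en balkon"))
print(exterior_flags(None))
```

Substring matching is deliberately loose, so it can produce false positives; a stricter rule would be needed if the description mentions a feature only to say it is absent.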