CSS scraping no longer works with beta website

mpgreg commented 3 months ago

Funda has released new beta pages, so the CSS selectors need to be updated. In the meantime, the old pages are still available, but the scraped URLs need to be rewritten back to the original format with something like:

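from urllib.parse import urlparse, urlunparse

# Defined as a method (note `self`): rewrites a beta listing URL into the pre-beta form.
def fix_link(self, link: str) -> str:
    link_url = urlparse(link)
    link_path = link_url.path.split("/")
    # Segment 5 of the beta path is the property id; segment 4 is the dash-separated address.
    property_id = link_path.pop(5)
    property_address = link_path.pop(4).split("-")
    # Keep only segments 2 and 3, dropping the leading beta prefix.
    link_path = link_path[2:4]
    # The id goes back in as the second token of the address slug.
    property_address.insert(1, property_id)
    # The trailing "" produces the closing slash before the query string.
    link_path.extend(["-".join(property_address), ""])

    # Pass old_ldp=true as a proper query component instead of embedding it in the path.
    return urlunparse((link_url.scheme, link_url.netloc, "/".join(link_path), "", "old_ldp=true", ""))

# Applied to every scraped URL:
urls = [self.fix_link(url) for url in urls]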
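
For anyone who wants to try the rewrite outside the scraper class, here is a minimal standalone sketch of the same idea. It is not the code that was merged: the function is a plain module-level function, the length check is an addition, and the example URL is made up for illustration (the `/detail/` prefix, street name, and id are assumptions, not taken from a real listing).

```python
from urllib.parse import urlparse, urlunparse


def fix_link(link: str) -> str:
    """Rewrite a beta-style listing URL into the pre-beta form (sketch)."""
    parts = urlparse(link)
    segments = parts.path.split("/")
    # Assumed layout after splitting: ["", prefix, kind, city, slug, property_id, ...]
    if len(segments) < 6:
        raise ValueError(f"unexpected URL layout: {link}")
    kind, city, slug, property_id = segments[2:6]
    # Move the id back into the slug as its second dash-separated token.
    slug_words = slug.split("-")
    slug_words.insert(1, property_id)
    # Leading and trailing "" give the path its opening and closing slashes.
    path = "/".join(["", kind, city, "-".join(slug_words), ""])
    return urlunparse((parts.scheme, parts.netloc, path, "", "old_ldp=true", ""))


# Hypothetical URL, for illustration only.
beta_url = "https://www.funda.nl/detail/koop/amsterdam/appartement-voorbeeldstraat-1/12345678/"
old_url = fix_link(beta_url)
print(old_url)
```

Raising on an unexpected layout is a design choice for the sketch: it fails loudly if the beta URL structure changes again, instead of silently producing a wrong link.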

whchien commented 3 months ago

Hi @mpgreg, thanks for the PR! I just merged it and published a new release (v1.2.0).

mpgreg commented 3 months ago

Thanks @whchien. Obviously this is just a workaround and the correct fix is to update the scraper logic. Hopefully Funda will keep the pre-beta pages available for a while.

whchien / funda-scraper

CSS scraping no longer works with beta website #40