whchien / funda-scraper

FundaScaper scrapes data from Funda, the Dutch housing website. You can find listings from the house-buying or rental market, as well as historical data. 🏡
GNU General Public License v3.0

Error '3 weken' on feature/funda_update branch #7

Closed: ValenteMendez closed this issue 1 year ago

ValenteMendez commented 1 year ago

ERROR:
Traceback (most recent call last):
  File "pandas/_libs/tslib.pyx", line 616, in pandas._libs.tslib.array_to_datetime
TypeError: invalid string coercion to datetime for "3 weken" at position 1
...
  File "/opt/homebrew/lib/python3.11/site-packages/dateutil/parser/_parser.py", line 643, in parse
    raise ParserError("Unknown string format: %s", timestr)
dateutil.parser._parser.ParserError: Unknown string format: 3 weken present at position 1
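The traceback is raised inside pandas' datetime conversion, and the calling code is not shown in the issue. As a stdlib-only stand-in, the sketch below uses a hypothetical helper, `find_unparseable`, to list which raw listing-date strings a strict `strptime` call cannot handle, much like pandas reporting "3 weken" "at position 1". The `%d %B %Y` format is the one `clean_list_date` uses for absolute dates. Note that `%B` depends on the current locale, so English month names assume the default C/English locale.

```python
from datetime import datetime
from typing import Iterable, List, Tuple


def find_unparseable(values: Iterable[str], fmt: str = "%d %B %Y") -> List[Tuple[int, str]]:
    """Return (position, value) for every string that strptime cannot parse with `fmt`."""
    bad = []
    for i, v in enumerate(values):
        try:
            datetime.strptime(v, fmt)
        except ValueError:
            # Relative Dutch values such as "3 weken" or "maandag" end up here
            bad.append((i, v))
    return bad


if __name__ == "__main__":
    print(find_unparseable(["14 June 2023", "3 weken", "maandag"]))
```

Running this on a column's raw values before conversion shows which formats still need a dedicated branch in the cleaning function.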

Code to fix the error (adds a branch for the Dutch "X weken", i.e. "X weeks", format):

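from datetime import datetime, timedelta
from typing import Union

from dateutil.parser import parse


def clean_list_date(x: str) -> Union[datetime, str]:
    """Transform the listing date from a string to a datetime object."""

    def delta_now(d: int) -> datetime:
        # Date `d` days before now
        t = timedelta(days=d)
        return datetime.now() - t

    # Dutch weekday names -> English, so dateutil can parse them
    weekdays_dict = {
        "maandag": "Monday",
        "dinsdag": "Tuesday",
        "woensdag": "Wednesday",
        "donderdag": "Thursday",
        "vrijdag": "Friday",
        "zaterdag": "Saturday",
        "zondag": "Sunday"
    }

    try:
        if x.lower() in weekdays_dict.keys():
            date_string = weekdays_dict.get(x.lower())
            parsed_date = parse(date_string, fuzzy=True)
            # Days since the most recent such weekday; modulo 7 keeps it non-negative
            delta = (datetime.now().weekday() - parsed_date.weekday()) % 7
            return delta_now(delta)

        elif (
                x.find("€") != -1
                or x.find("na") != -1
                or x.find("Indefinite duration") != -1
        ):
            return "na"
        elif x.find("month") != -1:
            # e.g. "2 months" -> about 60 days ago (full number, not only its first digit)
            return delta_now(int(x.split("month")[0].strip()) * 30)
        elif "weken" in x:
            # Dutch "X weken" = "X weeks", e.g. "3 weken" -> 21 days ago
            return delta_now(int(x.split(" ")[0]) * 7)
        elif x.find("Today") != -1 or x.find("Vandaag") != -1:
            return delta_now(1)
        elif x.find("day") != -1:
            # e.g. "5 days" -> 5 days ago
            return delta_now(int(x.split("day")[0].strip()))
        else:
            # Absolute dates such as "14 June 2023"
            return datetime.strptime(x, "%d %B %Y")

    except ValueError:
        # Unrecognised formats are returned unchanged
        # (dateutil's ParserError is a subclass of ValueError)
        return x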
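For checking the relative-date logic without depending on `datetime.now()`, here is a stdlib-only sketch of the same idea that takes the reference date as a parameter. The helper name `parse_relative_dutch`, its unit table (`dag`/`week`/`maand` and their plurals) and the 30-day month are assumptions. Weekday names are mapped by list index instead of going through dateutil's `parse`, but the modulo-7 step is the same one used above.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Assumed Dutch unit words -> days; a month is approximated as 30 days
UNIT_DAYS = {
    "dag": 1, "dagen": 1,
    "week": 7, "weken": 7,
    "maand": 30, "maanden": 30,
}

# Index in this list matches datetime.weekday() (Monday == 0)
DUTCH_WEEKDAYS = ["maandag", "dinsdag", "woensdag", "donderdag",
                  "vrijdag", "zaterdag", "zondag"]


def parse_relative_dutch(text: str, now: datetime) -> Optional[datetime]:
    """Hypothetical helper: resolve '3 weken', '2 maanden', 'vandaag' or a weekday name relative to `now`."""
    s = text.strip().lower()
    if s == "vandaag":
        return now
    if s in DUTCH_WEEKDAYS:
        # Days since the most recent such weekday (0 if it is today)
        days_back = (now.weekday() - DUTCH_WEEKDAYS.index(s)) % 7
        return now - timedelta(days=days_back)
    m = re.fullmatch(r"(\d+)\s+([a-z]+)", s)
    if m and m.group(2) in UNIT_DAYS:
        return now - timedelta(days=int(m.group(1)) * UNIT_DAYS[m.group(2)])
    # Anything else (e.g. absolute dates) is left to the caller
    return None


if __name__ == "__main__":
    ref = datetime(2023, 6, 14)  # a Wednesday
    for raw in ["3 weken", "zondag", "2 maanden", "12 januari 2023"]:
        print(raw, "->", parse_relative_dutch(raw, ref))
```

Passing `now` explicitly keeps results deterministic in tests, and returning `None` for unknown formats lets the caller fall back to `strptime` or keep the raw string.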

ADDITIONALLY

whchien commented 1 year ago

Hi @ValenteMendez,

I just released a new version of funda_scraper. If you install the latest version (v1.0.0), scraping should work without problems. Please let me know if you encounter anything unusual. All feedback is appreciated!
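To confirm that an upgrade took effect (for example after `pip install --upgrade funda-scraper`, assuming the package is published on PyPI under that name), the installed version can be read with the standard library's `importlib.metadata`. The helpers below are hypothetical. `release_tuple` only understands plain numeric `X.Y.Z` strings; for full PEP 440 version handling, `packaging.version` is the usual choice.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def release_tuple(v: str, width: int = 3) -> Tuple[int, ...]:
    """Turn a plain 'X.Y.Z' string into a zero-padded tuple of ints (no pre-release support)."""
    parts = [int(p) for p in v.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def has_min_version(dist_name: str, minimum: str = "1.0.0") -> Optional[bool]:
    """Return True/False for an installed distribution, or None if it is not installed."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        return None
    return release_tuple(installed) >= release_tuple(minimum)


if __name__ == "__main__":
    # The distribution name is an assumption; adjust it to match your environment
    print(has_min_version("funda-scraper"))
```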