grantwilliams / wg-gesucht-crawler-cli

Python web crawler / scraper for WG-Gesucht. Crawls the WG-Gesucht site for new apartment listings and sends a message to the poster, based on your saved filters and saved text.
MIT License
77 stars 28 forks

Crawler error #10

Open moar55 opened 4 years ago

moar55 commented 4 years ago

Hello there, I really like the idea of this CLI tool. However, I am getting this error when attempting to use it:

Running until canceled, check info.log for details...
Traceback (most recent call last):
  File "/usr/local/bin/wg-gesucht-crawler-cli", line 11, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/wg_gesucht/cli.py", line 84, in cli
    wg_gesucht.search()
  File "/usr/local/lib/python3.6/dist-packages/wg_gesucht/crawler.py", line 383, in search
    self.email_apartment(ad_url, template_text)
  File "/usr/local/lib/python3.6/dist-packages/wg_gesucht/crawler.py", line 316, in email_apartment
    ad_info = self.get_info_from_ad(url)
  File "/usr/local/lib/python3.6/dist-packages/wg_gesucht/crawler.py", line 267, in get_info_from_ad
    online_status = ad_submitter.find('span')
AttributeError: 'NoneType' object has no attribute 'find'
Stopped running!
moar55 commented 4 years ago

I am guessing this happens because the site's HTML has changed.
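
The traceback fits that guess: ad_submitter is None when .find('span') is called on it, which is what happens when an earlier lookup matches nothing. A minimal sketch of a guard that turns this into a clearer error is shown here. The names require, MissingElementError and soup are hypothetical and not part of the package.

```python
class MissingElementError(RuntimeError):
    """Raised when an expected element is absent from the parsed page."""


def require(node, description):
    """Return node unchanged, or raise a descriptive error if it is None."""
    if node is None:
        raise MissingElementError(
            f"Could not find {description}; the site's HTML may have changed."
        )
    return node


# Hypothetical usage inside a scraper (soup is an assumed parsed page object):
#   ad_submitter = require(soup.find("div", {"class": "panel-body"}),
#                          "the ad submitter panel")
#   online_status = ad_submitter.find("span")
```

With a guard like this the crawler would still stop, but the message would name the missing element instead of surfacing a bare AttributeError.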

Pipazoul commented 4 years ago

Dirty fix: edit the crawler.py in /usr/local/lib/python3.6/dist-packages/wg_gesucht/crawler.py

On line 266, replace text-capitalise with panel-body.
On line 267, replace ad_submitter.find('span') with ad_submitter.find('div', {'class': 'col-md-6','class':'text-right'}).
On line 320, replace btn-orange with wgg_orange.
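
One caveat about the line 267 replacement: the dict literal repeats the 'class' key, and Python keeps only the last value for a repeated key. The short check below illustrates this. It says nothing about what the page actually contains, only about what the dict holds.

```python
# The attrs dict exactly as written in the suggested replacement for line 267.
attrs = {'class': 'col-md-6', 'class': 'text-right'}

# A repeated key in a dict literal is overwritten, so only one entry survives.
print(attrs)       # {'class': 'text-right'}
print(len(attrs))  # 1
```

So the fix effectively searches for a div with the class text-right only. If both classes were meant to be required, one option (an assumption, not something from the thread) is a CSS selector such as div.col-md-6.text-right via BeautifulSoup's select_one.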

sechsneun commented 4 years ago

I'm getting the same error, just with a different tag that's not found:

"/Users/danijel/anaconda3/lib/python3.7/site-packages/wg_gesucht/crawler.py", line 240, in process_filter_results
    post_date_link = result.find("td", {"class": "ang_spalte_datum"}).find("a")

@Pipazoul how did you go about finding out which class was replaced and what the new one is?

Pipazoul commented 4 years ago

I've retested it and it still works. Maybe just try replacing the crawler.py file in your pip library path with this one: https://github.com/Pipazoul/wg-gesucht-crawler-cli/blob/master/wg_gesucht/crawler.py

To find the new classes, I searched for the nearest class available in the HTML of a WG-Gesucht ad. The crawler searches in the panel panel-rhs-default rhs_contact_information hidden-sm class to get the posted date, and gets the URL for sending the message from the orange button btn btn-block btn-md wgg_orange.
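
A rough way to do that search without reading the page source by hand is to list every class attribute in a saved copy of an ad page. The sketch below uses only the standard library. ClassCollector, list_classes and the sample markup are made up for illustration; the two class strings are the ones mentioned above.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Count every (tag, class attribute) pair seen in an HTML document."""

    def __init__(self):
        super().__init__()
        self.seen = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        for name, value in attrs:
            if name == "class" and value:
                self.seen[(tag, value)] += 1


def list_classes(html):
    """Return a Counter mapping (tag, class string) to its occurrence count."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return collector.seen


# Hypothetical snippet standing in for a saved ad page.
sample = """
<div class="panel panel-rhs-default rhs_contact_information hidden-sm">
  <a class="btn btn-block btn-md wgg_orange" href="#">Send message</a>
</div>
"""

for (tag, classes), count in list_classes(sample).items():
    print(tag, "|", classes, "|", count)
```

Comparing this listing with the class names hard-coded in crawler.py should show which ones no longer exist on the page.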

grantwilliams commented 4 years ago

@moar55 Yeah, they unfortunately change their site a lot. I've updated it recently, but it looks like you still have the older version. If you update with pip install --upgrade wg-gesucht-crawler-cli it should work.

@sechsneun Do you have any Gesucht/Request filters saved on your profile? If you do, the script will try to search them, and ang_spalte_datum tags won't be on the page (they will be ges_spalte_datum instead). Try running the script with wg-gesucht-crawler-cli --filter-names="Name you gave the filter you saved"

moar55 commented 4 years ago

@grantwilliams Understandable. Thank you for the tool nevertheless :)