Open tangunner opened 2 years ago
Do you have a reproducible snippet of code that threw the error(s)?
Getting similar issues here. This is the error I see whenever I try to search:
Traceback (most recent call last):
File "cl_scraper.py", line 39, in <module>
found_posts.update({ result['id'] : result for result in CL_query.get_results() })
File "cl_scraper.py", line 39, in <dictcomp>
found_posts.update({ result['id'] : result for result in CL_query.get_results() })
File "/home/jsudano/projects/cl_scraper/venv/lib/python3.7/site-packages/craigslist/base.py", line 192, in get_results
for row in rows.find_all('li', {'class': 'result-row'},
AttributeError: 'NoneType' object has no attribute 'find_all'
Was able to reproduce it quite simply:
Python 3.7.3 (default, Jul 25 2020, 13:03:44)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from craigslist import CraigslistForSale
>>> CL_query = CraigslistForSale(site='sfbay', category='mca')
>>> for e in CL_query.get_results():
...     print(e)
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/jsudano/projects/cl_scraper/venv/lib/python3.7/site-packages/craigslist/base.py", line 191, in get_results
for row in rows.find_all('li', {'class': 'result-row'},
AttributeError: 'NoneType' object has no attribute 'find_all'
>>>
I dug a little deeper and looked at the HTML response from craigslist. It looks like CL won't serve results unless the request comes from a "browser with javascript enabled":
<noscript id="no-js"><div>
<p>We've detected that JavaScript is not enabled in your browser.</p>
<p>You must enable JavaScript to use craigslist.</p>
</div></noscript>
<div id="unsupported-browser">
<p>We've detected you are using a browser that is missing critical features.</p>
<p>Please visit craigslist from a modern browser.</p>
</div>
I'm guessing this would break the library in most cases.
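For anyone who wants to confirm they are hitting the same page before digging further, here is a minimal sketch of a check using only the standard library. It assumes the blocked response contains the `no-js` or `unsupported-browser` elements quoted above and no `li.result-row` elements (the class `base.py` looks for in the traceback). The helper name `is_javascript_wall` is hypothetical and not part of python-craigslist.

```python
from html.parser import HTMLParser


class _WallDetector(HTMLParser):
    """Scans a page for craigslist's fallback blocks and for result rows."""

    # Element ids taken from the HTML response quoted in this thread.
    WALL_IDS = {"no-js", "unsupported-browser"}

    def __init__(self):
        super().__init__()
        self.found_wall = False
        self.found_result_row = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id") in self.WALL_IDS:
            self.found_wall = True
        # 'result-row' is the class that base.py searches for.
        classes = (attrs.get("class") or "").split()
        if tag == "li" and "result-row" in classes:
            self.found_result_row = True


def is_javascript_wall(html):
    """Return True if the page has the fallback blocks and no result rows.

    Hypothetical helper: the detection rule is an assumption based on the
    single response shown above, not on any documented craigslist behavior.
    """
    detector = _WallDetector()
    detector.feed(html)
    return detector.found_wall and not detector.found_result_row
```

Running this on the body of the search response tells you whether the `NoneType` error comes from the JavaScript requirement or from something else, such as a changed page layout.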
I'm having the same issue. I just downloaded this yesterday and it worked fine for a bunch of queries. Today I fired it up again and get this every time.
Hi,
Also having the same issue
Traceback (most recent call last):
File "forSale.py", line 1, in <module>
from craigslist import CraigslistJobs, CraigslistForSale
File "/Users/nathanielhurwitz/.pyenv/versions/3.8.10/lib/python3.8/site-packages/craigslist/__init__.py", line 1, in <module>
from .craigslist import (
File "/Users/nathanielhurwitz/.pyenv/versions/3.8.10/lib/python3.8/site-packages/craigslist/craigslist.py", line 1, in <module>
from .base import CraigslistBase
File "/Users/nathanielhurwitz/.pyenv/versions/3.8.10/lib/python3.8/site-packages/craigslist/base.py", line 17, in <module>
ALL_SITES = utils.get_all_sites() # All the Craiglist sites
File "/Users/nathanielhurwitz/.pyenv/versions/3.8.10/lib/python3.8/site-packages/craigslist/utils.py", line 40, in get_all_sites
response = requests.get(ALL_SITES_URL)
File "/Users/nathanielhurwitz/.pyenv/versions/3.8.10/lib/python3.8/site-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/Users/nathanielhurwitz/.pyenv/versions/3.8.10/lib/python3.8/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/Users/nathanielhurwitz/.pyenv/versions/3.8.10/lib/python3.8/site-packages/requests/sessions.py", line 529, in request
resp = self.send(prep, **send_kwargs)
File "/Users/nathanielhurwitz/.pyenv/versions/3.8.10/lib/python3.8/site-packages/requests/sessions.py", line 645, in send
r = adapter.send(request, **kwargs)
File "/Users/nathanielhurwitz/.pyenv/versions/3.8.10/lib/python3.8/site-packages/requests/adapters.py", line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.craigslist.org', port=80): Max retries exceeded with url: /about/sites (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x10b0cf430>: Failed to establish a new connection: [Errno 60] Operation timed out'))
I came across this yesterday and it doesn't seem like the wrapper can work anymore. I used a headless browser to work around this. See https://blog.3d-logic.com/2022/10/02/craigslist-automation for more details and a prototype.
+1
@irahorecka @juliomalegria
I put together a fork of this project that uses Selenium with a Chrome WebDriver, which seems to bypass the craigslist bot detection. I also had to change the CSS classes in the scraping code to get things working. If there is interest, I can clean up the code a bit and submit a formal PR.
Sounds good to me. I doubt you'd even have to submit a pull request; I'd just follow your fork.
Fork is here for anyone curious: https://github.com/f3mshep/python-craigslist-headless
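For readers who just want to see the general shape of the headless-browser approach, here is a rough sketch. It is not the fork's code. It assumes Selenium 4 and a local Chrome install, and that search pages live at `https://<site>.craigslist.org/search/<category>`. The names `build_search_url` and `fetch_rendered_html` are hypothetical.

```python
def build_search_url(site, category):
    """Build a search URL, e.g. https://sfbay.craigslist.org/search/mca.

    The URL layout is an assumption based on how the site is browsed.
    """
    return f"https://{site}.craigslist.org/search/{category}"


def fetch_rendered_html(url, wait_seconds=5.0):
    """Load the URL in headless Chrome and return the HTML after scripts run."""
    # Imported lazily so build_search_url works without selenium installed.
    import time

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    # "--headless=new" needs a recent Chrome; older builds use "--headless".
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Crude fixed wait for client-side rendering; an explicit wait on a
        # known element would be more robust once the selectors are known.
        time.sleep(wait_seconds)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

Something like `fetch_rendered_html(build_search_url("sfbay", "mca"))` should return the rendered page, which can then be parsed as before. The result selectors will likely need updating, since the CSS classes have changed.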
@f3mshep Do you know if this fork still works? I am a bit new to Python and am trying to use it for a little project, but I keep getting the same NoneType error referenced in the issues above. A screenshot of my code is below as well. Any help would be appreciated.
...
AttributeError: 'NoneType' object has no attribute 'find_all'
Hi - I was just wondering if this wrapper is still maintained? I tried a few different endpoints but received a couple of different types of errors, and I wasn't sure whether they were caused by a bad local installation on my end or whether the wrapper is simply no longer supported and CL has since been updated. Thanks!