juliomalegria / python-craigslist

Simple Craigslist wrapper
MIT No Attribution
389 stars, 116 forks

Wrapper Still Maintained? #116

Open tangunner opened 2 years ago

tangunner commented 2 years ago

Hi - I was just wondering if this wrapper is still maintained? I tried a few different endpoints but received a couple of different types of errors, and I wasn't sure whether they were caused by a bad local installation on my end or whether the wrapper is no longer supported and CL has since been updated. Thanks!

irahorecka commented 2 years ago

Do you have a reproducible snippet of code that threw the error(s)?

jsudano commented 2 years ago

Getting similar issues here. This is the error I see whenever I try to search:

Traceback (most recent call last):
  File "cl_scraper.py", line 39, in <module>
    found_posts.update({ result['id'] : result for result in CL_query.get_results() })
  File "cl_scraper.py", line 39, in <dictcomp>
    found_posts.update({ result['id'] : result for result in CL_query.get_results() })
  File "/home/jsudano/projects/cl_scraper/venv/lib/python3.7/site-packages/craigslist/base.py", line 192, in get_results
    for row in rows.find_all('li', {'class': 'result-row'},
AttributeError: 'NoneType' object has no attribute 'find_all'

Was able to reproduce it quite simply:

Python 3.7.3 (default, Jul 25 2020, 13:03:44)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from craigslist import CraigslistForSale
>>> CL_query = CraigslistForSale(site='sfbay', category='mca')
>>> for e in CL_query.get_results():
...     print(e)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jsudano/projects/cl_scraper/venv/lib/python3.7/site-packages/craigslist/base.py", line 191, in get_results
    for row in rows.find_all('li', {'class': 'result-row'},
AttributeError: 'NoneType' object has no attribute 'find_all'
>>>
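One detail worth noting from the session above: the exception is raised while iterating, not when `get_results()` is called, so any guard has to wrap the loop itself. Below is a minimal sketch of such a guard. `safe_results` is a hypothetical helper, not part of the library; it only assumes that the query object exposes `get_results()` as shown above. It does not fix the underlying problem; it only lets a calling script fail gracefully.

```python
def safe_results(query):
    """Collect results from query.get_results(), stopping cleanly on a parse failure.

    Hypothetical helper: assumes only that `query` has a get_results() method
    that yields result dicts, as in the REPL session above.
    """
    results = []
    try:
        # The error surfaces during iteration, so the loop must be inside the try.
        for result in query.get_results():
            results.append(result)
    except AttributeError as exc:
        # Raised when the library cannot find the expected results container.
        print(f"Could not parse results: {exc}")
    # Anything collected before the failure is still returned.
    return results
```

With the failure shown above, this would return an empty list and print the error instead of crashing the script.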

I dug a little deeper and looked at the HTML response from Craigslist. It looks like CL won't respond to requests unless you have a "browser with javascript enabled":

<noscript id="no-js"><div>
<p>We've detected that JavaScript is not enabled in your browser.</p>
<p>You must enable JavaScript to use craigslist.</p>
</div></noscript>
<div id="unsupported-browser">
<p>We've detected you are using a browser that is missing critical features.</p>
<p>Please visit craigslist from a modern browser.</p>
</div>

I'm guessing this would break the library in most cases.
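For anyone who wants to confirm they are hitting the same page, here is a rough check using only the standard library. It is a heuristic sketch: the two element ids come from the HTML quoted above, while `looks_like_js_wall` and `BLOCK_PAGE_IDS` are names made up for this example. It is an assumption that these ids appear only on the placeholder page.

```python
from html.parser import HTMLParser

# Element ids taken from the HTML snippet quoted above.
BLOCK_PAGE_IDS = {"no-js", "unsupported-browser"}


class _IdCollector(HTMLParser):
    """Collects every id attribute seen on a start tag."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def looks_like_js_wall(html: str) -> bool:
    """Return True if the page contains either placeholder element (heuristic)."""
    collector = _IdCollector()
    collector.feed(html)
    return bool(collector.ids & BLOCK_PAGE_IDS)
```

Running this on the raw response body before handing it to a parser gives a clearer error than the `NoneType` traceback.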

pancho-villa commented 2 years ago

I'm having the same issue. I just downloaded this yesterday and it worked fine for a bunch of queries. Today I fired it up again and get this every time.

natez311 commented 2 years ago

Hi,

Also having the same issue:

Traceback (most recent call last):
  File "forSale.py", line 1, in <module>
    from craigslist import CraigslistJobs, CraigslistForSale
  File "/Users/nathanielhurwitz/.pyenv/versions/3.8.10/lib/python3.8/site-packages/craigslist/__init__.py", line 1, in <module>
    from .craigslist import (
  File "/Users/nathanielhurwitz/.pyenv/versions/3.8.10/lib/python3.8/site-packages/craigslist/craigslist.py", line 1, in <module>
    from .base import CraigslistBase
  File "/Users/nathanielhurwitz/.pyenv/versions/3.8.10/lib/python3.8/site-packages/craigslist/base.py", line 17, in <module>
    ALL_SITES = utils.get_all_sites()  # All the Craiglist sites
  File "/Users/nathanielhurwitz/.pyenv/versions/3.8.10/lib/python3.8/site-packages/craigslist/utils.py", line 40, in get_all_sites
    response = requests.get(ALL_SITES_URL)
  File "/Users/nathanielhurwitz/.pyenv/versions/3.8.10/lib/python3.8/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/Users/nathanielhurwitz/.pyenv/versions/3.8.10/lib/python3.8/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/nathanielhurwitz/.pyenv/versions/3.8.10/lib/python3.8/site-packages/requests/sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/nathanielhurwitz/.pyenv/versions/3.8.10/lib/python3.8/site-packages/requests/sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "/Users/nathanielhurwitz/.pyenv/versions/3.8.10/lib/python3.8/site-packages/requests/adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.craigslist.org', port=80): Max retries exceeded with url: /about/sites (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x10b0cf430>: Failed to establish a new connection: [Errno 60] Operation timed out'))
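Note that this traceback differs from the `NoneType` error earlier in the thread: the request to `/about/sites` times out at import time, before any search is made. If the timeout is intermittent, retrying with a delay may get past it. The following is a generic sketch, not something the library provides; `retry` and its parameters are hypothetical. It catches `OSError` because `requests.exceptions.ConnectionError` derives from it.

```python
import time


def retry(func, attempts=3, base_delay=1.0, exceptions=(OSError,), sleep=time.sleep):
    """Call func() up to `attempts` times, doubling the delay after each failure.

    Hypothetical helper. `sleep` is injectable so the delays can be
    observed or skipped in tests.
    """
    last_exc = None
    for attempt in range(attempts):
        try:
            return func()
        except exceptions as exc:
            last_exc = exc
            # Wait before the next try, but not after the final attempt.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_exc
```

If the connection is refused or blocked outright rather than flaky, retrying will not help and the last exception is re-raised.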

moozzyk commented 2 years ago

I came across this yesterday and it doesn't seem like the wrapper can work anymore. I used a headless browser to work around this. See https://blog.3d-logic.com/2022/10/02/craigslist-automation for more details and a prototype.

brandomando commented 1 year ago

+1

f3mshep commented 1 year ago

@irahorecka @juliomalegria

I put together a fork of this project that uses Selenium + a Chrome webdriver, which seems to bypass the Craigslist bot detection. I also had to change the CSS classes in the scraping bit to get things working. If there is interest, I can clean up the code a bit and submit a formal PR.
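For readers unfamiliar with the approach, the general idea of fetching a page through headless Chrome looks roughly like the sketch below. This is not the fork's code. The function names are made up for illustration, the URL layout in `build_search_url` is an assumption that is not confirmed anywhere in this thread, and Selenium plus a matching Chrome installation are required for `fetch_rendered_html` to run.

```python
def build_search_url(site: str, category: str) -> str:
    """Build a search URL. The layout is an assumption, not confirmed in this thread."""
    return f"https://{site}.craigslist.org/search/{category}"


def fetch_rendered_html(url: str) -> str:
    """Load `url` in headless Chrome and return the HTML after JavaScript has run."""
    # Imported lazily so build_search_url works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    # Run Chrome without a visible window; older Chrome versions use "--headless".
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always shut the browser down, even if the page load fails.
        driver.quit()
```

Usage would be along the lines of `fetch_rendered_html(build_search_url("sfbay", "mca"))`, with the returned HTML then passed to whatever parser is in use. Results may load asynchronously, so an explicit wait may be needed before reading `page_source`.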

pancho-villa commented 1 year ago

Sounds good to me. I doubt you'd even have to submit a pull request; I'd just follow your fork.

f3mshep commented 1 year ago

Fork is here for anyone curious: https://github.com/f3mshep/python-craigslist-headless

genialtechie commented 1 year ago

@f3mshep Do you know if this fork still works? I am a bit new to Python and am trying to use it for a little project, but I keep getting the same NoneType error referenced in the comments above. A screenshot of my code is below as well. Any help would be useful.

... AttributeError: 'NoneType' object has no attribute 'find_all'

Screenshot