irahorecka / chrono24

Chrono24 API wrapper
MIT License

Breaking API change? #5

Open fdiotalevi opened 3 weeks ago

fdiotalevi commented 3 weeks ago

I have been using the library for a few months without issues, but as of about a week ago it no longer works. Even the test script in the README fails.

How to reproduce

  1. Use the test script in the README
import chrono24

for listing in chrono24.query("Rolex DateJust").search():
    print(listing) 
  2. Run it and observe the output:
> python3 test.py  
Retrying request... Attempt #1.
Retrying request... Attempt #2.
Retrying request... Attempt #3.
Retrying request... Attempt #4.

irahorecka commented 3 weeks ago

Thanks for bringing this up. It seems Chrono24 has implemented Cloudflare’s anti-scraping feature to block non-browser requests. If the issue persists, I can explore using Selenium as a workaround. While it could be effective, it may be more cumbersome to use.

fdiotalevi commented 3 weeks ago

Makes sense. Is there a workaround to be able to use the library?

irahorecka commented 3 weeks ago

Not at the moment, unfortunately. If you can obtain the HTML content yourself, you should be able to extract listings using these private methods:

import chrono24

# listing_html is your beautifulsoup4 object
standard_listing_dict = chrono24.query._get_standard_listing_as_json(listing_html)
detailed_listing_dict = chrono24.query._get_detailed_listing_as_json(listing_html)

irahorecka commented 1 week ago

Cloudflare Issues – Temporary Workaround

Thanks to @davidiola for suggesting a potential solution to the Cloudflare problems.

Use FlareSolverr, an open-source proxy. Spin up the Docker container as described in their documentation, route your requests through it, and retrieve the relevant HTML.
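For anyone who wants to try it, here is a minimal sketch of routing a request through FlareSolverr. It assumes the container is running locally on FlareSolverr's default port (8191) and uses its `request.get` command. The helper names (`build_payload`, `extract_html`, `fetch_html`) are hypothetical and not part of chrono24 or FlareSolverr.

```python
import json
import urllib.request

# Assumption: FlareSolverr is running locally on its default port (8191).
FLARESOLVERR_URL = "http://localhost:8191/v1"


def build_payload(target_url, max_timeout_ms=60000):
    """Build the JSON body for FlareSolverr's `request.get` command."""
    return {"cmd": "request.get", "url": target_url, "maxTimeout": max_timeout_ms}


def extract_html(reply):
    """Return the page HTML from a decoded FlareSolverr reply, or raise on failure."""
    if reply.get("status") != "ok":
        raise RuntimeError(f"FlareSolverr error: {reply.get('message')}")
    return reply["solution"]["response"]


def fetch_html(target_url, endpoint=FLARESOLVERR_URL):
    """POST the request to FlareSolverr and return the solved page's HTML."""
    body = json.dumps(build_payload(target_url)).encode("utf-8")
    request = urllib.request.Request(
        endpoint, data=body, headers={"Content-Type": "application/json"}
    )
    # Wait a little longer than maxTimeout so FlareSolverr can answer first.
    with urllib.request.urlopen(request, timeout=90) as response:
        return extract_html(json.loads(response.read().decode("utf-8")))
```

Calling `fetch_html(...)` with the Chrono24 page URL you need should return the page source as a string, which can then be parsed with beautifulsoup4.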

I haven’t tested this yet. It may work as a stopgap. A more permanent fix is under consideration.

If anyone tries it, post your feedback here.