rom1504 / clip-retrieval

Easily compute clip embeddings and build a clip retrieval system with them
https://rom1504.github.io/clip-retrieval/
MIT License
2.42k stars, 213 forks

Default context to fix bug with some requests returning 404 #246

Closed. Padge91 closed this 1 year ago

Padge91 commented 1 year ago

This change fixes an issue where HTTP requests return 404 for some URLs even though the same URLs work as expected in browsers, curl, wget, etc. It only affects Python versions earlier than 3.10.

Bug Description: When using a Python version earlier than 3.10, some HTTPS requests made through the urllib.request.urlopen function return 404. This seems to happen most often with WordPress CDN URLs. Some examples:

https://i0.wp.com/live.staticflickr.com/65535/51671200033_559f07d991_b.jpg?resize=450%2C300
https://i0.wp.com/live.staticflickr.com/65535/51882032946_514849a9c0_b.jpg?resize=450%2C300
https://i0.wp.com/live.staticflickr.com/65535/52563715365_4c0d7bd19b_b.jpg?resize=450%2C300

These URLs work as expected in the browser, but fail with urllib.request.urlopen.

Fix: Passing an explicit default SSL context with the ALPN protocol set seems to resolve the issue. The change should be fairly safe, since Python 3.10+ applies the same change in the standard library.
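For illustration, here is a minimal sketch of what such a fix could look like. It is not necessarily the exact code in this PR. The helper names make_alpn_context and fetch are hypothetical, and the "http/1.1" protocol string is taken from the linked CPython commit rather than from the description above.

```python
import ssl
import urllib.request


def make_alpn_context():
    """Build a default TLS context that advertises HTTP/1.1 via ALPN.

    Hypothetical helper: it mirrors what the linked CPython commit does
    for Python 3.10+, applied by hand for older interpreters.
    """
    # Default context: certificate verification and hostname checks stay on.
    context = ssl.create_default_context()
    # Advertise HTTP/1.1 in the TLS handshake (assumed protocol string).
    context.set_alpn_protocols(["http/1.1"])
    return context


def fetch(url, timeout=10):
    """Download a URL, passing the ALPN-enabled context explicitly."""
    with urllib.request.urlopen(
        url, timeout=timeout, context=make_alpn_context()
    ) as response:
        return response.read()
```

Calling fetch with one of the example URLs above on a Python version earlier than 3.10 would be one way to check whether the 404 goes away. Building the context with ssl.create_default_context keeps certificate verification enabled, so the only intended behavioral difference is the extra ALPN extension in the handshake.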

Discussion: https://bugs.python.org/issue40968
Source: https://github.com/python/cpython/commit/f97406be4c0a02c1501c7ab8bc8ef3850eddb962

Padge91 commented 1 year ago

@rom1504 Any interest in this bug fix?

rom1504 commented 1 year ago

thanks