nachocho / pyrae

Perform searches against the RAE (Real Academia Española) dictionary.

ERROR - dle.search_by_url - The server could not fulfill the request. Error code: 403. #12

Open santosadrian opened 2 years ago

santosadrian commented 2 years ago

Python 3.6.9 (default, Mar 15 2022, 13:55:28)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyrae import dle
>>> res = dle.search_by_word(word='catarsis')
2022-06-30 22:28:36,733 - INFO - dle.search_by_url - Performing request to: 'https://dle.rae.es/catarsis'...
2022-06-30 22:28:36,855 - ERROR - dle.search_by_url - The server could not fulfill the request. Error code: 403.
>>> res = dle.search_by_word(word='Catarsis')
2022-06-30 22:29:17,922 - INFO - dle.search_by_url - Performing request to: 'https://dle.rae.es/Catarsis'...
2022-06-30 22:29:17,983 - ERROR - dle.search_by_url - The server could not fulfill the request. Error code: 403.
>>> res = dle.search_by_word(word='hola')
2022-06-30 22:29:26,126 - INFO - dle.search_by_url - Performing request to: 'https://dle.rae.es/hola'...
2022-06-30 22:29:26,191 - ERROR - dle.search_by_url - The server could not fulfill the request. Error code: 403.

pabsi commented 2 years ago

I suspect the issue has to do with the user-agent and other request headers: https://github.com/nachocho/pyrae/blob/main/pyrae/dle.py#L29

Cloudflare also blocks plain curl requests to the dle.rae.es endpoint.



nachocho commented 2 years ago

Thanks for reporting the issue. I am not getting any error running the sample code. I am getting proper HTTP responses and the search_by_word function is properly returning a SearchResult instance that contains the word's information.

Do you have any more information on how to reproduce the problem consistently?

pabsi commented 2 years ago

This is all I have, I am afraid:

$ python3
Python 3.6.9 (default, Mar 15 2022, 13:55:28) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyrae import dle
>>> res = dle.search_by_word(word='hola')
2022-07-12 16:55:32,257 - INFO    - dle.search_by_url - Performing request to: 'https://dle.rae.es/hola'...
2022-07-12 16:55:32,367 - ERROR   - dle.search_by_url - The server could not fulfill the request. Error code: 403.
>>> 

This is the same result as the OP (@santosadrian).

Could you perhaps try from a different network or a different machine?

I suggested that Cloudflare is blocking the request because the user-agent is hardcoded and quite simple, but changing it may not solve the problem either.
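
To test the user-agent theory in isolation, a minimal sketch along these lines might help. It uses only the standard library's urllib; the `build_request` name and the header values are assumptions chosen for illustration, not what pyrae actually sends, and there is no guarantee that Cloudflare accepts them.

```python
import urllib.request
from urllib.parse import quote

# Assumed, browser-like headers for illustration only; not pyrae's actual values.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Firefox/102.0",
    "Accept": "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "es-ES,es;q=0.9,en;q=0.8",
}


def build_request(word, headers=None):
    """Build (but do not send) a GET request for a DLE entry.

    Hypothetical helper: the word is percent-encoded so that entries
    with accents or 'ñ' produce a valid URL.
    """
    url = "https://dle.rae.es/" + quote(word)
    return urllib.request.Request(url, headers=dict(headers or BROWSER_HEADERS))


req = build_request("hola")
print(req.full_url)
print(req.header_items())
```

Passing `req` to `urllib.request.urlopen` and catching `urllib.error.HTTPError` would show whether the 403 persists with these headers. If it does, the block is probably based on something other than the user-agent alone.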

If you try a curl request:

$ curl -v https://dle.rae.es/hola

You get an HTTP error code "403 Forbidden" too (well, I do):

< HTTP/2 403 
< date: Tue, 12 Jul 2022 14:57:23 GMT
< content-type: text/html; charset=UTF-8
< cf-chl-bypass: 1
< permissions-policy: accelerometer=(),autoplay=(),camera=(),clipboard-read=(),clipboard-write=(),fullscreen=(),geolocation=(),gyroscope=(),hid=(),interest-cohort=(),magnetometer=(),microphone=(),payment=(),publickey-credentials-get=(),screen-wake-lock=(),serial=(),sync-xhr=(),usb=()
< cache-control: private, max-age=0, no-store, no-cache, must-revalidate, post-check=0, pre-check=0
< expires: Thu, 01 Jan 1970 00:00:01 GMT
< x-frame-options: SAMEORIGIN
< expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
< report-to: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=KQYT77qeQleRfZ5O%2BK9No3WFZ%2BzpiBYrxckHVdjyVKbuMcydX1kh%2B57bcOw7zONw6ZKEgsyamkOqOML%2Bo0mHLtr4mGFlq1KXFtYsQIvAC9HEtyuDqqGFpaz%2B0jw%3D"}],"group":"cf-nel","max_age":604800}
< nel: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
< server: cloudflare
< cf-ray: 729aa18b0a1f86c3-MAD
< alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400
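
The `server: cloudflare` and `cf-ray` headers above suggest that the 403 passed through Cloudflare's edge. As a rough sketch, a check like the following could flag such responses when troubleshooting. The function name and the heuristic are assumptions for illustration and are not part of pyrae; the check only shows that the response carries Cloudflare marker headers, not that Cloudflare rather than the origin server decided to reject the request.

```python
def looks_like_cloudflare_403(status, headers):
    """Heuristic: True if a 403 response carries Cloudflare marker headers.

    `headers` is any mapping of header name to value. Hypothetical helper,
    written for illustration only.
    """
    # Header names are case-insensitive, so normalise them first.
    normalised = {name.lower(): value for name, value in headers.items()}
    served_by_cf = normalised.get("server", "").strip().lower() == "cloudflare"
    has_ray_id = "cf-ray" in normalised
    return status == 403 and (served_by_cf or has_ray_id)


# Subset of the headers from the curl output above.
sample = {
    "server": "cloudflare",
    "cf-ray": "729aa18b0a1f86c3-MAD",
    "content-type": "text/html; charset=UTF-8",
}
print(looks_like_cloudflare_403(403, sample))
```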

I am not sure what else I can provide to help troubleshoot the issue.

Thanks