ipeaGIT / geobr

Easy access to official spatial data sets of Brazil in R and Python
https://ipeagit.github.io/geobr/
786 stars 118 forks

Max Retries Exceeded Error #312

Closed bennettcapozzi closed 9 months ago

bennettcapozzi commented 1 year ago

It seems like I do not have permission to connect to the IPEA server to access the data.

I'm using the Python package.

Getting the following error:

ConnectionError: HTTPConnectionPool(host='www.ipea.gov.br', port=80): Max retries exceeded with url: /geobr/metadata/metadata_1.7.0_gpkg.csv (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f79604a1f70>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))

Any idea why I'm getting this error, and how I can access the data? Thanks!
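A note for anyone hitting the same message: "[Errno 8] nodename nor servname provided, or not known" is the text macOS/BSD systems report when the hostname lookup (`getaddrinfo`) fails. This means the request never reached IPEA's server at all, which points to a name-resolution or network problem rather than missing permissions. The minimal sketch below uses only the standard library to check whether a host resolves. `can_resolve` is a hypothetical helper and is not part of geobr.

```python
import socket


def can_resolve(host, port=80):
    """Return True if `host` can be resolved to an address, else False.

    Hypothetical diagnostic helper: it only performs the DNS/hosts lookup
    and does not open a connection to the server.
    """
    try:
        socket.getaddrinfo(host, port)
    except socket.gaierror:
        # On macOS this is the "[Errno 8] nodename nor servname provided,
        # or not known" error seen in the traceback above.
        return False
    return True
```

Calling `can_resolve("www.ipea.gov.br")` and getting `False` would confirm that the failure happens at the name-lookup step, before geobr has a chance to download anything.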

rafapereirabr commented 1 year ago

Hi @bennettcapozzi. Thanks for opening this issue. This problem occurred because our servers were offline yesterday for planned maintenance. However, this error should not have happened, because geobr should have redirected to our copy of the data stored on GitHub. Perhaps there was an issue with the redirection, @JoaoCarabetta?

vss-2 commented 1 year ago

Hello guys, I had the same problem during maintenance last Thursday night. The IPEA portal was unstable even in a web browser. After finding this issue, I reproduced the same connection behaviour on Linux by blocking the IPEA address with iptables: sudo iptables -I INPUT -s www.ipea.gov.br -j DROP

In plain terms: -I INPUT inserts the rule at the top of the INPUT chain, -s gives the source address, and -j DROP sets the target to DROP (discarding all connections).

The mirror is never reached because, once requests.get(url) raises an error (here caused by a NewConnectionError), the for loop is interrupted before any request is made to the mirror URL (the second entry in the urls list). https://github.com/ipeaGIT/geobr/blob/c9787a742b099373c10939173995a974e187f62c/python-package/geobr/utils.py#L26-L31

In my pull request #316, I suggest wrapping requests.get(url) in exception handling so that the for loop is not interrupted early.
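The idea behind the PR can be sketched roughly as follows. This is a minimal illustration, not the code merged in #316: `get_with_fallback` is a hypothetical name, and it assumes the caller passes the IPEA URL first and the GitHub mirror URL second. Any `requests.exceptions.RequestException` (which covers DNS failures, refused connections and timeouts) on one URL is recorded, and the loop moves on to the next URL instead of aborting.

```python
import requests


def get_with_fallback(urls, timeout=10):
    """Return the first HTTP 200 response from `urls`, tried in order.

    Hypothetical helper: a connection problem on one URL is recorded and the
    loop continues with the next URL (e.g. the GitHub mirror) instead of
    letting the exception escape and end the loop early.
    """
    failures = []
    for url in urls:
        try:
            # timeout=10 seconds is an assumed default, not taken from geobr.
            response = requests.get(url, timeout=timeout)
        except requests.exceptions.RequestException as exc:
            # Server unreachable, name not resolvable, timeout, ...: try the next URL.
            failures.append(f"{url}: {exc}")
            continue
        if response.status_code == 200:
            return response
        failures.append(f"{url}: HTTP {response.status_code}")
    # Only reached when every URL failed.
    raise ConnectionError("No mirror is reachable:\n" + "\n".join(failures))
```

Catching `RequestException` rather than using a bare `except` keeps genuine programming errors visible while still treating any network failure as "this mirror is down, try the next one".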

If anyone wants to test this on Linux: to remove the blocking rule afterwards, first check its rule number with sudo iptables -L INPUT.

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
DROP       all  --  45.171.102.5         anywhere
...

If the console returns similar output, run: sudo iptables -D INPUT 1

TAKE CARE: in my environment, after running sudo iptables -I INPUT -s www.ipea.gov.br -j DROP, the rule was added as number 1. Please verify the rule number in your own environment before deleting INPUT rule 1!
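If you'd rather not touch the firewall (or don't have root access), a similar outage can be simulated inside a single Python process. This sketch assumes the code under test goes through requests/urllib3, which resolve hostnames via `socket.getaddrinfo`. `block_host` is a hypothetical helper that makes lookups for one host fail with the same kind of error seen in the original report, while every other host resolves normally.

```python
import socket
from contextlib import contextmanager
from unittest import mock


@contextmanager
def block_host(blocked_host):
    """Within the `with` block, make DNS lookups for `blocked_host` fail.

    Hypothetical test helper: it patches socket.getaddrinfo, so it only
    affects the current Python process (no root access or firewall rules).
    """
    real_getaddrinfo = socket.getaddrinfo

    def fake_getaddrinfo(host, *args, **kwargs):
        if host == blocked_host:
            # urllib3 turns this socket error into a NewConnectionError.
            raise socket.gaierror(
                socket.EAI_NONAME, "nodename nor servname provided, or not known"
            )
        # Every other host is resolved normally.
        return real_getaddrinfo(host, *args, **kwargs)

    # The original function is restored when the block exits, even on error.
    with mock.patch("socket.getaddrinfo", side_effect=fake_getaddrinfo):
        yield
```

Wrap the geobr call you want to test in `with block_host("www.ipea.gov.br"):`. With the fallback in place, the request to IPEA should fail immediately and the GitHub mirror should be tried next.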

rafapereirabr commented 1 year ago

@JoaoCarabetta , would you mind having a look at this please?

phbonamin commented 1 year ago

I hate to be that guy, but restarting my PC solved this issue. Restarting my internet connection did not help.

rafapereirabr commented 1 year ago

Perhaps simply restarting the R session would suffice? @bennettcapozzi and @vss-2, would you mind trying?

vss-2 commented 1 year ago

Hello, in my environment this error occurred only for a few hours that day. I've mostly been using geobr's Python package over the past few weeks, and the file server seems to be fast and stable.

I would still suggest wrapping requests.get() in a try/except block to make sure GitHub's file mirror is reached if the file server becomes unstable again.

rafapereirabr commented 9 months ago

I guess @vss-2 has solved this issue as well, right? Are we good to close this issue now?

rafapereirabr commented 9 months ago

Hi @vss-2, I believe you have already fixed this issue, right?

vss-2 commented 9 months ago

Yes, it has been tested and fixed! Sorry for taking so long to answer, @rafapereirabr.

rafapereirabr commented 9 months ago

No worries. Thanks so much for your contributions, @vss-2 . Closing this issue for now.