larsbuntemeyer / notebooks


ConnectionRefusedError in climdex-euro-cordex.ipynb #3

Closed jwohland closed 2 years ago

jwohland commented 2 years ago

The download of 'cordex.csv' from 'https://raw.githubusercontent.com/euro-cordex/tables/master/domains/cordex.csv' doesn't seem to work. It throws a sequence of ConnectionRefusedError and Connection Errors, ending in

ConnectionError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /euro-cordex/tables/master/domains/cordex.csv (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x2abf08b93880>: Failed to establish a new connection: [Errno 111] Connection refused'))

This seems surprising because I can wget the file easily on my local machine, so the file does exist and is accessible. Is this a config / proxy problem? Or are we maybe missing another dependency?

larsbuntemeyer commented 2 years ago

If you run the notebook on the shared partition at DKRZ, you won't have an internet connection (only the prepost and login nodes have one; eddy also has internet). So you might want to trigger the download on a login node, so that the tables get cached in your home directory. You can trigger the download from the Python prompt, e.g.,

import cordex as cx
cx.domains.table

or any other command that requires grid information. Once the tables are cached in your home directory, an internet connection is no longer necessary.
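As a rough sketch of the step above, the cache warm-up could be wrapped in a small script that first checks whether the node can reach GitHub at all. The helper names `has_internet` and `warm_cordex_cache` are made up for illustration and are not part of the cordex package; only `cx.domains.table` comes from the snippet above.

```python
import socket


def has_internet(host="raw.githubusercontent.com", port=443, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures and timeouts.
        return False


def warm_cordex_cache():
    """Hypothetical helper: trigger the table download so later runs work offline."""
    if not has_internet():
        raise RuntimeError("No internet here; run this on a login or prepost node.")
    import cordex as cx  # imported lazily so the check above works without it

    return cx.domains.table  # accessing the table triggers the download
```

Running `warm_cordex_cache()` once on a login node should be enough; after that the notebook can run on shared.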

jwohland commented 2 years ago

Good to know! With that, I can now execute everything easily.

For future newbies to the DKRZ ecosystem, we could (a) mention in the README that they have to trigger these downloads on a login node or (b) tell them to use prepost instead of shared (which is what I do now)

Is there any downside to (b)? If not, this seems to be the simplest way of doing it. prepost also seems to be as powerful and available in similar numbers as shared according to this.

larsbuntemeyer commented 2 years ago

In my experience, the downside of (b) is that you might have to wait a long time for a prepost node to be allocated, since they are usually quite busy (see sinfo and check the idle nodes). The network problem is a general one (you also have it with xarray or regionmask); it affects basically all packages that use external resources. I like to make sure my code runs on shared nodes, so that it doesn't require excessive memory and I can also run it in production in SLURM on shared.

However, I'll update the troubleshooting section in the README.
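For checking how busy prepost is before picking a partition, something like the following might help. It is only a sketch: the helper names are invented for illustration, and it assumes `sinfo` is on the PATH and that the partitions are named `prepost` and `shared` as in this thread.

```python
import shutil
import subprocess


def count_idle_nodes(sinfo_output):
    """Sum the node counts printed by `sinfo --format=%D` (one integer per line)."""
    return sum(int(token) for token in sinfo_output.split())


def idle_nodes(partition):
    """Return the number of idle nodes in a partition, or None if sinfo is missing."""
    if shutil.which("sinfo") is None:
        return None
    result = subprocess.run(
        ["sinfo", "--noheader", f"--partition={partition}",
         "--states=idle", "--format=%D"],
        capture_output=True, text=True, check=True,
    )
    return count_idle_nodes(result.stdout)
```

Calling `idle_nodes("prepost")` and `idle_nodes("shared")` from a login node gives a quick comparison of the two.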

larsbuntemeyer commented 2 years ago

Anyway, thanks a lot for trying this and the feedback, that's really valuable!