jade-hpc-gpu / jade-hpc-gpu.github.io

Joint Academic Data Science Endeavour (JADE) is the largest GPU facility in the UK supporting world-leading research in machine learning (and this is the repo that powers its website)
http://www.jade.ac.uk/

Name or service not known #184

Open kabbas570 opened 1 year ago

kabbas570 commented 1 year ago

Hello, could you please help with this issue? The job log is below.

Thanks, Cheers, Abbas

CUDA-11.1 loaded

Python3 Pytorch for CUDA-10.2 is now loaded in your environment.

Downloading: "https://github.com/lukemelas/EfficientNet-PyTorch/releases/download/1.0/efficientnet-b2-8bb594d6.pth" to /jmain02/home/J2AD007/txk47/axr21-txk47/.cache/torch/hub/checkpoints/efficientnet-b2-8bb594d6.pth
Traceback (most recent call last):
  File "/jmain02/home/J2AD007/txk47/axr21-txk47/.conda/envs/torch_env/lib/python3.11/urllib/request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/jmain02/home/J2AD007/txk47/axr21-txk47/.conda/envs/torch_env/lib/python3.11/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/jmain02/home/J2AD007/txk47/axr21-txk47/.conda/envs/torch_env/lib/python3.11/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/jmain02/home/J2AD007/txk47/axr21-txk47/.conda/envs/torch_env/lib/python3.11/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/jmain02/home/J2AD007/txk47/axr21-txk47/.conda/envs/torch_env/lib/python3.11/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/jmain02/home/J2AD007/txk47/axr21-txk47/.conda/envs/torch_env/lib/python3.11/http/client.py", line 975, in send
    self.connect()
  File "/jmain02/home/J2AD007/txk47/axr21-txk47/.conda/envs/torch_env/lib/python3.11/http/client.py", line 1447, in connect
    super().connect()
  File "/jmain02/home/J2AD007/txk47/axr21-txk47/.conda/envs/torch_env/lib/python3.11/http/client.py", line 941, in connect
    self.sock = self._create_connection(
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jmain02/home/J2AD007/txk47/axr21-txk47/.conda/envs/torch_env/lib/python3.11/socket.py", line 827, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jmain02/home/J2AD007/txk47/axr21-txk47/.conda/envs/torch_env/lib/python3.11/socket.py", line 962, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/jmain02/home/J2AD007/txk47/axr21-txk47/keenAI/new_rust/train1.py", line 161, in <module>
    Model = smp.DeepLabV3Plus(
            ^^^^^^^^^^^^^^^^^^
  File "/jmain02/home/J2AD007/txk47/axr21-txk47/.conda/envs/torch_env/lib/python3.11/site-packages/segmentation_models_pytorch/decoders/deeplabv3/model.py", line 146, in __init__
    self.encoder = get_encoder(
                   ^^^^^^^^^^^^
  File "/jmain02/home/J2AD007/txk47/axr21-txk47/.conda/envs/torch_env/lib/python3.11/site-packages/segmentation_models_pytorch/encoders/__init__.py", line 85, in get_encoder
    encoder.load_state_dict(model_zoo.load_url(settings["url"]))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jmain02/home/J2AD007/txk47/axr21-txk47/.conda/envs/torch_env/lib/python3.11/site-packages/torch/hub.py", line 746, in load_state_dict_from_url
    download_url_to_file(url, cached_file, hash_prefix, progress=progress)
  File "/jmain02/home/J2AD007/txk47/axr21-txk47/.conda/envs/torch_env/lib/python3.11/site-packages/torch/hub.py", line 611, in download_url_to_file
    u = urlopen(req)
        ^^^^^^^^^^^^
  File "/jmain02/home/J2AD007/txk47/axr21-txk47/.conda/envs/torch_env/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jmain02/home/J2AD007/txk47/axr21-txk47/.conda/envs/torch_env/lib/python3.11/urllib/request.py", line 519, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/jmain02/home/J2AD007/txk47/axr21-txk47/.conda/envs/torch_env/lib/python3.11/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jmain02/home/J2AD007/txk47/axr21-txk47/.conda/envs/torch_env/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/jmain02/home/J2AD007/txk47/axr21-txk47/.conda/envs/torch_env/lib/python3.11/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/jmain02/home/J2AD007/txk47/axr21-txk47/.conda/envs/torch_env/lib/python3.11/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>

twinkarma commented 1 year ago

Hi Abbas, unfortunately it's not possible to connect to the internet from the compute node. You'll have to use the login node to install software and download any data you require before submitting your jobs.
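
As an illustration of the advice above, here is a minimal sketch of a script that could be run once on the login node to fetch the checkpoint into the cache location shown in the log. The function names `hub_checkpoint_path` and `prefetch` are hypothetical helpers, not part of PyTorch. The sketch assumes that the home directory is shared between login and compute nodes, and that `torch.hub` looks for the file under `<hub dir>/checkpoints/<file name from URL>`, which matches the path printed in the log. Unlike `torch.hub`, it does not verify the hash in the file name.

```python
import os
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# URL copied from the "Downloading:" line in the log above.
WEIGHTS_URL = (
    "https://github.com/lukemelas/EfficientNet-PyTorch/releases/"
    "download/1.0/efficientnet-b2-8bb594d6.pth"
)


def hub_checkpoint_path(url, hub_dir=None):
    """Return the path where torch.hub is expected to cache `url`.

    Assumption: the default hub dir is $TORCH_HOME/hub, falling back to
    $XDG_CACHE_HOME/torch/hub and then ~/.cache/torch/hub.
    """
    if hub_dir is None:
        cache_root = os.environ.get("XDG_CACHE_HOME", "~/.cache")
        torch_home = os.environ.get("TORCH_HOME") or os.path.join(cache_root, "torch")
        hub_dir = os.path.join(torch_home, "hub")
    filename = os.path.basename(urlparse(url).path)
    return Path(hub_dir).expanduser() / "checkpoints" / filename


def prefetch(url, hub_dir=None):
    """Download `url` into the hub cache unless it is already there."""
    target = hub_checkpoint_path(url, hub_dir)
    if target.exists():
        return target  # already cached, no network access needed
    target.parent.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name first so an interrupted transfer
    # does not leave a truncated checkpoint under the final name.
    partial = target.with_name(target.name + ".part")
    urllib.request.urlretrieve(url, partial)
    os.replace(partial, target)
    return target


if __name__ == "__main__":
    # Run this on the login node, which has internet access.
    print(prefetch(WEIGHTS_URL))
```

Once the file is in place, `torch.hub.load_state_dict_from_url` should find it in the cache and skip the download when the job runs on a compute node.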

kabbas570 commented 1 year ago

Dear @twinkarma, thanks for the suggestion, I will try it. Cheers, Abbas