ratt-ru / Stimela-classic

Containerized radio interferometry scripting framework -- NB: Classic version is no longer in active development, use stimela 2! See README for details.
GNU General Public License v2.0

Error reaching docker registry when pulling with singularity #635

Open gigjozsa opened 4 years ago

gigjozsa commented 4 years ago

I have seen this several times now, and I am wondering what the real cause is. No other software reports connection errors or anything similar:

stimela pull -s

...

# FATAL:   While making image from oci registry: while building SIF from layers: conveyor failed to get: Error initializing source oci:/home/jozsa/software/caracal_tests/.singularity/cache/cache/oci:38761e2aa2ee416f1757b788227f6722d949082ebaac55803f1c03e6e0c59b62: Error reading blob sha256:94dc70c1f4dd078cbb392b7d49eab67efd69e44ab8b20190ce3725f88dc33de4: Get https://registry-1.docker.io/v2/stimela/shadems/blobs/sha256:94dc70c1f4dd078cbb392b7d49eab67efd69e44ab8b20190ce3725f88dc33de4: dial tcp: lookup registry-1.docker.io: Temporary failure in name resolution
Traceback (most recent call last):
  File "/home/jozsa/software/caracal_tests/singularity/caracal_venv/bin/stimela", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/home/jozsa/software/caracal_tests/singularity/caracal_venv/src/stimela/bin/stimela", line 14, in <module>
    main.main([a for a in sys.argv[1:]])
  File "/home/jozsa/software/caracal_tests/singularity/caracal_venv/src/stimela/stimela/main.py", line 601, in main
    _cmd(argv)
  File "/home/jozsa/software/caracal_tests/singularity/caracal_venv/src/stimela/stimela/main.py", line 317, in pull
    singularity.pull(
  File "/home/jozsa/software/caracal_tests/singularity/caracal_venv/src/stimela/stimela/singularity.py", line 47, in pull
    utils.xrun(f"cd {directory} && singularity", ["pull", 
  File "/home/jozsa/software/caracal_tests/singularity/caracal_venv/src/stimela/stimela/utils/xrun_poll.py", line 189, in xrun
    raise StimelaCabRuntimeError("{} returns error code {}".format(command_name, status))
stimela.utils.StimelaCabRuntimeError: cd /home/jozsa/software/caracal_tests/stimela_pullfolder && singularity returns error code 255
stimela pull --singularity -f failed
stimela pull --singularity -f
# running cd /home/jozsa/software/caracal_tests/stimela_pullfolder && singularity pull --force --name stimela_tigger_1.2.0.sif docker://stimela/tigger:1.2.0
# FATAL:   While making image from oci registry: failed to get checksum for docker://stimela/tigger:1.2.0: pinging docker registry returned: Get https://registry-1.docker.io/v2/: dial tcp: lookup registry-1.docker.io: Temporary failure in name resolution

gigjozsa commented 4 years ago

I should add that the above runs until a few images are pulled, then crashes ungracefully in the way seen above.

o-smirnov commented 4 years ago

Well that's very clearly a networking error and outside of Stimela's control. Can you ping registry-1.docker.io directly after it crashes?
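
Since the log reports "Temporary failure in name resolution", a DNS lookup may be a more direct check than a ping. A minimal sketch, assuming Python is available on the affected machine; `can_resolve` is a hypothetical helper name, not part of Stimela:

```python
import socket


def can_resolve(host, port=443):
    """Return True if the system resolver maps host to at least one address."""
    try:
        return len(socket.getaddrinfo(host, port)) > 0
    except socket.gaierror:
        # Raised for resolver failures such as "Temporary failure in name resolution"
        return False


# Example, to be run right after the pull fails:
# print(can_resolve("registry-1.docker.io"))
```

If this returns False immediately after a failed pull but True a little later, the problem is likely an intermittent resolver issue rather than the registry itself.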

gigjozsa commented 4 years ago

The point is that, to make this more stable, we might want to retry the download several times. I am doing this in caratekit because the servers are so unstable. Another question is whether a different remote server can be chosen.

SpheMakh commented 4 years ago

@o-smirnov's idea of putting the images on an FTP server might be a possible solution. We could also catch these types of errors and retry a few times before raising an exception.
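
The retry idea could look roughly like the sketch below. It is not Stimela's actual code: `pull_with_retries` and its parameters are hypothetical, and it calls the command through `subprocess.run` rather than `utils.xrun`. It retries on any non-zero exit code with an exponentially growing pause, and only raises once all attempts are used up.

```python
import subprocess
import time


def pull_with_retries(cmd, attempts=3, delay=5.0, run=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying on a non-zero exit code.

    Returns the number of attempts used; raises RuntimeError if all fail.
    `run` and `sleep` are injectable so the logic can be tested offline.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Back off before the next try: delay, 2*delay, 4*delay, ...
            sleep(delay * 2 ** (attempt - 1))
    raise RuntimeError(
        f"{cmd[0]} failed after {attempts} attempts "
        f"(last exit code {result.returncode})"
    )


# Example, mirroring the command from the log above:
# pull_with_retries(["singularity", "pull", "--force",
#                    "--name", "stimela_tigger_1.2.0.sif",
#                    "docker://stimela/tigger:1.2.0"])
```

Retrying on every non-zero exit code is a simplification; a real implementation might inspect the output and retry only on network-related failures.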