Open SpheMakh opened 4 years ago
This is the ID of the data I'm trying to download: 1584577476
@ludwigschwardt any ideas?
I could not recreate the error. With such an error message the server has typically slammed down the phone on its side, which indicates a temporary overload that should go away if you try again later.
I'm still a bit sad since my latest improvements aim to catch these errors and turn them into flagged missing data without crashing the script. I'm keeping this open to remind me to catch this error too.
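The idea of turning a failed fetch into flagged missing data can be sketched roughly as follows. This is only an illustration of the concept, not katdal's actual implementation: `fetch_or_flag` and its `fetch` argument are hypothetical names, and NumPy is assumed to be installed.

```python
import numpy as np


def fetch_or_flag(fetch, shape, dtype=np.complex64):
    """Return (data, flags); a failed fetch yields zeros that are fully flagged.

    `fetch` is a hypothetical zero-argument callable that returns an array
    of the given shape, or raises ConnectionError (which covers resets).
    """
    try:
        data = np.asarray(fetch(), dtype=dtype)
        flags = np.zeros(shape, dtype=bool)   # nothing flagged: data is good
    except ConnectionError:
        data = np.zeros(shape, dtype=dtype)   # placeholder values
        flags = np.ones(shape, dtype=bool)    # everything flagged as missing
    return data, flags
```

The script then carries on with the next chunk instead of crashing, and downstream code can ignore the flagged samples.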
I've tried a couple of times already, and it fails at the same point each time, at around 500 GB. But I'll give it another go.
Interesting... I tried the following:
import katdal
from katdal.lazy_indexer import DaskLazyIndexer
d = katdal.open('...')
d.select(scans=5)  # which is where you are getting stuck
v, w, f = DaskLazyIndexer.get([d.vis, d.weights, d.flags], 0)
for n in range(602):
    print(n)
    DaskLazyIndexer.get([d.vis, d.weights, d.flags], n, out=[v, w, f])
It made it all the way to the end... Maybe try this on your setup.
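If the failure really is a temporary overload, one way to ride it out is to retry each dump a few times with a growing pause. The following is a minimal sketch under that assumption; `retry` is a hypothetical helper, not part of katdal, and the exception types to catch depend on what your katdal version raises.

```python
import time


def retry(fetch, attempts=3, delay=1.0, exceptions=(ConnectionError,)):
    """Call fetch() up to `attempts` times, doubling the pause after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except exceptions:
            if attempt == attempts:
                raise                      # out of attempts: let the error through
            time.sleep(delay * 2 ** (attempt - 1))


# Possible use inside the loop above (untested against a real archive):
# retry(lambda: DaskLazyIndexer.get([d.vis, d.weights, d.flags], n, out=[v, w, f]))
```

A failure that recurs at exactly the same dump on every attempt would suggest the problem is not a transient overload.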
Update your katdal, Sphe. If you are coming in from Rhodes you need the latest and greatest! I have run into this network endpoint problem before.
Benjamin Hugo
PhD. student, Centre for Radio Astronomy Techniques and Technologies Department of Physics and Electronics Rhodes University
Junior software developer Radio Astronomy Research Group South African Radio Astronomy Observatory Black River Business Park Observatory Cape Town
katdal 0.15 is pretty new (just pre-lockdown).
This only happens when I'm running in a Docker container. It works fine outside a container. @ludwigschwardt do you know of any containers that use mvftoms? Maybe I made a mistake in building mine.
I have not had this issue running in my Docker container that has mvftoms.py. I should also point out that I run it on one of the comm machines, which would mitigate any bad network problems.
@spassmoor I'm running on com08. Can you share the Dockerfile?
Thanks, I'll give it a go.
Just remember that this is a public thread, in case your zip contains sensitive info :-)
Other than my work email address and my preferred version of tornado I don't think there is anything sensitive in it.
Hey folks, I encountered a similar issue using katdal 0.17. mvftoms.py failed for me with a connection timeout error on my dataset (1596945366). I tried running @ludwigschwardt's script above and that failed with the same timeout error. Any ideas how to solve this issue?
Need more info here.
Is this inside a container? If so, please check that your Docker bridge is working properly by running ping or telnet from inside your container.
If it is working, can you indicate whether it fails partway through the download or right at the start? There might be a misconfigured endpoint or something else not related to containerization.
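If ping or telnet is not installed in the container, a rough equivalent of the telnet check can be done from Python itself. This is a minimal sketch using only the standard library; `can_connect` is a hypothetical helper name, and the host and port in the usage comment are placeholders for your own archive endpoint.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals and timeouts
        return False


# Run inside the container, e.g.:
# print(can_connect('<archive-host>', 443))
```

If this returns True on the host machine but False inside the container, the container networking is the likely culprit.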
I get the same error in a container environment and in a normal virtualenv installation. It seems to fail right away with the following error:
StoreUnavailable: Chunk '1596945366-sdp-l0/correlator_data/00168_00000_00000': HTTPConnectionPool(host='archive-gw-1.kat.ac.za', port=7480): Max retries exceeded with url: /1596945366-sdp-l0/correlator_data/00168_00000_00000.npy (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f7263391860>, 'Connection to archive-gw-1.kat.ac.za timed out. (connect timeout=30)'))
My network, otherwise, works just fine.
Hi @sarrvesh, a connection timeout indicates that you could not even start to talk to the archive server, i.e. the phone just rings and rings and nobody picks up. This differs from a connection reset (the topic of this issue), which is the server slamming down the phone in the middle of your conversation.
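At the socket level the two cases show up as different exception types, which a script can tell apart. The sketch below uses only standard-library exceptions; `describe_failure` is a hypothetical helper, and HTTP libraries such as requests or urllib3 wrap these errors in their own classes, so this shows the distinction rather than working as a drop-in handler for katdal errors.

```python
import socket


def describe_failure(exc):
    """Classify a low-level network exception using the phone analogy."""
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"   # the phone rings and nobody picks up
    if isinstance(exc, ConnectionResetError):
        return "reset"     # the server hangs up mid-conversation
    return "other"
```

A timeout at the very start usually points to routing, firewall or port problems, while a reset partway through points to the server side.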
I see that you are trying to connect to port 7480. Are you on a machine in the CHPC cluster? If not, you'll need to connect to port 443, aka https, and use a token as provided by the RDB link button on the archive. So instead of
d = katdal.open('http://archive-gw-1.kat.ac.za:7480/1596945366/1596945366_sdp_l0.full.rdb')
try
d = katdal.open('https://archive-gw-1.kat.ac.za/1596945366/1596945366_sdp_l0.full.rdb?token=<your-token>')
I managed to download dump 168 with both methods just now, so the server is up and your dataset is intact. That's the good news 😄
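For anyone scripting this, the rewrite from the port 7480 form to the https-plus-token form can be done with the standard library. This is a minimal sketch; `to_token_url` is a hypothetical helper, and the simplest route remains copying the full URL from the RDB link button on the archive.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def to_token_url(url, token):
    """Rewrite an archive URL to https on the default port, with a token query.

    The explicit port (e.g. 7480) and any existing query string are dropped.
    """
    parts = urlsplit(url)
    query = urlencode({"token": token})
    return urlunsplit(("https", parts.hostname, parts.path, query, ""))


# d = katdal.open(to_token_url('http://archive-gw-1.kat.ac.za:7480/...', '<your-token>'))
```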
This issue also occurs if you download the RDB file to your local disk and then open it via
d = katdal.open('1596945366_sdp_l0.full.rdb')
That trick only works on the CHPC cluster, or if you also copied all the data to your local disk, since the RDB file only contains the 7480 URL and won't know about the token.
Ah, interesting. Yeah, that works. Thanks very much.
Pleasure!
I now remember that there's another option with the local RDB file to feed in the token: treat it like a URL.
d = katdal.open('1596945366_sdp_l0.full.rdb?token=<your-token>')
Although I'm not sure if that will go the https route...
I'm using katdal 0.15 in an Ubuntu 18.04 Docker container.