Closed: ndestefano closed this issue 8 years ago
Hi @ndestefano. I'm not sure why you are having the problems you are having.
Just to clarify, when did this problem start happening for you? Did it just recently start and you were able to download data with no problems before?
If everything was working before and then suddenly stopped working, then it could be an IP issue. If it never worked for you, then perhaps it's something to do with davitpy.
Thanks in advance for the clarification.
Hi @asreimer , the problems seem to be intermittent, and I mostly do these tests early in the day (around 10:00AM EDT). I should also clarify that I was able to get file transfer working through davitpy initially and it appeared to have no issue, although I may have had a file here or there that didn't transfer correctly.
For instance, I tried to connect to the ftp server via sftp and got a similar error (Connection reset by peer), but when I tried again about half an hour later I connected just fine.
One last thing that I should mention is that I am going through a proxy when I download the data, I'm not sure if that makes a difference or not.
Sounds to me like a network issue. If you can (if you have davitpy on a laptop), you should try running your code on a completely different network. How many instances of this are you running at the same time? Trying to download too many files in parallel could be causing bandwidth problems. AFAIK, the data server at VT isn't necessarily designed to handle very many concurrent connections.
Maybe @ksterne can have a look at the file server logs (is the data server just getting overwhelmed by all the connections?).
You could try pinging the data server and seeing what the latency is like when you are having issues. Do you have trouble accessing any websites at the same time?
@asreimer got to this before I could. I do have a limit of 10 logins, so if you're doing things in parallel, you could be hitting an issue there. If you're able to get back in, then you're probably not on the blacklist. If you still can't get in, let me know what IP address you're coming from and I can remove it from the list.
As well, as mentioned in other posts (#191 is the main one), davitpy isn't really set up to do large data pulls. I'm running something similar that combs through lots of the SuperDARN data to look for something very small. And generally, for one month, the code stops working a few times before it makes it through all of our radars (how many times depends on how many radars are running for a given month).
There ya go @ndestefano, you might be running into that 10 login limit (since that would be 10 total logins, not 10 per user, right?).
The problem described in #191 won't cause these SSH errors @ndestefano was seeing though.
So right now the ping to the data server is ~50ms with 1-3% packet loss over the course of one minute. I did ping google at the time I was having major connection issues and that was fine, so it wasn't on my end.
The 10-logins-at-once limit was probably the issue, since I was running anywhere between 8 and 32 concurrent requests. I did notice a potential bug in my own code that may not have respected this limit, so my worry was that I had been blacklisted.
I do (unfortunately) have to pull large amounts of data (although time isn't really an issue), is there a general strategy for doing this? Should I just run batch jobs at night and make sure I obey the 10 login limit?
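One way to stay under a 10-login limit like the one described above is to cap the worker pool well below 10 and retry transient connection resets. This is a minimal sketch, not davitpy-specific: `fetch_day` is a hypothetical placeholder for whatever per-day download and processing your script actually does, and the simulated failure is only there to exercise the retry path.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONNECTIONS = 8  # stay safely under the server's 10-login cap
MAX_RETRIES = 3

def fetch_day(day):
    """Placeholder for a real davitpy/sftp download of one day of data."""
    # Simulate occasional "Connection reset by peer"-style failures.
    if random.random() < 0.2:
        raise ConnectionResetError("Connection reset by peer")
    return (day, "ok")

def fetch_with_retry(day):
    """Retry a flaky download a few times before giving up on that day."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            return fetch_day(day)
        except ConnectionResetError:
            if attempt == MAX_RETRIES:
                return (day, "failed")
            time.sleep(0.01 * attempt)  # brief back-off before retrying

days = ["2014-%02d-%02d" % (m, d) for m in (1, 2) for d in range(1, 11)]
with ThreadPoolExecutor(max_workers=MAX_CONNECTIONS) as pool:
    results = dict(pool.map(fetch_with_retry, days))
print(len(results))
```

Running batches off-peak as suggested would just be a matter of scheduling this script (e.g. via cron) in the evening.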
Just as a quick status update, I can't connect via sftp or using the davitpy code on my laptop ("connection reset by peer" errors), though I have been able to do both previously. My ip is 129.83.31.1.
On my cluster I have an sftp file transfer going, and I can make other sftp connections and transfer data through davitpy, which is basically the exact opposite of the problem I was having this morning. I don't know the cluster's ip, but I'm currently downloading 2014 fitex data from radar 'rkn'.
@ndestefano, is this the public ip address? If so, I don't see it listed on the sd-data1 sftp server. I'm slightly confused here since you note that you're getting "connection reset by peer" errors, but then also mention you're transferring data through davitpy. Is this data transfer with davitpy connecting to sd-data1.ece.vt.edu?
So I have two machines with davitpy installed: my laptop (Mac OSX) and a cluster running CentOS. On both I've previously been able to establish connections either through the davitpy interface (davitpy.pydarn.sdio.radDataOpen() and davitpy.pydarn.sdio.radDataReadRec()) or by using sftp. The ip I gave is for my laptop, and the address I was using for sftp is sd-data.ece.vt.edu.
This morning I was having connection issues on the cluster but my laptop was able to make connections, and now the opposite is true. The download that I have that's in progress is on the cluster machine, not my laptop.
I hope that clears things up.
@ndestefano, OK, that clears things up. Sorry, I wasn't following that you were using two different computers. So, I'm guessing things are mostly working for you? It sounds like there is some kind of issue with the connection on your end rather than anything on the sd-data server.
Aside from the example I mentioned about combing through data, I'm not sure anyone has used davitpy just to download data. Maybe @asreimer has tried to do statistical studies with davitpy, which led to issue #191. So, I'm afraid you're somewhat on your own here.
I haven't tried to do statistical studies by downloading data from VT. We have our own data server here in ISAS.
What I recommend doing is downloading rawacf data from the distribution servers we have set up (they are designed to handle massive downloading of data) and then process that data into fitacf using RST. @ksterne, perhaps this is a good example of a reason why we should have fitacf data available on the data mirrors?
So I checked the connections this evening and I was able to connect with my laptop and on the cluster just fine using either sftp or the davitpy code. Was it just a traffic issue that was causing disconnects? Should I just run in the evening and this won't happen as often?
@asreimer how do I access the distribution servers?
Hey @ndestefano, sorry about the delayed response.
To access the distribution servers you need to get in contact with someone in the Data Distribution Working Group.
Hey all,
First off I want to say thanks for all of the work that's been done in making this tool. I'm currently using your code to pull large amounts of data for clutter studies and I've been having some timeout/connection issues.
I've been downloading chunks of data for a given year and a given radar with a script I wrote using the functions davitpy.pydarn.sdio.radDataOpen() and davitpy.pydarn.sdio.radDataReadRec(). This script is multi-threaded (well, actually multi-processed) because some data processing goes on before saving to my output file. At first I was seeing timeout issues (maybe there was a lot of activity on the server), so, since timing wasn't critical for me, I extended the timeout that paramiko uses. Then I got the following error:
ERROR:paramiko.transport:Traceback (most recent call last):
ERROR:paramiko.transport:  File "~/.local/lib/python2.7/site-packages/paramiko/transport.py", line 1710, in run
ERROR:paramiko.transport:    self._check_banner()
ERROR:paramiko.transport:  File "~/.local/lib/python2.7/site-packages/paramiko/transport.py", line 1858, in _check_banner
ERROR:paramiko.transport:    raise SSHException('Error reading SSH protocol banner' + str(e))
ERROR:paramiko.transport:SSHException: Error reading SSH protocol banner[Errno 104] Connection reset by peer
ERROR:paramiko.transport:
ERROR:root:can't connect to sd-data.ece.vt.edu with username and password
ERROR:root:Sorry, we could not find any data for you :(
ERROR:root:Your pointer does not point to any data
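For what it's worth, the "Error reading SSH protocol banner" failure is often transient, so wrapping the connection attempt in a retry with backoff can help. Below is a hedged sketch: the retry helper is generic, and the paramiko usage shown in the comment is only illustrative (the hostname, credentials, and the `banner_timeout` value are assumptions to adapt, not a confirmed configuration for this server).

```python
import time

def retry(fn, attempts=4, base_delay=0.01, retry_on=(Exception,)):
    """Call fn(), retrying with exponential backoff on the given exceptions."""
    for i in range(attempts):
        try:
            return fn()
        except retry_on:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * (2 ** i))

# Illustrative only (not run here): retry a paramiko connection whose
# banner read is intermittently reset by the peer.
#
#   import paramiko
#   client = paramiko.SSHClient()
#   client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
#   retry(lambda: client.connect("sd-data.ece.vt.edu",
#                                username=user, password=pw,
#                                timeout=30, banner_timeout=60),
#         retry_on=(paramiko.SSHException, ConnectionResetError))

# Quick self-check with a stub that fails twice, then succeeds:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionResetError("Connection reset by peer")
    return "connected"

print(retry(flaky))
```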
My worry at this point was that my aggressive downloading got me ip banned or something (I hope not!). However, when I look at the ftp site there seem to be some gaps in the data, so is the connection reset if no data is found? The only reason I don't completely trust the "we could not find any data for you" error is that there are some days I appear to have missed even though the ftp site shows data for them.
Also somewhat related: I don't want to overload the servers, so is there a number of concurrent requests I should limit myself to (say 10 or 20 at a time)?
Thanks again for everything,