obspy / obspy

ObsPy: A Python Toolbox for seismology/seismological observatories.
https://www.obspy.org

http.client.IncompleteRead: IncompleteRead(145883 bytes read) #3028

Closed (dip16gphy closed 2 years ago)

dip16gphy commented 2 years ago


Bug Summary

In the Python console, whenever I request data with minmagnitude less than 7, it throws this error, but with minmagnitude 7 or greater it works fine. I tried updating obspy too. Please help. [screenshot of the traceback attached]

Code to Reproduce

from obspy.clients.fdsn import Client
client = Client()
cat = client.get_events(starttime='2013-07-01', endtime='2599-12-31', minmagnitude=6)
print(cat)

ObsPy Version?

1.3.0

Operating System?

Windows

Python Version?

3.8.12

Installation Method?

conda

calum-chamberlain commented 2 years ago

I do not get an error when I try this on my machine. This is likely a network connection issue and may be transient. Please make sure you have a stable internet connection and try again. If the problem persists, you might try downloading smaller chunks to avoid network limitations, using something like:

from obspy import UTCDateTime, Catalog
from obspy.clients.fdsn import Client

client = Client()
cat = Catalog()
starttime, endtime = UTCDateTime(2013, 7, 1), UTCDateTime.now()
chunk_length = 365 * 86400  # Query length in seconds
while starttime <= endtime:
    cat += client.get_events(starttime=starttime, endtime=starttime + chunk_length, minmagnitude=6)
    starttime += chunk_length

filefolder commented 2 years ago

Yes, servers will often return an error when too much data is requested at once, but this example is pretty normal (only 1252 events) and it works fine for me.

This looks to be some sort of binary encoding issue that I see a lot in (poorly) ported py2-to-py3 code. Hopefully simply updating/rebuilding your rfpy conda environment will solve it.
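
(As a quick sanity check on the "I tried updating obspy" step above, the following confirms which obspy version the active environment actually picks up; nothing beyond the obspy package itself is assumed:)

import obspy
print(obspy.__version__)  # confirm the environment picked up the updated version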

megies commented 2 years ago

example above runs fine for me too

wasjabloch commented 2 years ago

The issue may indeed be related to a bad internet connection, as it appears to me as if the call was made from Haiti.

I patched around the error with:

from http.client import IncompleteRead

from obspy import Catalog, UTCDateTime

# event_client, tstart and tend are defined earlier in the script
try:
    cat = event_client.get_events(starttime=tstart, endtime=tend)
except IncompleteRead:
    chunk = 365 * 86400  # Query length in seconds
    cat = Catalog()
    tend = min(tend, UTCDateTime.now())
    while tstart < tend:
        # Make sure the last chunk ends exactly at tend
        if tstart + chunk > tend:
            chunk = tend - tstart
            # But do not get caught up in an infinite loop due to rounding errors
            if chunk <= 1:
                break
        cat += event_client.get_events(starttime=tstart, endtime=tstart + chunk)
        tstart += chunk

which solved the issue.

It would be cool if get_events could do this internally.

Anyone interested in implementing this?
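
(A minimal sketch of what such an internal retry might look like, written here as a free-standing wrapper rather than a change to get_events itself; get_events_chunked is a hypothetical name, and a production version would also need to handle windows containing no events, which the FDSN client reports as an exception rather than an empty catalog.)

from http.client import IncompleteRead

from obspy import Catalog, UTCDateTime
from obspy.clients.fdsn import Client

def get_events_chunked(client, starttime, endtime, chunk=365 * 86400, **kwargs):
    """Fetch events window by window, shrinking the window whenever a
    read is interrupted mid-transfer."""
    cat = Catalog()
    endtime = min(endtime, UTCDateTime.now())
    while starttime < endtime:
        length = min(chunk, endtime - starttime)
        try:
            cat += client.get_events(starttime=starttime,
                                     endtime=starttime + length, **kwargs)
        except IncompleteRead:
            if chunk <= 1:
                raise  # give up instead of looping forever
            chunk //= 2  # halve the window and retry the same span
            continue
        starttime += length
    return cat

cat = get_events_chunked(Client(), UTCDateTime(2013, 7, 1),
                         UTCDateTime.now(), minmagnitude=6)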

filefolder commented 2 years ago

Since it's a data volume issue, it wouldn't necessarily be fixed just by reducing the time window; the volume is also proportional to the window of magnitudes, distances, depths, etc.

it also seems like more of a user issue than an obspy issue. could just print a warning "try reducing size of request" etc

megies commented 2 years ago

it also seems like more of a user issue than an obspy issue. could just print a warning "try reducing size of request" etc

I agree, I'm quite sure it isn't something we want to add. In general you want requests to data centers to be big, for efficiency. I'm also not sure we need to show an additional warning; if we wanted to do that, we could do the same in dozens of parts of the code base, and it seems like a futile effort that blows up the code base.

I'm closing this for now. Totally feel free to reopen if anybody disagrees and wants to discuss more.

wasjabloch commented 2 years ago

I understand that the solution offered above would make the code a lot harder to maintain, and that calls to data centers should be large. I also see that it is somewhat unsatisfactory to have a bug one cannot really test, because it is so transient.

However, I would like to make the point that this bug essentially breaks obspy for people behind poor internet connections, as downloading catalogs is a core functionality of obspy. People should have a chance to use this functionality even if they sit in your favorite very remote location and their internet is slow. It appears to me as if

from http.client import IncompleteRead
...
try:
    reader = url_obj.read()
except IncompleteRead:
    msg = 'Problem retrieving data from datacenter. Try reducing size of request.'
    raise RuntimeError(msg)
buf = io.BytesIO(reader)
...

would already help the affected user a lot. In my opinion this effort would totally not be futile.

I've looked a little bit into the code base and agree that it is somewhat difficult to understand which other calls might potentially raise the same issue. The entire block of code already looks a little messy:

if debug is True:
    print("Uncompressing gzipped response for %s" % url)
# Cannot directly stream to gzip from urllib!
# http://www.enricozini.org/2011/cazzeggio/python-gzip/
buf = io.BytesIO(url_obj.read())
buf.seek(0, 0)
f = gzip.GzipFile(fileobj=buf)

In an attempt to find a more general solution, I note here that the url_obj.read() is what causes the problem; url_obj is an opener.open(request), which is a whole world by itself. Client.opener already has quite sophisticated error handling routines here and there, and I wonder if the above check could be plugged in somewhere it really belongs. Unfortunately, I do not see where that could be.

P.S.: Only contributors can re-open.

megies commented 2 years ago

somewhat unsatisfactory to have a bug one cannot really test, because it is so transient.

it's not a bug, it's a network issue.

However, I would like to make the point that this bug essentially breaks obspy for people behind poor internet connections, as downloading catalogs is a core functionality of obspy. [...] It appears to me as if [the snippet above] would already help the affected user a lot. In my opinion this effort would totally not be futile.

Fair enough, we can add that try/except, send a PR.

The entire block of code already looks a little messy:

Not sure how this is messy; it even has a comment explaining why it's done, and going through a BytesIO instance isn't anything unusual. That's a best-practice workaround right there.

I note here that the url_obj.read() is what causes the problem; url_obj is an opener.open(request), which is a whole world by itself.

That read is what gets interrupted by the flaky internet connection. That opener is what handles the authentication, and requests does an amazing job of providing a great high-level API for this; you should have seen how this had to be done before requests was a thing. It was ugly.

P.S.: Only contributors can re-open.

Become a contributor then, send that PR :wink:

megies commented 2 years ago

Please leave the type of the exception, though; it's not a RuntimeError. It should stay IncompleteRead, or at least its parent class HTTPException. RuntimeError doesn't tell you anything, while the original type already tells you it's an HTTP issue.

wasjabloch commented 2 years ago

Okay. Thanks for the insight regarding opener and requests. I just wondered if other Clients, Client.get_stations(), or Client.get_waveforms() are affected by the same network issue. If that opener.open(request).read() gets called in other places, I wasn't able to locate them, so I assume it doesn't. A more general warning to the user would probably except the IncompleteRead in some central wrapper class method. For now I assume this is the only occurrence of that particular call.

I read and understood the contributor guidelines and will create that pull request in due time. For now, it would come from a personal fork. To become a contributor, someone on the developer team needs to grant me contributor privileges first, don't they? I don't think I can do this myself.

megies commented 2 years ago

wondered if other Clients, Client.get_stations(), or Client.get_waveforms() are affected by the same network issue

Yes and no. They are affected, but it's the same piece of code: all downloading of URLs (in the FDSN client) is ultimately handled in that one method. I'm not sure about the EIDA routing client and the IRIS federator, and there are certainly many more places in obspy where long network reads could be problematic on slower connections that are prone to spurious interruptions.

I read and understood the contributor guidelines and will create that pull request in due time.

sounds good :+1: Like I said, I'd propose to except IncompleteRead as e: and then simply append another sentence or two with more info like you proposed, adding to the actual exception's message and then reraise the exception, so that the original exception stays the same, just with more info.
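
(A sketch of that shape around the read quoted earlier; note that IncompleteRead builds its message from the partial payload rather than from its args, so the extra hint here goes onto a chained exception of the parent class HTTPException. This is only an illustration under those assumptions, not the actual PR.)

import io
from http.client import HTTPException, IncompleteRead

try:
    buf = io.BytesIO(url_obj.read())  # url_obj as in the snippet quoted above
except IncompleteRead as e:
    msg = (str(e) + ": the connection was interrupted mid-transfer; "
           "try reducing the size of your request.")
    # Stay within the HTTPException family and chain the original
    # exception, so no information about the failure is lost.
    raise HTTPException(msg) from e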

For now, it would come from a personal fork. To become a contributor, someone on the developer team needs to grant me contributor privileges first, don't they? I don't think I can do this myself.

That's fine. If you want, though, I can also add you to the developers right away and you can push a branch to the main repository. It's fine either way.