kartaview / upload-scripts

Uploader tools for KartaView
MIT License
61 stars 30 forks source link

Infinite retry loop when resuming an interrupted upload #21

Closed ToeBee closed 5 years ago

ToeBee commented 7 years ago

After a network glitch in the middle of a large upload I had to CTRL-C the upload script. Upon resuming the upload, some of the files had been successfully uploaded to the server even though the completion of the upload had not been recorded by the upload script yet. This led the server to return a non-200 status code because the file already existed in the sequence. But this made the script retry the upload for the same file again. And again. And again...

I didn't get a chance to debug exactly what the server was sending back when it detected a duplicate image. I actually made a mistake when adding a print statement and the error being thrown by my mistake broke out of the infinite upload loop and the upload continued on and finished successfully. So that was amusing. But I seem to recall the server returning a specific error message on duplicate upload. If this error is detected, the upload script should mark the file as finished instead of trying to upload it again.

ToeBee commented 7 years ago

I suppose I should mention that this is in the upload_photos_by_exif script.

I was looking at this some more tonight and there is actually code to handle duplicate uploads (apiCode: 660) on line 160 but it is never being reached because of the non-200 HTTP status code being handled up on line 131 which is where the infinite loop happens so the apiCode of 660 never makes it out of the upload_photos function to be handled correctly.

So there is error handling that is getting in the way of other error handling :)

bogdan-racasan commented 7 years ago

That loop was created so there is no jump trough photos. I will investigate why is the response not 200. There were some changes with the names. After the name will change i will change the requests. Ty for using and reporting 👍

danbjoseph commented 7 years ago

I encountered what seems to be this same issue.

screen shot 2017-05-07 at 9 33 58 am
bogdan-racasan commented 7 years ago

@danbjoseph can you provide a sample of images to test on . Thanks for reporting and helping us to improve OSM 👍

danbjoseph commented 7 years ago

i added 2 additional lines to the count file with the next sequential numbers and on the next attempt it worked.

bogdan-racasan commented 7 years ago

Hmm ... interesting. But it's nice that work for you now

danbjoseph commented 7 years ago

happened again and i couldn't get it to work using the same hack as last time. i've uploaded the folders here

bogdan-racasan commented 7 years ago

Oke 👍 i will investigate this issue ASAP. Thanks

danbjoseph commented 7 years ago

http://openstreetcam.com/details/396680/0 is a sequence that got interrupted during upload, and then i added a couple of lines to the count_file.txt

bogdan-racasan commented 7 years ago

No the server don't know how many images you want to upload. He only know when you stop the upload with a finish request. Also you can upload images with longer distance within them but this is not a good idea 👍

Also only yesterday the folder you uploaded has finished to download. Are you still having problems? @danbjoseph

danbjoseph commented 7 years ago

looking at my profile page, the sequences in the folders (with the download link above) that i was having an issue with are no longer marked as 'uploading' but i didn't do anything. does the server eventually close out a sequence if it has been a long enough period with no additional pictures and no finish request?

ToeBee commented 7 years ago

This isn't hard for me to reproduce. I just CTRL-C'd a sequence upload right as it was finishing an image. Now when I try to restart it, it goes into the retry loop. Given my slowish residential internet, there is probably a bigger time gap between sending the last packet of a picture and receiving the ACK back.

But regardless, the problem is still exactly what I described up in the first comment from November 28th. When the script tries to upload a duplicate image, the server returns an HTTP 400 response. This triggers the while int(conn.status_code) != 200: on line 130 where it just sits trying to upload the same image over and over because the server keeps returning HTTP 400.

ToeBee commented 7 years ago

Alternatively, here is a guaranteed way of reproducing it: 1) Ctrl-C the upload in the middle of a sequence upload 2) Open count_file.txt and remove the last line 3) Restart the upload

This simulates the last image being fully uploaded to the server but not yet being recorded as finished by the client which is what happens sometimes on connection failures.

ToeBee commented 7 years ago

The last two nights in a row now I have left 3 uploads in progress when I go to bed. Both nights all 3 uploads failed at the same time and I woke up to finding them spitting out infinite retries. Retry attempt : 43310

dalacost commented 7 years ago

i get the same problem several time when uploading a lot of old images. my temporal solution was, delete the sequence from web, delete two .txt files and start upload again. :(

ToeBee commented 7 years ago

You can avoid re-uploading the whole thing by opening the count_file.txt and appending numbers to the end of it until the last number equals the number of pictures the website says are already uploaded if you view your user profile page.

I'm not sure how but a few times the numbers have been off by as much as 8. I could understand 4. That means the script fired up 4 upload threads (which it does by default), successfully uploaded all 4 files but never recorded the success because the server or network errored out before the response came back. But I'm not sure how it could start up 4 new upload threads without recording that the previous 4 succeeded.

ToeBee commented 7 years ago

Has anyone looked at this some more? I mean I'm sure patches are welcome but this problem kind of cuts through some core error handling in the script and I haven't taken the time to fully grasp it all. I was hoping with a sure-fire way of reproducing it posted above, someone would take another look at it.

A couple weeks ago I was hitting this problem on a daily basis while trying to upload a lot of large sequences from my bicycle tour in Wyoming. It seems the OpenStreetCam servers aren't as stable as hoped and every night one or two uploads would error out on me. I ended up getting fed up with it and stopped uploading from home and took the pictures to work with me to upload from the fast connection there.

bogdan-racasan commented 7 years ago

Hi guy's. I made an fix for this issue with this refactor. I think the problem was when the requests were made for the same image on the same trip. I didn't check this problem... I hope for now this is fixed.

ToeBee commented 6 years ago

Haven't uploaded in a while. I just tried last night. One sequence hit an error of some kind. Unfortunately my scrollback buffer doesn't show the beginning of the problems but it did get into a loop of some kind. The sequence did finish uploading though. This morning this was in the console:

...
Retry attempt : 8765
Retry attempt : 8766
Retry attempt : 8767
Retry attempt : 8768
Retry attempt : 8769
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 376, in _make_request
    httplib_response = conn.getresponse(buffering=True)
TypeError: getresponse() got an unexpected keyword argument 'buffering'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 559, in urlopen
    body=body, headers=headers)
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 378, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.5/http/client.py", line 1197, in getresponse
    response.begin()
  File "/usr/lib/python3.5/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.5/http/client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.5/socket.py", line 575, in readinto
    return self._sock.recv_into(b)
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/adapters.py", line 376, in send
    timeout=timeout
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 609, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/packages/urllib3/util/retry.py", line 247, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/packages/urllib3/packages/six.py", line 309, in reraise
    raise value.with_traceback(tb)
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 559, in urlopen
    body=body, headers=headers)
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 378, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.5/http/client.py", line 1197, in getresponse
    response.begin()
  File "/usr/lib/python3.5/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.5/http/client.py", line 258, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.5/socket.py", line 575, in readinto
    return self._sock.recv_into(b)
requests.packages.urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/photo.py", line 79, in __call__
    self.make_upload(photo, data_photo)
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/photo.py", line 67, in make_upload
    conn = requests.post(self.url_photo, data=data_photo, files=load_photo, timeout=1000)
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/api.py", line 107, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/adapters.py", line 426, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "upload_exif.py", line 67, in <module>
    main()
  File "upload_exif.py", line 58, in main
    do_upload(thread, UploadPhoto(path, access_token, id_sequence, count_list, url_photo), photos_path)
  File "/home/toby/git/osv-uploader/osc/utils.py", line 68, in do_upload
    return p.map(generator, payload)
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
    raise self._value
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

However another sequence got stuck in the middle. There was no error, it just stopped at picture 184 out of 1213. It's still just sitting there. I'm going to have to ctrl-C it. Is there a timeout setting that needs to be set somewhere?

bogdan-racasan commented 6 years ago

@ToeBee Can you upload the images in some dropbox. I've tested the new script with all images that was on my machine. However there can be some problems. Thanks for reporting and upload images to improve osm

bogdan-racasan commented 6 years ago

However all upload this the new scripts are having this issue ?

ToeBee commented 6 years ago

I just had an upload fail with a connection timeout error. The script crashed out however it did resume correctly. I also played around with my test plan above (deleting lines from count_file.txt) and it does seem like the infinite retry loop on resume is fixed!

ToeBee commented 6 years ago

I spoke too early. An upload just failed and went into a retry loop. The error printed on the console isn't very helpful:

Request/Response fail
Retry attempt : 0
Retry attempt : 1
Retry attempt : 2
Retry attempt : 3
Retry attempt : 4
Retry attempt : 5
Retry attempt : 6
Retry attempt : 7
Retry attempt : 8
Retry attempt : 9
Retry attempt : 10
Retry attempt : 11

However I CTRL-C'd the script and restarted it and it seems to be working again. So it recovers well if the script is restarted but not if there is an error while running. Also, what is the intended behavior for errors in general? This one produced an infinite retry loop but I got a connection time out error earlier that crashed the whole script.

The connection timeout left a trace in the console:

  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 559, in urlopen
    body=body, headers=headers)
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 353, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.5/http/client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
    self.endheaders(body)
  File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python3.5/http/client.py", line 936, in _send_output
    self.send(message_body)
  File "/usr/lib/python3.5/http/client.py", line 908, in send
    self.sock.sendall(data)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/adapters.py", line 376, in send
    timeout=timeout
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 609, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/packages/urllib3/util/retry.py", line 247, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/packages/urllib3/packages/six.py", line 309, in reraise
    raise value.with_traceback(tb)
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 559, in urlopen
    body=body, headers=headers)
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/packages/urllib3/connectionpool.py", line 353, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.5/http/client.py", line 1106, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
    self.endheaders(body)
  File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python3.5/http/client.py", line 936, in _send_output
    self.send(message_body)
  File "/usr/lib/python3.5/http/client.py", line 908, in send
    self.sock.sendall(data)
requests.packages.urllib3.exceptions.ProtocolError: ('Connection aborted.', TimeoutError(110, 'Connection timed out'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/photo.py", line 79, in __call__
    self.make_upload(photo, data_photo)
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/photo.py", line 58, in make_upload
    conn = requests.post(self.url_photo, data=data_photo, files=load_photo, timeout=1000)
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/api.py", line 107, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/requests/adapters.py", line 426, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', TimeoutError(110, 'Connection timed out'))
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "upload_exif.py", line 67, in <module>
    main()
  File "upload_exif.py", line 58, in main
    do_upload(thread, UploadPhoto(path, access_token, id_sequence, count_list, url_photo), photos_path)
  File "/home/toby/git/osv-uploader/osc/utils.py", line 67, in do_upload
    list(tqdm.tqdm(p.imap(generator, payload), total=len(payload), bar_format='{l_bar}{bar} {n_fmt}/{total_fmt} remaining:{remaining}  elapsed:{elapsed}'))
  File "/home/toby/git/osv-uploader/upload_photos_by_exif/local/lib/python3.5/site-packages/tqdm/_tqdm.py", line 872, in __iter__
    for obj in iterable:
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 695, in next
    raise value
requests.exceptions.ConnectionError: ('Connection aborted.', TimeoutError(110, 'Connection timed out'))
bogdan-racasan commented 6 years ago

And i was so happy that this issue was solved :))

bogdan-racasan commented 6 years ago

I will try to repoduce this error. Do you have the images with this problem?

ToeBee commented 6 years ago

Well the upload just finished so you have the images now :) But I really don't think it has to do with the image files. The script just doesn't handle all types of network/sever related errors correctly.

I just got the script to crash by starting an upload and then unplugging the network cable. After 5 minutes nothing had happened so I plugged it back in. After the computer got back on the network, the script errored out with the error requests.exceptions.ConnectionError: ('Connection aborted.', BrokenPipeError(32, 'Broken pipe')) even though the network was available again and the upload could have been resumed.

bogdan-racasan commented 6 years ago

I'm sure that the images are not the problems. But i need some images to test...

bogdan-racasan commented 6 years ago

Thanks for helping. You are awesome.

ToeBee commented 6 years ago

I added the HTTP status code and API message to the information printed out when there is an error on the initial upload. It seems the error that is causing the script to go into an infinite retry loop is returning an HTTP 400 error from the server with the message A required argument is missing (photo)

I have no clue what would be causing the request to go through without the photo data. Or if there is a problem on the server that is causing this.

rwelty1889 commented 6 years ago

i just saw this error for the first time:

Request/Response fail
Retry attempt : 0 Retry attempt : 1 Retry attempt : 2 Retry attempt : 3 Retry attempt : 4 Retry attempt : 5 Retry attempt : 6 Retry attempt : 7 Retry attempt : 8 Retry attempt : 9 Retry attempt : 10 Retry attempt : 11 Retry attempt : 12 Retry attempt : 13 Retry attempt : 14 Retry attempt : 15 Retry attempt : 16 Retry attempt : 17 Retry attempt : 18 Retry attempt : 19 Retry attempt : 20 Request/Response fail Retry attempt : 0 Retry attempt : 21 Retry attempt : 22 Retry attempt : 1 Request/Response fail Retry attempt : 0 Retry attempt : 1 Retry attempt : 23 Retry attempt : 2 Retry attempt : 24 Retry attempt : 3

rwelty1889 commented 6 years ago

and i see that there is no count_file.txt in the directory at all, so i cannot modify it to try and resolve the problem.

rwelty1889 commented 6 years ago

and it turns out that sequence_file.txt is there but zero length.

rwelty1889 commented 6 years ago

no evidence of this upload on my profile page, so this is pretty thoroughly borked.

ToeBee commented 6 years ago

Add a line to the photo.py file that prints the server response on upload fail. The script just gives a useless "fail" message while hiding the server response that may contain useful error messages. Around line 60 make it look like this:

    def make_upload(self, photo, data_photo):
        load_photo = {'photo': (photo, open(self.path + photo, 'rb'), 'image/jpeg')}
        conn = requests.post(self.url_photo, data=data_photo, files=load_photo, timeout=1000)
        if int(conn.status_code) != 200:
            print("Response body: " + conn.text) # ADD THIS LINE
            if conn.json()['status']['apiMessage'] == ' You are not allowed to add a duplicate entry (sequenceIndex)':
...

Also, if there is no count_file.txt, check for a... I think it is sequence_file.txt maybe? If that exists, try deleting it. That will cause it to create a new sequence to upload to instead of trying to add files to an existing one that may have some kind of problem with it.

rwelty1889 commented 5 years ago

if count_file.txt and sequence_file.txt are removed, it skips over the previously uploaded images and continues. not yet sure what actually happens to the previously started upload. will know shortly.

rwelty1889 commented 5 years ago

and the answer is that a new upload starts. if i do nothing else, i guess the original upload will be parked forever. i am considering deleting sequences from the web interface and re-uploading.

rwelty1889 commented 5 years ago

i also changed the upload by exif script to use only one thread and have had no crashes since.

rwelty1889 commented 5 years ago

interesting - in a directory where the upload fell into the retry loop, if i set upload threads to 1 and then restart, it did pickup from where it left off and is currently completing the upload, apparently without loss. the one that lost the original uploaded images is the one where count_file and sequence_file were corrupted; removing them and restarting only uploaded the previously unuploaded images.

Approksimator commented 5 years ago

This error constantly appears. Set upload threads 1 to 10. The error appears when there are more than 100 files in the sequence. There are a lot of sequences on the site OSC because of this during the uploading process.

bogdans-telenav commented 5 years ago

@Approksimator thanks for reporting. We are currently woking on creating a new version of the scripts that will handle your problem.

bogdans-telenav commented 5 years ago

Finally done it! Thanks for your patience! If you find any problem with the new version please create an issue and we will address it as soon as possible.