kartaview / upload-scripts

Uploader tools for KartaView
MIT License

One picture mistakenly rejected as a duplicate #10

Closed ToeBee closed 8 years ago

ToeBee commented 8 years ago

On some of my uploads I'm seeing the following output from the script:

No sequence file existing
Found 447 pictures to upload
processing G0013736.JPG
skipping - duplicate image
processing G0013738.JPG
Uploaded - 2 of total :447, percentage: 0.45%
processing G0013739.JPG
Uploaded - 3 of total :447, percentage: 0.67%
processing G0013737.JPG
Uploaded - 4 of total :447, percentage: 0.89%
processing G0013741.JPG
Uploaded - 5 of total :447, percentage: 1.12%
processing G0013742.JPG

When this particular upload completed, it only contained 446 images. Here is picture 1 of the sequence: http://openstreetview.com/details/11477/1

If you try to go to picture 0, it forwards to 1.

The first picture is about 20 meters north of the second picture and has a different file name. I'm not sure if this is the upload script or the server responding with an error.

ToeBee commented 8 years ago

I added some more print statements at the place where it prints the duplicate image message. The data variable, which I believe contains the response from the server, is:

{'status': {'apiMessage': ' You are not allowed to add a duplicate entry (sequenceIndex)', 'httpMessage': 'Bad Request', 'apiCode': '660', 'httpCode': 400}}

But according to the file names in the output, it is uploading different files. So it seems like either there is some kind of off-by-one error in the upload script that causes the same contents to be uploaded twice at the beginning of the sequence or the server is mistakenly flagging images as duplicates somehow.

ToeBee commented 8 years ago

I added more debugging during the upload process. I printed out both the photo and photo_data variables on line 113. Here is the result:

No sequence file existing
Found 522 pictures to upload
photo:  {'photo': ('G0014448.JPG', <_io.BufferedReader name='/G0014448.JPG'>, 'image/jpeg')}
data_photo:  {'sequenceIndex': 0, 'coordinate': '38.65599554001036,-96.48843597000031', 'headers': 59.11930483795209, 'sequenceId': '11489'}
photo:  {'photo': ('G0014449.JPG', <_io.BufferedReader name='/G0014449.JPG'>, 'image/jpeg')}
photo:  {'photo': ('G0014450.JPG', <_io.BufferedReader name='/G0014450.JPG'>, 'image/jpeg')}
photo:  {'photo': ('G0014452.JPG', <_io.BufferedReader name='/G0014452.JPG'>, 'image/jpeg')}
data_photo:  {'sequenceIndex': 3, 'coordinate': '38.656040309998644,-96.48828589000176', 'headers': 92.21664626682987, 'sequenceId': '11489'}
data_photo:  {'sequenceIndex': 1, 'coordinate': '38.656004559988375,-96.48839636003116', 'headers': 67.82361963190183, 'sequenceId': '11489'}
data_photo:  {'sequenceIndex': 2, 'coordinate': '38.656011709996626,-96.48835445003417', 'headers': 71.5006605019815, 'sequenceId': '11489'}
photo:  {'photo': ('G0014453.JPG', <_io.BufferedReader name='/G0014453.JPG'>, 'image/jpeg')}
processing G0014452.JPG
data_photo:  {'sequenceIndex': 4, 'coordinate': '38.6559758199916,-96.48820831999146', 'headers': 147.42390369733448, 'sequenceId': '11489'}
Uploaded - 1 of total :522, percentage: 0.19%
photo:  {'photo': ('G0014454.JPG', <_io.BufferedReader name='/G0014454.JPG'>, 'image/jpeg')}
processing G0014448.JPG
data_photo:  {'sequenceIndex': 5, 'coordinate': '38.655864060051144,-96.48816235003925', 'headers': 154.21813031161474, 'sequenceId': '11489'}
data:  {'status': {'httpMessage': 'Bad Request', 'apiMessage': ' You are not allowed to add a duplicate entry (sequenceIndex)', 'httpCode': 400, 'apiCode': '660'}}
name:  G0014448.JPG
skipping - duplicate image

The error message seems to indicate that there is a duplicated sequenceIndex being sent with one of the pictures but I'm not seeing any evidence of this in the output. Maybe there is some threading problem that is changing something being sent across the wire at the last second?
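One quick way to back up the "no duplicated sequenceIndex" observation is to count the values the client actually logged. This is only a sketch: the payloads below are copied from the data_photo lines in the debug output above (trimmed to the two relevant keys), and `duplicate_indexes` is a hypothetical helper, not part of the upload script.

```python
from collections import Counter

# sequenceIndex / sequenceId pairs copied from the data_photo debug lines,
# in the order they were printed.
logged_payloads = [
    {"sequenceIndex": 0, "sequenceId": "11489"},
    {"sequenceIndex": 3, "sequenceId": "11489"},
    {"sequenceIndex": 1, "sequenceId": "11489"},
    {"sequenceIndex": 2, "sequenceId": "11489"},
    {"sequenceIndex": 4, "sequenceId": "11489"},
    {"sequenceIndex": 5, "sequenceId": "11489"},
]


def duplicate_indexes(payloads):
    """Return the sorted sequenceIndex values that occur more than once."""
    counts = Counter(p["sequenceIndex"] for p in payloads)
    return sorted(index for index, count in counts.items() if count > 1)


print(duplicate_indexes(logged_payloads))  # prints []
```

An empty list means every logged payload carried a distinct index, which is consistent with the client not sending a duplicate.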

Or what happens on the server if (as in this case) the picture 0 is not the first picture to finish uploading? Could that mess with something?

ToeBee commented 8 years ago

It is looking to me like this is a server-side problem. I just did some packet captures and everything looks to be in order on my end of the wire. All the uploads have a unique sequenceIndex and I can see the file name in the EXIF data being uploaded. The order of the file names matches the expected sequence index number. The server just seems to be incorrectly returning the duplicated error on the first file uploaded.

As I said in the first comment, it isn't happening every time either, but it happens often enough that I was able to reproduce it for my tests.

bogdan-racasan commented 8 years ago

Can you send me a link to a couple of photos so I can test with them? Thanks for reporting.

ToeBee commented 8 years ago

I am now pretty certain that this error occurs when a picture other than picture 0 finishes uploading before picture 0 does. At work I wasn't seeing it happen nearly as often as at home. I have now made a test sequence where the 2nd file is tiny (220 KB instead of 2 MB), and it happens every time. You can grab this test sequence here: https://dl.dropboxusercontent.com/u/1475575/OSM/test_sequence.zip
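The effect of the tiny second file can be illustrated without touching the real server. The sketch below is an assumption-laden stand-in: `fake_upload` just sleeps in proportion to the file size instead of sending anything, and the two sizes mirror the 2 MB and 220 KB files described above.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_upload(index, size_kb):
    """Stand-in for a real upload: sleeps in proportion to the file size."""
    time.sleep(size_kb / 10000.0)  # 2000 KB -> 0.2 s, 220 KB -> 0.022 s
    return index


def completion_order(sizes_kb, workers=4):
    """Start all uploads in parallel and return indexes in completion order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(fake_upload, index, size)
            for index, size in enumerate(sizes_kb)
        ]
        return [future.result() for future in as_completed(futures)]


# Picture 0 is about 2 MB, picture 1 is about 220 KB.
order = completion_order([2000, 220])
print(order)
```

With these sizes the smaller picture 1 finishes well before picture 0, so a server that sees requests in completion order receives index 1 first.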

ToeBee commented 8 years ago

... Which I suppose means that this issue should actually be moved to the openstreetview.org repository since it seems to be a bug on the server side, not the upload script.

bogdan-racasan commented 8 years ago

This issue was moved to openstreetview/openstreetview.org#43

bogdan-racasan commented 8 years ago

@ToeBee I've tested the zip that you created, but it works fine ...

ToeBee commented 8 years ago

Oh, it is probably ordering it wrong for you. I had to modify the script to order by file name. Ordering by mtime would put the tiny file at the very end of the upload. So either switch the script to file name ordering or use touch to change the mtime on the files. Either way, make sure that G0016050 is the second file to be uploaded.
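For anyone reproducing this, the difference between the two orderings looks roughly like the following. It is a minimal sketch, not the upload script's actual code: `list_photos` and its `order_by` parameter are hypothetical names.

```python
import os


def list_photos(folder, order_by="name"):
    """Return the .jpg paths in `folder`, sorted by file name or by mtime."""
    paths = [
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(".jpg")
    ]
    if order_by == "name":
        return sorted(paths, key=os.path.basename)
    if order_by == "mtime":
        return sorted(paths, key=os.path.getmtime)
    raise ValueError("order_by must be 'name' or 'mtime'")
```

Calling `list_photos(folder, order_by="name")` keeps the tiny file in its file-name position regardless of when it was last modified, whereas `order_by="mtime"` puts a recently edited file at the end.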

bogdan-racasan commented 8 years ago

@ToeBee We managed to solve the problem. Thanks for reporting. Explanation: when you upload using threading, the photos with index 1, 2, 3... can be uploaded before the photo with index 0, and the sequence is created at that point. When the photo with index 0 is uploaded afterwards, PHP treats the 0 as null, so it is rejected. We have fixed the problem and hope to deploy this commit today. Thanks again for helping.
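The server code is PHP and is not shown in this thread, so the following is only a Python analogue of the class of bug described: a loose check that cannot tell a legitimate index 0 apart from a missing value (in PHP, `0 == null` evaluates to true). Both function names are hypothetical.

```python
def has_index_buggy(sequence_index):
    """Truthiness test: treats 0 the same as a missing value."""
    return bool(sequence_index)


def has_index_fixed(sequence_index):
    """Explicit test: only None counts as missing, so 0 is accepted."""
    return sequence_index is not None


for index in (0, 1, None):
    print(index, has_index_buggy(index), has_index_fixed(index))
```

The buggy check rejects index 0 along with `None`, while the explicit check accepts 0 and still rejects a missing value.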