zooniverse / panoptes-python-client

Apache License 2.0

Incorrect order when uploading subjects #206

Open yliefting opened 5 years ago

yliefting commented 5 years ago

When uploading subjects with multiple files, the order of the files is sometimes lost. One example from our project: https://www.zooniverse.org/projects/y-dot-liefting/snapshot-hoge-veluwe/talk/subjects/26143281

We are using this client to upload subjects. The code that handles the uploads is:

for file in files:
  location = str(args['filepath'].replace("|", "/")) + "/" + str(file)
  subject.add_location(location)
subject.save()

I'm not sure, but could this be related to the async_save in https://github.com/zooniverse/panoptes-python-client/blob/master/panoptes_client/subject.py?

adammcmaster commented 5 years ago

This is very strange. We'll need to be able to reproduce it to figure out what's going on.

Would you be able to try creating a new subject with the same images as subject 26143281, to see if the same thing happens again? Just create a new subject set and upload it there, so we can delete it when we're done testing. Also, if you can set the environment variable PANOPTES_DEBUG=true, that will provide some extra log output that should be useful.
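For a script that can't easily have its environment changed from outside (for example inside a container), a rough sketch of turning on the same kind of output from Python follows. It assumes the client reads PANOPTES_DEBUG when it is first imported, so the variable is set before that import. The helper name `enable_debug_output` is made up for illustration, and the standard-library logging calls are a fallback that doesn't depend on that assumption.

```python
import logging
import os


def enable_debug_output():
    """Turn on DEBUG-level logging before the client is imported."""
    # Assumption: the client checks PANOPTES_DEBUG at import time, so this
    # has to run before `import panoptes_client`.
    os.environ["PANOPTES_DEBUG"] = "true"

    # Standard-library fallback: attach a default handler if none exists
    # and make sure the root logger passes DEBUG records through.
    logging.basicConfig(level=logging.DEBUG)
    logging.getLogger().setLevel(logging.DEBUG)
    return logging.getLogger().level


level = enable_debug_output()

# Only after this point:
# from panoptes_client import Subject
```

If the client runs behind a web service, its DEBUG lines may end up in a different stream than the service's own access log.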

yliefting commented 5 years ago

I uploaded the same 10 photos again to another subject set (73342). The subject is 30841573. It does seem to happen again. The output from the client (which we run in a Docker container to interface with our management app) just says:

172.18.0.14 - - [27/Feb/2019 15:52:48] "POST /subject HTTP/1.1" 201 -

In parallel I will try to find out if the list of files gets shuffled somewhere before the panoptes client receives the files.
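One way to check this on the caller's side is to fix the order explicitly and log it just before the locations are added. This is only a sketch: `build_locations` and the sample file names are hypothetical, and sorting by file name assumes that lexicographic order is the intended order of the photos.

```python
import logging

logger = logging.getLogger("upload_order")


def build_locations(filepath, files):
    """Return the file locations in a fixed, logged order."""
    # Same path construction as the upload loop: "|" stands in for "/".
    base = str(filepath).replace("|", "/")

    # Assumption: sorting the file names gives the intended order.
    locations = [f"{base}/{name}" for name in sorted(files)]

    # Record the exact order that will be handed to the client.
    for index, location in enumerate(locations):
        logger.debug("location %d: %s", index, location)
    return locations


locations = build_locations(
    "data|site1",
    ["IMG_0003.JPG", "IMG_0001.JPG", "IMG_0002.JPG"],
)

# The upload loop then stays as before:
# for location in locations:
#     subject.add_location(location)
# subject.save()
```

If the logged order is correct and the subject still shows the images in a different order, the shuffling happens after the list reaches the client.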

adammcmaster commented 5 years ago

Hmm, that doesn't look like the output I'm expecting. There should be several lines of log output, each starting with DEBUG. For example, this is the output when I create a subject with one image:

>>> s = Subject()
>>> s.links.project = 7
>>> s.add_location('/Users/adam/Desktop/image.png')
>>> s.save()
DEBUG:redo:attempt 1/5
DEBUG:redo:retry: calling save, attempt #1
DEBUG:urllib3.connectionpool:https://www.zooniverse.org:443 "POST /oauth/token HTTP/1.1" 200 1004
DEBUG:panoptes_client:json={'subjects': {'locations': ['image/png'], 'metadata': {}, 'links': {'project': 7}}}
DEBUG:redo:attempt 1/1
DEBUG:urllib3.connectionpool:https://www.zooniverse.org:443 "POST /api/subjects HTTP/1.1" 201 1942
DEBUG:redo:attempt 1/5
DEBUG:redo:retry: calling _upload_media, attempt #1
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): zooniverse-static.s3.amazonaws.com:443
DEBUG:urllib3.connectionpool:https://zooniverse-static.s3.amazonaws.com:443 "PUT /panoptes-uploads.zooniverse.org/production/subject_location/47974f29-4c51-4e76-9770-f3505b12be4c.png?...[snipped]... HTTP/1.1" 200 0
>>> 

Though actually, now that I think about it, that won't really help shed any light on this, because the log doesn't explicitly show the order in which the images are saved in the subject attributes. I was trying to see whether the client was submitting them in the right order or whether they were being shuffled somehow before that.

I'll probably need to add some additional logging to get to the bottom of this.

adammcmaster commented 5 years ago

I should also point out that you'll want to trim the URLs from the Amazon S3 PUT lines before posting – the tokens there do expire but not immediately, so I wouldn't post them publicly.