zooniverse / panoptes-python-client

Apache License 2.0

Connection issues when trying to upload subjects / looping over subject sets #198

Closed. marco-willi closed this issue 5 years ago

marco-willi commented 5 years ago

Hi @adammcmaster

A colleague and I have recently been experiencing connection issues when uploading subjects with the panoptes-python-client. Uploads fail regularly: they seem to stall indefinitely at random points during the upload process. This has led to incomplete uploads of our subject sets. We've refactored our upload script so that we can "resume" uploading into the same set using these steps:

  1. Iterate over the specified subject set to find all subjects that have already been uploaded
  2. Create the subjects that don't exist yet in batches of 500 and add each batch to the set; repeat 2) until finished
  3. Start over with 1) on connection failure (manually)
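The bookkeeping behind steps 1) and 2) can be sketched roughly as follows. This is a minimal illustration and not the script from this issue: `batched` and `pending_batches` are hypothetical helper names, and it assumes each subject can be matched to a manifest entry by some unique identifier (for example a file name stored in the subject metadata). The actual calls to the Panoptes client are deliberately left out.

```python
from itertools import islice


def batched(items, size):
    """Yield consecutive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def pending_batches(manifest_ids, uploaded_ids, batch_size=500):
    """Return batches of manifest entries that are not in the set yet.

    manifest_ids: identifiers of everything that should be uploaded.
    uploaded_ids: identifiers found while iterating over the subject set
                  (assumption: the same identifier is recoverable from
                  each uploaded subject).
    """
    uploaded = set(uploaded_ids)
    todo = [item for item in manifest_ids if item not in uploaded]
    return list(batched(todo, batch_size))


if __name__ == "__main__":
    # Toy data: 'b' is already in the subject set, so it is skipped.
    print(pending_batches(["a", "b", "c", "d", "e"], ["b"], batch_size=2))
```

Each returned batch would then be created and linked to the subject set in one go, so a failure loses at most one batch of work.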

Recently, we've had difficulty even getting past stage 1), iterating over the subject set. Essentially, what we are doing for 1) is this:

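from panoptes_client import Project, SubjectSet

my_project = Project(args['project_id'])
my_set = SubjectSet.find(args['subject_set_id'])
# Looping over the subjects in the set is the step that fails
for i, subject in enumerate(my_set.subjects):
    .....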

This is the error I just got from looping over the set: PanoptesAPIException: Received HTTP status code 504 from API
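Since a 504 is a gateway timeout and may be transient, the manual "start over" in step 3) could in principle be automated with a small retry wrapper. The sketch below is generic and makes several assumptions: `with_retries` is a hypothetical helper, not part of the Panoptes client, and the caller decides which exception types are worth retrying (in this thread that would presumably be the `PanoptesAPIException` shown above).

```python
import time


def with_retries(operation, attempts=5, base_delay=1.0,
                 retry_on=(Exception,), sleep=time.sleep):
    """Call `operation()` until it succeeds or `attempts` is exhausted.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... seconds
    between attempts (exponential backoff). The last failure is
    re-raised so the caller still sees the original error.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        # Stand-in for an API call that times out twice, then works.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("simulated 504")
        return "done"

    print(with_retries(flaky, base_delay=0.01, retry_on=(RuntimeError,)))
```

Retrying only helps with intermittent failures; if the same request times out every time, backing off will not get past it.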

We've been working on the "Cedar Creek" project (5880), which has attracted a lot of volunteers and is about to run out of data much faster than expected. When we uploaded the first batch of data before Christmas, we had the same issue for several days until it suddenly worked.

We've both been working from MSI (the UMN supercomputing institute), but I've also experienced the issues from my home ISP.

We've been using Python 3.5 / 3.6, with panoptes-python-client 1.0.1 and 1.0.3.

(full code if you're interested: Link)

camallen commented 5 years ago

This is a duplicate of known issues that are fixed in master but not yet released, specifically #189 and #191. The solution is to use the latest code from GitHub rather than the released version (see https://github.com/zooniverse/panoptes-python-client#installation):

pip install -U git+git://github.com/zooniverse/panoptes-python-client.git

I'll let @adammcmaster speak more to best practice for 'resumption' and finding unlinked subjects. For reference, I recently worked on something similar to resume uploads: https://github.com/camallen/PRN-scripts/blob/f571de5a087320bde27047440765b74a7eb131f8/upload_manifest.py#L57

marco-willi commented 5 years ago

Thanks Cam! I've changed my "resuming" functionality according to your example, very neat. This will require fewer requests and thus mitigate connection issues. We've also had far fewer connection issues today (with the updated client) and were able to process a new chunk of data. Feel free to close the issue.