qiita-spots / qiita

Qiita - A multi-omics databasing effort
http://qiita.microbio.me
BSD 3-Clause "New" or "Revised" License

Add upload metadata or prep info from computer to update information dropdowns #2433

Closed adswafford closed 5 years ago

adswafford commented 6 years ago

To improve usability, users should have the option to select a file from their computer to upload and update the information, rather than having to go to the 'Upload Files' page to add it first and then select it from the dropdown.

antgonza commented 6 years ago

I think this is already there; could you confirm it does what you are expecting? See: [image: upload]

josenavas commented 6 years ago

@antgonza we were talking about this during the meeting yesterday. The idea is that in the sample or prep template page, where the user can choose the file to update the metadata, they have another option in the dropdown ("choose from your computer", or something along those lines) that allows them to upload a file. This is a more natural interaction, and given that those files are small it should not cause any problems. I mentioned during the meeting that the drawback of that approach is that the file will not get stored in the system if the update fails, but that didn't seem to concern anyone present.

antgonza commented 6 years ago

Got it, thanks for the extended explanation.

antgonza commented 6 years ago

Note that similar functionality has been added for the scp/sftp pulls, and we could reuse that code. Note that we are restricting the size of the file client-side so users don't try to upload something crazy. So I decided to check the current sample info file sizes in Qiita:

$ ls --sort=size -lah [0-9][0-9][0-9]_[0-9]* | head -n 2
-rw-rw-r--+ 1 qiita qiita  2.3M Aug 18  2015 923_20150818-225146.txt
-rw-rw-r--+ 1 qiita qiita  2.3M Jun 23  2015 923_20150623-112619.txt
$ ls --sort=size -lah [0-9][0-9][0-9][0-9]_[0-9]* | head -n 2
-rw-rw-r--+ 1 qiita qiita 3.5M Aug 18  2015 1841_20150818-225340.txt
-rw-rw-r--+ 1 qiita qiita 3.5M Apr 18 11:00 1841_20180418-110312.txt
$ ls --sort=size -lah [0-9][0-9][0-9][0-9][0-9]_[0-9]* | head -n 2
-rw-rw-r--+ 1 qiita qiita  132M Sep 20 13:32 10317_20180920-133258.txt
-rw-rw-r--+ 1 qiita qiita  132M Sep 20 11:09 10317_20180920-110930.txt

which shows that AGP is the study with the largest files, but in all fairness it has been growing little by little, so I decided to check the average size:

$ ls -l [0-9][0-9][0-9][0-9][0-9]_[0-9]* | gawk '{sum += $5; n++;} END {print sum/n;}'
1.85696e+06
$ ls -l [0-9][0-9][0-9][0-9]_[0-9]* | gawk '{sum += $5; n++;} END {print sum/n;}'
229497

Thus, I would suggest not allowing uploads larger than 1M?
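For anyone who wants to repeat the average-size check without `gawk`, here is a minimal Python sketch of the same calculation. The helper name `mean_file_size` is hypothetical and not part of Qiita. It only mirrors the `ls -l ... | gawk` one-liners above, on the assumption that the patterns are run from the directory holding the sample info files.

```python
import glob
import os


def mean_file_size(pattern):
    """Return the mean size in bytes of the regular files matching a glob
    pattern, or None when nothing matches (hypothetical helper)."""
    sizes = [os.path.getsize(p) for p in glob.glob(pattern) if os.path.isfile(p)]
    if not sizes:
        return None
    return sum(sizes) / len(sizes)


# Same patterns as the shell commands above: study IDs with 4 and 5 digits.
four_digit = mean_file_size("[0-9][0-9][0-9][0-9]_[0-9]*")
five_digit = mean_file_size("[0-9][0-9][0-9][0-9][0-9]_[0-9]*")
print(four_digit, five_digit)
```

A mean hides the spread, so it may be worth checking the maximum per study as well before settling on a cap.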

adswafford commented 6 years ago

Reusing the code sounds reasonable, and for prep info files the 1M limit seems likely to be okay. But QIIMP files are going to be closer to 3 MB at baseline, based on the test files I have, and larger if users fill them with a lot of samples or extra columns, so the 1M limit will be too low for those. Maybe a 10M limit for now, or is there another issue you foresee with allowing files this big?

antgonza commented 6 years ago

My concern is upload time: the point is to make it fast, as this is a blocking operation for workers, which is why we have uploads on a different page with a configuration that minimizes blocking.

Let's imagine that we allow uploads of up to 10M and your Internet connection allows 1M uploads; this will block a worker for ~1.25 minutes!!:

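[image: speed] https://user-images.githubusercontent.com/2014559/45928535-9bd3f800-bf02-11e8-98c4-17af03c0283d.png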

Note that the cheapest Internet plan available in my area gives up to 2M uploads, so estimating 1M is a realistic scenario.

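[image: xfinity] https://user-images.githubusercontent.com/2014559/45928600-5f54cc00-bf03-11e8-81a4-7daf99abed5a.png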

BTW for the scp key we limit to 2K.
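The blocking-time estimate can be reproduced with a rough back-of-the-envelope sketch. It assumes that the file sizes in this thread are megabytes and that "1M uploads" means a 1 Mbit/s uplink; neither is stated explicitly here. The function name `upload_seconds` is hypothetical, and the estimate ignores protocol overhead and latency.

```python
def upload_seconds(size_megabytes, uplink_megabits_per_s):
    """Estimate the seconds needed to send a file over a given uplink.

    Assumes size in megabytes and uplink speed in megabits per second;
    ignores protocol overhead, latency, and congestion.
    """
    if uplink_megabits_per_s <= 0:
        raise ValueError("uplink speed must be positive")
    return size_megabytes * 8 / uplink_megabits_per_s


# Candidate caps discussed in this thread, on a 1 Mbit/s uplink.
for cap in (1, 2, 10):
    print(f"{cap} MB cap: about {upload_seconds(cap, 1):.0f} s blocked")
```

Under these assumptions a 10 MB file takes about 80 seconds, which is in the same range as the ~1.25 minutes shown in the screenshot, while a 1 MB file takes about 8 seconds.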

jdereus commented 6 years ago

Isn’t this going to be a larger issue, regardless of upload size, as adoption increases?

(Sent as an email reply on Sunday, September 23, 2018, quoting antgonza's previous comment in full.)

antgonza commented 6 years ago

@jdereus, not sure what you mean, could you clarify? Note that this is to add a direct upload option within the sample/prep buttons so users save 1 step ...

adswafford commented 5 years ago

Flagging this for more input, but since the issue is one of improving user experience I'm still in favor of the change.

Use case:

  1. User uploads hundreds of per sample fastq files as well as their sample info and prep info files in the Upload files section
  2. User waits for all files to complete, periodically checking since they have no other way of knowing when it's done (e.g. an email) and cannot navigate away from the site
  3. User clicks on Sample Information and associates their sample info file with their study but gets an error for violating our requirements
  4. User switches to another program, edits the files, and comes back
  5. User clicks on Upload files and is confronted with a wall of filenames
  6. User searches for the name of the invalid sample info file, checks the box, navigates to the bottom of the page and clicks delete.
  7. User uploads the new file
  8. User clicks on Sample Information and successfully associates their file with their study
  9. User clicks to Add New Preparation, and tries to associate their file but gets an error for violating our requirements
  10. User switches to another program, edits the files, and comes back
  11. User clicks on Upload files and is confronted with a wall of filenames
  12. User searches for the name of the invalid prep info file, checks the box, navigates to the bottom of the page and clicks delete.
  13. User uploads the new file
  14. User clicks to Add New Preparation, and successfully associates their file with their study

Preferred usage:

  1. User clicks on Sample Information, clicks the 'Choose file' dropdown, and navigates to the file on their computer; Qiita attempts to associate it with their study, but they get an error for violating our requirements
  2. User switches to another program, edits the files, and comes back without navigating away
  3. User clicks 'Choose file' dropdown and navigates to the file on their computer to associate their new sample info file with their study
  4. User clicks to Add New Preparation, clicks the 'Choose file' dropdown, and navigates to the file on their computer; Qiita attempts to associate it with their study, but they get an error for violating our requirements
  5. User switches to another program, edits the files, and comes back without navigating away
  6. User clicks 'Choose file' dropdown and navigates to the file on their computer to associate it with their study

A lot less searching, clicking, and navigation, all of which are tedious and can be disorienting, inefficient, and seemingly complicated for new users.

antgonza commented 5 years ago

@adswafford and I agreed that for the time being a 2M cap will work.
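As an illustration only, the agreed cap could be expressed as a small check like the sketch below. This is not Qiita's actual implementation: the constant `MAX_UPLOAD_BYTES` and the function `check_upload_size` are hypothetical names, and reading "2M" as 2 MiB is an assumption. The thread describes the restriction as client-side, so a Python check like this would at most be a server-side counterpart.

```python
# Assumption: "2M" is read as 2 MiB; the thread does not specify the unit.
MAX_UPLOAD_BYTES = 2 * 1024 * 1024


def check_upload_size(size_bytes, limit=MAX_UPLOAD_BYTES):
    """Return (ok, message) for a direct metadata upload (hypothetical helper).

    Files over the limit are rejected with a hint to use the regular
    'Upload Files' page instead.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes > limit:
        return False, (
            f"File is {size_bytes / 1024 ** 2:.1f} MiB; direct uploads are "
            f"limited to {limit / 1024 ** 2:.0f} MiB. "
            "Please use the 'Upload Files' page instead."
        )
    return True, ""


ok, message = check_upload_size(3 * 1024 * 1024)
print(ok, message)
```

Keeping the limit in one named constant would make it easy to raise later if QIIMP files, expected to start around 3 MB, turn out to need more room.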