qiita-spots / qiita

Qiita - A multi-omics databasing effort
http://qiita.microbio.me
BSD 3-Clause "New" or "Revised" License

How to select multiple artifacts for analytical commands? [Discussion] #3087

Closed antgonza closed 2 years ago

antgonza commented 3 years ago

Background

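Currently, the qp-qiime2 plugin (for analyses) only exposes commands in the GUI that take a single input artifact (to be clear, these are inputs, not parameters). For example, the steps for an Unweighted UniFrac PCoA are:

  • rarefaction: only needs the feature table
  • Unweighted UniFrac: this actually requires two inputs, the feature table and the tree, but one of them is "easy" to retrieve internally (the tree is selected from a drop-down menu)
  • PCoA: the distance matrix
  • Emperor: the PCoA

This configuration allows us to have a "simple" input -> process -> output display.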
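To make the single-input constraint concrete, here is a minimal Python sketch of the Unweighted UniFrac PCoA chain (rarefaction, Unweighted UniFrac, PCoA, Emperor). The `Command` class, the artifact type names (including `RarefiedTable`), and both helper functions are illustrative assumptions, not the qp-qiime2 or Qiita API. The tree for UniFrac is treated as a parameter rather than an input artifact, which is why every step has exactly one input here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """A processing step with exactly one input artifact type (hypothetical model)."""
    name: str
    input_type: str
    output_type: str


# Illustrative type names; the real plugin may name and type these differently.
COMMANDS = [
    Command("rarefaction", "FeatureTable", "RarefiedTable"),
    Command("unweighted_unifrac", "RarefiedTable", "DistanceMatrix"),
    Command("pcoa", "DistanceMatrix", "PCoA"),
    Command("emperor", "PCoA", "EmperorPlot"),
]


def next_commands(artifact_type):
    """Return the names of commands that can run on one artifact of this type."""
    return [c.name for c in COMMANDS if c.input_type == artifact_type]


def chain_from(artifact_type):
    """Follow the first applicable command from each artifact until none applies."""
    steps = []
    current = artifact_type
    while True:
        candidates = [c for c in COMMANDS if c.input_type == current]
        if not candidates:
            return steps
        steps.append(candidates[0].name)
        current = candidates[0].output_type
```

Because each command is keyed by a single input type, the GUI only has to look at the selected artifact to know which commands to offer.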

Issue to address

Let's use this example and imagine how we could add an Empress command to my analysis page.

Empress

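Note that Empress needs: a feature table, a tree and a PCoA.

  • Where should we add this command in the GUI? Should we add it to the FeatureTable, to the PCoA, or to both sets of processing options? Should it be offered all the time or only when both artifacts exist in the analysis? As we need both, perhaps at the "highest" level, that is, only on the PCoA.
  • How should a user select the other artifacts? The options range from "easy" to implement (list all the valid artifact ids-names and let the user select from that drop-down menu) to "hardest" to implement (let the user click on the artifact they want to select).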

Thoughts?
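One way to decide when to offer a multi-input command such as Empress is to check whether every required input type is already present in the analysis. The sketch below assumes a hypothetical `REQUIRED_INPUTS` table and a plain dict of artifact id to artifact type; the type names (`Phylogeny` in particular) are placeholders, and treating the tree as an artifact is itself an assumption.

```python
from collections import Counter

# Hypothetical requirement table: command name -> required input artifact types.
REQUIRED_INPUTS = {
    "emperor": ["PCoA"],
    "empress": ["FeatureTable", "Phylogeny", "PCoA"],
}


def available_commands(analysis_artifacts):
    """Return commands whose required input types are all present in the analysis.

    analysis_artifacts maps artifact id -> artifact type.
    """
    present = Counter(analysis_artifacts.values())
    available = []
    for name, required in REQUIRED_INPUTS.items():
        needed = Counter(required)
        # A command is offered only if each required type occurs often enough.
        if all(present[t] >= n for t, n in needed.items()):
            available.append(name)
    return sorted(available)
```

With only a feature table and a PCoA in the analysis, this offers Emperor but not Empress; once a tree is added, both appear.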

justinshaffer commented 3 years ago

Hi Antonio,

Thanks!

I agree tools that require more than a single input should branch from the highest level; so in your example here, the PCoA. This will avoid having to decide if the option to process is available all the time vs. only when both artifacts exist in the analysis.

In general, I suggest moving away from the network-style visualization, in part because (1) it is restrictive in exactly this sense (i.e., where tools require more than a single input), and (2) it is not ideal that the constant refreshing re-organizes the order of the network topology. I agree with going with the 'easy' to implement drop-down menu selection with artifact IDs or similar, and I wonder whether users would prefer this approach in general over the network one (e.g., how Galaxy or CIPRES works in this sense).
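A rough Python sketch of the drop-down idea, assuming the command is launched from the PCoA and the remaining inputs are picked by artifact id and name. The `Artifact` class, the type names, and `dropdown_options` are hypothetical and only illustrate the selection logic, not how Qiita stores artifacts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    id: int
    name: str
    type: str


def dropdown_options(anchor, required_types, artifacts):
    """List selectable artifacts for each input the anchor artifact does not fill.

    Raises ValueError if the anchor's type is not one of the required types.
    """
    remaining = list(required_types)
    remaining.remove(anchor.type)  # the artifact the user clicked fills one slot
    options = {}
    for wanted in remaining:
        options[wanted] = [
            f"{a.id} - {a.name}"
            for a in artifacts
            if a.type == wanted and a.id != anchor.id
        ]
    return options
```

An empty list for any input type would signal that the command cannot be run yet from that artifact.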

Justin

On Wed, Mar 31, 2021 at 9:55 AM Antonio Gonzalez @.***> wrote:

Background

Currently, the qp-qiime2 plugin (for analyses) only adds available commands that have 1 input artifact via the GUI - to be clear these are inputs not parameters; for example the steps for a Unweighted UniFrac PCoA are:

  • rarefaction: only needs the feature table
  • Unweighted UniFrac: this actually requires two but is "easy" to retrieve one of them internally, the feature table and the tree (which is selected from a drop down menu)
  • PCoA: distance matrix
  • Emperor: the PCoA

This configuration allows us to have "simple" input -> process -> output display

Issue to address

Let's use this example and imagine how can we add an Empress https://github.com/qiita-spots/qp-qiime2/issues/58 command to my analysis page.

[image: Empress] https://user-images.githubusercontent.com/2014559/113179579-015c7a00-920d-11eb-8f23-aa5d83e38352.png

Note that Empress needs: a feature table, a tree and a PCoA.

  • Where should we add this command to the GUI? Should we add it to the FeatureTable, to the PCoA, or both processing options? Should it be done all the time or only when both artifacts exist in the analysis? As we need both, perhaps on the "highest" level, I mean only in the PCoA
  • How should a user select the other artifacts? I would like to go from "easy" to implement, like list all the valid artifact ids-names and allow the user to select from that drop-down menu, to "hardest" to implement, like allow the user to click on the artifact they want to select.

Thoughts?


-- Justin Shaffer, PhD IRACDA Postdoctoral Fellow Rob Knight Group Department of Pediatrics, School of Medicine University of California, San Diego justinshafferbio.wordpress.com

antgonza commented 3 years ago

@justinshaffer; do you have any suggestions (screenshots or pictures will be great) of other options to move away from the network visualization?

justinshaffer commented 3 years ago

Yes of course - I'll use CIPRES, which is hosted here at UCSD and which I believe Wayne and Mark from the WoL meetings are involved in. Galaxy is similar, but I haven't used it in a while.

The data are organized similarly to Qiita, with artifacts and processes referred to as data and tasks - there is a left-hand panel that reflects the requirement that you must organize data and tasks into project folders: [image: Screen Shot 2021-03-31 at 11.12.54 AM.png] When clicking on a folder you see this screen: [image: Screen Shot 2021-03-31 at 11.14.35 AM.png]

This is what the data folder looks like: [image: Screen Shot 2021-03-31 at 11.08.11 AM.png] This is what the task folder looks like: [image: Screen Shot 2021-03-31 at 11.08.38 AM.png] Here is an example of viewing output from a task: [image: Screen Shot 2021-03-31 at 11.08.58 AM.png]

This is the menu for creating new tasks (i.e., processing or analyses), linked from the tasks folder above - I'll go through each of the tabs further below: [image: Screen Shot 2021-03-31 at 11.09.22 AM.png] Here is the 'Select Data' tab - you must select data and a tool to set parameters: [image: Screen Shot 2021-03-31 at 11.24.55 AM.png] Here is the 'Select Tool' tab: [image: Screen Shot 2021-03-31 at 11.09.43 AM.png] I'll align sequences in a fasta file using ClustalW as an example - here are the parameters: [image: Screen Shot 2021-03-31 at 11.10.27 AM.png] There are also advanced parameters for each tool that you can see if you click the link: [image: Screen Shot 2021-03-31 at 11.26.06 AM.png]

Here is the task summary, where you submit the job: [image: Screen Shot 2021-03-31 at 11.10.43 AM.png] Running tasks are shown in the tasks folder with all completed tasks/jobs.

One thing that is clear when comparing this set-up with the network one is that users need to know more about the tools being used, as they are forced to choose from all tools and all available inputs for each task/process/analysis. One way around this would be to combine multiple processes / analyses from Qiita into single tasks/jobs.

Justin


antgonza commented 3 years ago

@justinshaffer; 😢 no images were added to your message; I guess you need to upload them directly to github ... sorry.

justinshaffer commented 3 years ago

Whoops - let's try again - sorry about that:

Yes of course - I'll use CIPRES, which is hosted here at UCSD and which I believe Wayne and Mark from the WoL meetings are involved in. Galaxy is similar, but I haven't used it in a while.

The data are organized similarly to Qiita, with artifacts and processes referred to as data and tasks - there is a left-hand panel that reflects the requirement that you must organize data and tasks into project folders.

When clicking on a folder you see this screen:

image01

This is what the data folder looks like:

image02

This is what the task folder looks like:

image03

Here is an example of viewing output from a task:

image04

This is the menu for creating new tasks (i.e., processing or analyses), linked from the tasks folder above - I'll go through each of the tabs further below:

image05

Here is the 'Select Data' tab - you must select data and a tool to set parameters:

image06

Here is the 'Select Tool' tab:

image07

I'll align sequences in a fasta file using ClustalW as an example - here are the parameters:

image08

There are also advanced parameters for each tool that you can see if you click the link:

image09

Here is the task summary, where you submit the job:

image10

Running tasks are shown in the tasks folder with all completed tasks/jobs.

One thing that is clear when comparing this set-up with the network one is that users need to know more about the tools being used, as they are forced to choose from all tools and all available inputs for each task/process/analysis. One way around this would be to combine multiple processes / analyses from Qiita into single tasks/jobs.

Justin

antgonza commented 3 years ago

@justinshaffer, thank you.

If I understand correctly, the main difference is unstructured/independent commands vs pipelines: you can run any single command with any input but you can't link multiple steps; is that correct? Also, your example shows only one input artifact, but I guess the idea would be the same for multiple inputs - a button where you select the data you want to use; right?

justinshaffer commented 3 years ago

Yes, you are correct - this is one big difference as well - there is no way to fully prepare a new job until the needed input is generated from a previous step.

Also yes, for jobs that need more than one input, you have to select multiple input files from that 'Select Data' screen.
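The two models can be contrasted with a small Python sketch. The `Task` class, the data names, and both functions are hypothetical and describe neither CIPRES nor Qiita internals: `submittable` mirrors the independent-task model, where every input must already exist, while `plan_pipeline` mirrors the linked-steps model, where a step can be planned against outputs that earlier steps will produce.

```python
from dataclasses import dataclass


@dataclass
class Task:
    tool: str
    inputs: list   # names of the data items this task reads
    output: str    # name of the data item this task produces


def submittable(task, existing_data):
    """Independent-task model: every input must already exist."""
    return all(name in existing_data for name in task.inputs)


def plan_pipeline(tasks, existing_data):
    """Pipeline model: order tasks so each runs after the tasks producing its inputs.

    Returns tool names in a runnable order; raises ValueError if some input
    can never be produced from the existing data.
    """
    available = set(existing_data)
    pending = list(tasks)
    order = []
    while pending:
        ready = [t for t in pending if submittable(t, available)]
        if not ready:
            raise ValueError("some inputs can never be produced")
        for t in ready:
            order.append(t.tool)
            available.add(t.output)  # later steps may now use this output
            pending.remove(t)
    return order
```

In this sketch an Empress task that needs a table, a tree and a PCoA is not submittable while only the table and tree exist, but it can still be planned as the last step of a pipeline.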

Justin
