Open bernt-matthias opened 1 year ago
To accomplish this, we would need some kind of mapping of our formats to Galaxy formats. And since both of these frameworks allow extension, we'll probably always need "data" as an escape hatch. That said, there may be room to use EDAM to figure out mappings where they exist. I had imagined something along those lines a long time ago, but we've never really gotten around to it.
From there it ought to be possible to observe that a Galaxy collection which contains only a single type would be compatible with our file collection, and thus constrain the collections available to import. (Do Galaxy collections have an observable format?)
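To make the mapping idea a bit more concrete, here is a minimal sketch of an EDAM-based lookup with "data" as the fallback. Everything in the two tables is an assumption for illustration: the QIIME 2 side has no confirmed EDAM annotations, the Galaxy side is not read from Galaxy itself, and `galaxy_formats_for` is a made-up helper name.

```python
# Illustrative tables only; neither is read from Galaxy or QIIME 2.
GALAXY_EDAM = {
    "fastq.gz": "format_1930",
    "fasta": "format_1929",
}

# Hypothetical annotations for QIIME 2 formats; None means "no EDAM term known".
QIIME_EDAM = {
    "FastqGzFormat": "format_1930",
    "DNAFASTAFormat": "format_1929",
    "SomePluginFormat": None,
}


def galaxy_formats_for(qiime_format):
    """Return Galaxy extensions sharing the EDAM term, or ["data"] as escape hatch."""
    edam = QIIME_EDAM.get(qiime_format)
    if edam is None:
        return ["data"]
    matches = sorted(ext for ext, term in GALAXY_EDAM.items() if term == edam)
    return matches or ["data"]


print(galaxy_formats_for("FastqGzFormat"))
print(galaxy_formats_for("SomePluginFormat"))
```

Formats without a shared EDAM term, or unknown to the table entirely, fall through to "data", which is the escape hatch mentioned above.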
Seems that the EDAM annotation is present for the Galaxy data types. Is there a list of qiime datatypes somewhere, maybe with EDAM annotations?
In general, collections can contain datasets of different types. On the tool side one can use the format attribute of param also for data_collection inputs, but I'm not sure whether this checks all elements or only the first. We could check and work on solutions to change this if necessary; maybe an additional validator could also be used. Or one simply documents that users are required to use only uniform collections.
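The difference between "checks all" and "checks only the first element" can be sketched in a few lines. This is a standalone illustration, not Galaxy's actual behaviour: both function names are hypothetical, elements are represented by plain extension strings, and datatype subclassing (e.g. a more specific fastq type matching a general one) is deliberately ignored.

```python
def first_element_matches(element_exts, accepted):
    """Check only the first element of the collection against the accepted formats."""
    return bool(element_exts) and element_exts[0] in accepted


def is_uniform(element_exts, accepted):
    """Check every element of the collection against the accepted formats."""
    return bool(element_exts) and all(ext in accepted for ext in element_exts)


mixed = ["fastq.gz", "fasta"]
accepted = {"fastq.gz"}

print(first_element_matches(mixed, accepted))  # passes despite the fasta element
print(is_uniform(mixed, accepted))             # rejects the mixed collection
```

A validator along the lines of `is_uniform` would reject mixed collections that a first-element check lets through.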
I find the discussion on automatically generated tool wrappers quite enlightening, because it often sheds light on shortcomings of Galaxy (or its tool framework).
As a further comment on collections: they are a nice way to generate parallelism.
ping @ebolyen .. seems that we had the same ideas already earlier :)
As just discussed: I will try to produce a figure (or a hierarchical data file such as YAML/JSON) of Galaxy's datatype hierarchy annotated with edam_format and edam_data entries. Then we can think about a mapping, maybe with some help from @matuskalas.
Created a little script over here. There are a few datatypes deriving from more than one class, so I used only the first in the MRO.
Result can be found here.
If needed you can probably run it using
export PYTHONPATH=$(pwd)/lib/
python hierarchy.py
Some additional Python modules are probably needed.
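For readers who want to see what "only the first in the MRO" means in practice, here is a rough sketch with toy classes. It is not the script referred to above: the class names and EDAM values are stand-ins rather than imports from Galaxy, and `build_tree` is a hypothetical helper.

```python
import json


# Toy stand-ins for datatype classes; annotations are illustrative.
class Data:
    edam_format = "format_1915"


class Text(Data):
    edam_format = "format_2330"


class Compressed(Data):
    pass  # inherits the annotation from Data


class Fastq(Text):
    edam_format = "format_1930"


class FastqGz(Fastq, Compressed):
    pass  # derives from two classes; only Fastq is used as parent below


def build_tree(classes, root):
    """Nest classes under the first entry of their MRO after the class itself."""
    children = {}
    for cls in classes:
        if cls is not root:
            children.setdefault(cls.__mro__[1], []).append(cls)

    def node(cls):
        return {
            "edam_format": getattr(cls, "edam_format", None),
            "children": {c.__name__: node(c) for c in children.get(cls, [])},
        }

    return {root.__name__: node(root)}


tree = build_tree([Data, Text, Compressed, Fastq, FastqGz], Data)
print(json.dumps(tree, indent=2))
```

With this choice `FastqGz` shows up only under `Fastq` and not under `Compressed`, so the result stays a tree at the cost of dropping the second parent.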
I just started to explore the qiime2 Galaxy tools. Starting with the import tool, I noticed that the unspecific format="data" is often used, e.g. https://github.com/qiime2/galaxy-tools/blob/4456c16e2ebebbf1c18b23be1f2b794be560b7d5/tools/suite_qiime2_core__tools/qiime2_core__tools__import.xml#L604
This should be avoided, in particular if there are corresponding datatypes in Galaxy. In this specific example format="fastq.gz" seems appropriate, but there are also fastqsanger.gz or fastqillumina.gz if a specific phred encoding is required.
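One way to find such params across generated wrappers might be a small scan like the one below. It is only a sketch: the embedded tool XML is a made-up example rather than the linked QIIME 2 wrapper, `unspecific_params` is a hypothetical helper, and it flags only params that set format="data" explicitly.

```python
import xml.etree.ElementTree as ET

# Made-up tool XML for illustration; not taken from the qiime2 galaxy-tools repo.
TOOL_XML = """<tool id="example" name="Example">
  <inputs>
    <param name="reads" type="data" format="data"/>
    <param name="trimmed" type="data" format="fastq.gz"/>
    <param name="pairs" type="data_collection" format="data"/>
    <param name="threads" type="integer" value="1"/>
  </inputs>
</tool>"""


def unspecific_params(xml_text):
    """Return names of data params whose format is explicitly the generic "data"."""
    root = ET.fromstring(xml_text)
    return [
        param.get("name")
        for param in root.iter("param")
        if param.get("type") in ("data", "data_collection")
        and param.get("format") == "data"
    ]


print(unspecific_params(TOOL_XML))
```

Each flagged param is then a candidate for a more specific format such as fastq.gz.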