hexylena / galaxyxml

XML Generation libraries for Galaxy Tool/ToolDeps XML
Apache License 2.0
6 stars 5 forks source link

Proper CLI for repeats #32

Open bernt-matthias opened 3 years ago

bernt-matthias commented 3 years ago

The content of the for loop https://github.com/hexylena/galaxyxml/blob/988c82b67881877b71abe34d6541cba718b4eb6d/examples/tool.xml#L23 need to use $i. Also the loop variable should be something unique (e.g. _REPEAT_NAME_i)

fubar2 commented 3 years ago

There is also an interesting challenge to generate useful and valid tests for repeat parameters. Arguably, anything that can be generated should be testable :)

fubar2 commented 3 years ago

33 adds TestRepeats but only tested for repeats around TestParams - need to extend the tests to TestOutputs. Repeated TestOutputCollections is just too strange but probably also needs testing.

It also adds TestOutputCollections so this:

coll_out = gxtp.TestOutputCollection(name="pdf_out")
test_a.append(coll_out)
rep_out = gxtp.TestRepeat(name="testrepeat")
param = gxtp.TestParam("repeatchild", value="foo")
rep_out.append(param)
test_a.append(rep_out)

seems to work in the test and the repeats branch of the ToolFactory supports them.

fubar2 commented 3 years ago

@bernt-matthias: Also the loop variable should be something unique (e.g. _REPEAT_NAME_i)

Assuming all repeats have unique names (not checked anywhere yet!) and since the elements are all accessed as ${repeatname.elementname} inside loops, there should not be any ambiguity even if there is a foo element inside a repeat called bar and a foo inside a repeat called zot?

The ToolFactory generates repeats and testrepeats using an optional repeat boolean flag appearing on the form for eligible parameters (inputs and data only at this point - complex outputs can go into collections!) so each TF repeat has at most one element. If an input parameter foo has the repeat flag set, the repeat internal name is R_foo - since input parameters all have unique names, this should never be ambiguous - you probably shouldn't be generating a tool with two different parameters called foo ?

This simple approach is feasible using Galaxy forms but generates only repeats containing a single parameter :( The form that allows repeats to have more than one element would be very ambitious and complex - each potential element would optionally need to be tied to the same repeat name somehow.

Autogenerating multi-element repeats will be more complicated and may require much more checking inside galaxyxml. The galaxyxml components can correctly parse and generate them AFAIK but at this point, if you want a tool with multielement repeats, the TF is not likely to ever be clever enough to suffice...which is ok with me - it is only meant for relatively simple scripts and the code seems to cope well.

hexylena commented 3 years ago

repeats have unique names

yeah that won't be true. I have some that are distinguished only by (grand-)parent elements, which I do to make the macros easier (but that's a case that probably won't ever be supported by this library, those tools are too big.)

fubar2 commented 3 years ago

My bias is a narrow focus in terms of what's possible using the Galaxy UI. Actually pretty darn good for my purposes as it is. Thank you all for an excellent library!! I'm guessing that parsing those complex tools might work - if that is useful. Adding automation for generating them will take something like the Galaxy language server and a skilled operator rather than a form driven code generator.

The argparse route you have is very cool too. Interesting because the programmer who prepares the argparse code provides the information needed to drive a code generator. A far more information efficient process than a far more complex and ugly ToolFactory form. :)