Closed by bjoernpollex 8 years ago
Hi @bjoernpollex,
From the app you can get the input schema, but not the files or values that were supplied for those inputs.
For example, if you have an app object, you can inspect it like this:
In [24]: app.name
Out[24]: u'SRA Toolkit sam-dump'
In [25]: app.raw['inputs']
Out[25]:
[{u'description': u'SRA file to be extracted.',
u'id': u'#sra_file',
u'inputBinding': {u'position': 1,
u'sbg:cmdInclude': True,
u'separate': False},
u'label': u'SRA input file',
u'sbg:category': u'Input Files',
u'sbg:fileTypes': u'SRA',
u'type': [u'null', u'File']},
{u'description': u'This input should be used if SRA file is not pre-downloaded or not in project files directory - and only if subset of SRA is to be dumped. Example: SRR000111',
u'id': u'#sra_number',
u'inputBinding': {u'position': 1,
u'sbg:cmdInclude': True,
u'separate': False},
u'label': u'SRA accession number',
u'sbg:category': u'Input Files',
u'sbg:toolDefaultValue': u'No value',
u'type': [u'null', u'string']},
{u'description': u'Output only primary alignments.',
u'id': u'#primary',
u'inputBinding': {u'position': 0,
u'prefix': u'--primary',
u'sbg:cmdInclude': True,
u'separate': False},
u'label': u'Output primary alignments',
u'sbg:category': u'Data formatting',
u'sbg:stageInput': None,
u'sbg:toolDefaultValue': u'FALSE',
u'type': [u'null', u'boolean']},
{u'description': u'Output long version of CIGAR string.',
u'id': u'#cigar_long',
u'inputBinding': {u'position': 0,
u'prefix': u'--cigar-long',
u'sbg:cmdInclude': True,
u'separate': False},
u'label': u'Output long CIGAR',
u'sbg:category': u'Data formatting',
u'sbg:stageInput': None,
u'sbg:toolDefaultValue': u'FALSE',
u'type': [u'null', u'boolean']},
{u'description': u'Always reconstruct header.',
u'id': u'#header',
u'inputBinding': {u'position': 0,
u'prefix': u'--header',
u'sbg:cmdInclude': True,
u'separate': False},
u'label': u'Reconstruct header',
u'sbg:category': u'Data formatting',
u'sbg:stageInput': None,
u'sbg:toolDefaultValue': u'FALSE',
u'type': [u'null', u'boolean']},
{u'description': u'Do not output headers.',
u'id': u'#no_header',
u'inputBinding': {u'position': 0,
u'prefix': u'--no-header',
u'sbg:cmdInclude': True,
u'separate': False},
u'label': u'No header in output',
u'sbg:category': u'Data formatting',
u'sbg:toolDefaultValue': u'FALSE',
u'type': [u'null', u'boolean']},
{u'description': u'Print reference SEQ_ID in RNAME instead of NAME.',
u'id': u'#seq_id',
u'inputBinding': {u'position': 0,
u'prefix': u'--seqid',
u'sbg:cmdInclude': True,
u'separate': False},
u'label': u'SeqID in RNAME',
u'sbg:category': u'Data formatting',
u'sbg:stageInput': None,
u'sbg:toolDefaultValue': u'FALSE',
u'type': [u'null', u'boolean']},
{u'description': u"Output '=' if base is identical to reference.",
u'id': u'#hide_identical',
u'inputBinding': {u'position': 0,
u'prefix': u'--hide-identical',
u'sbg:cmdInclude': True,
u'separate': False},
u'label': u'Hide identical bases',
u'sbg:category': u'Data formatting',
u'sbg:stageInput': None,
u'sbg:toolDefaultValue': u'FALSE',
u'type': [u'null', u'boolean']},
{u'description': u'Reverse unaligned reads according to read type.',
u'id': u'#reverse',
u'inputBinding': {u'position': 0,
u'prefix': u'--reverse',
u'sbg:cmdInclude': True,
u'separate': False},
u'label': u'Reverse unaligned reads',
u'sbg:category': u'Data formatting',
u'sbg:stageInput': None,
u'sbg:toolDefaultValue': u'FALSE',
u'type': [u'null', u'boolean']},
{u'description': u'Modify cigar-string and output flags if rna-splicing detected.',
u'id': u'#rna_splicing',
u'inputBinding': {u'position': 0,
u'prefix': u'--rna-splicing',
u'sbg:cmdInclude': True,
u'separate': False},
u'label': u'Modify CIGAR if rna-splicing detected',
u'sbg:category': u'Data formatting',
u'sbg:stageInput': None,
u'sbg:toolDefaultValue': u'FALSE',
u'type': [u'null', u'boolean']},
{u'description': u'Output unaligned reads along with aligned reads.',
u'id': u'#unaligned',
u'inputBinding': {u'position': 0,
u'prefix': u'--unaligned',
u'sbg:cmdInclude': True,
u'separate': False},
u'label': u'Output unaligned reads',
u'sbg:category': u'Filtering',
u'sbg:stageInput': None,
u'sbg:toolDefaultValue': u'FALSE',
u'type': [u'null', u'boolean']},
{u'description': u'Pipes sam-dump to samtools to output BAM file. Samtools is version 1.3',
u'id': u'#output_bam',
u'label': u'Output BAM',
u'sbg:category': u'Config',
u'sbg:toolDefaultValue': u'FALSE',
u'type': [u'null', u'boolean']},
{u'description': u'Name can either be file specific name (ex: "chr1" or "1"). "from" and "to" (inclusive) are 1-based coordinates (for example: 1:6484848-6521430)',
u'id': u'#aligned_region',
u'inputBinding': {u'position': 0,
u'prefix': u'--aligned-region',
u'sbg:cmdInclude': True,
u'separate': True},
u'label': u'Filter by position on genome',
u'sbg:category': u'Filtering',
u'sbg:toolDefaultValue': u'No value',
u'type': [u'null', u'string']}]
Here you can see the input definitions for that app and the schema of each one: whether it is a boolean, a file, a file array, and so on. In general, the app only describes the inputs; think of it as the meta information about them. The input values themselves are properties of the Task, so you have to fetch the tasks to see them. As for comparison, I've answered that in #6.
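To make the schema easier to scan, the list returned by app.raw['inputs'] can be reduced to a small summary. The following is only a sketch: `summarize_inputs` is a hypothetical helper, not part of the library, and it assumes each entry has an 'id' and a 'type' key shaped like the output above.

```python
def summarize_inputs(raw_inputs):
    """Map each input id (without the leading '#') to (types, is_optional)."""
    summary = {}
    for port in raw_inputs:
        port_type = port['type']
        # 'type' may be a single entry or a list such as ['null', 'File'].
        types = port_type if isinstance(port_type, list) else [port_type]
        # An input is optional when 'null' is one of its accepted types.
        optional = 'null' in types
        names = [t for t in types if t != 'null']
        summary[port['id'].lstrip('#')] = (names, optional)
    return summary


# Two entries copied from the output above, trimmed to the relevant keys.
sample_inputs = [
    {'id': '#sra_file', 'type': ['null', 'File']},
    {'id': '#primary', 'type': ['null', 'boolean']},
]
print(summarize_inputs(sample_inputs))
```

With a real app object, the call would be summarize_inputs(app.raw['inputs']).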
Ok, I understand that. The problem I want to solve is this: Given an app, and a dictionary of inputs, how can I find out if I have run this task previously? So maybe the API could offer a special function for that?
Well, the app only gives you the schema, so it is of no use here beyond its id; you have to build the inputs dictionary yourself. But I could imagine doing something like this:
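# Fetch the tasks to search through.
tasks = list(api.tasks.query(limit=100).all())
# The inputs to look for; file inputs are given as File objects.
inputs = {'SomeInput': True, 'SomeFileInput': sb.File(id='FileUUID')}
app = api.apps.get(id='AppId')
# Keep the tasks that ran this app with exactly these inputs.
tasks_of_interest = [task for task in tasks if (task.app == app.id and task._data['inputs'] == inputs)]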
This works if you know the internals of the library: the object representation is stored in the _data attribute of the resource. Neither the API nor the library has a general way of doing this. I'll leave this issue open so that I can think about it.
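Because the inputs stored on a task may contain more keys than the ones you supplied, an exact dictionary comparison can miss matches. A looser check compares only the keys you care about. This is a minimal sketch, not library code: `inputs_match` and `_normalize` are hypothetical helpers, and comparing files by an `id` attribute is an assumption based on the `sb.File(id=...)` call above.

```python
_MISSING = object()


def _normalize(value):
    # Assumption: file-like objects expose an `id` attribute, so compare by id.
    if isinstance(value, (list, tuple)):
        return [_normalize(item) for item in value]
    return getattr(value, 'id', value)


def inputs_match(task_inputs, wanted):
    """Return True if every key in `wanted` has an equal value in `task_inputs`.

    Extra keys in `task_inputs` (for example filled-in defaults) are ignored.
    """
    for key, value in wanted.items():
        if _normalize(task_inputs.get(key, _MISSING)) != _normalize(value):
            return False
    return True


class FakeFile(object):
    """Stand-in for a file object, used only for this example."""

    def __init__(self, id):
        self.id = id


stored = {'primary': True, 'header': False, 'sra_file': FakeFile('abc')}
print(inputs_match(stored, {'primary': True, 'sra_file': FakeFile('abc')}))
print(inputs_match(stored, {'primary': False}))
```

In the list comprehension above, `task._data['inputs'] == inputs` could then be replaced by `inputs_match(task._data['inputs'], inputs)`, under the same caveat about relying on library internals.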
EDIT: Here's a better version of this function. Everything is in one line, and it gets only the required values (or all of them, if desired). There is no need for iteritems, so I believe it should work on both Python 2 and 3.
import pprint

pp = pprint.PrettyPrinter(indent=4)

def generate_input_object(app_object, required=True, print_opt=False):
    """
    Generates an input object template for submitting tasks.
    app_object: the app whose raw['inputs'] schema is read
    if required = True, get only the input ports without "null" as an acceptable "type"
    if print_opt = True, pretty print to console (for easier copy-pasting)
    """
    if required:
        input_object = dict((str(k['id'].split('#')[-1]), "") for k in app_object.raw['inputs'] if "null" not in k['type'])
    else:
        input_object = dict((str(k['id'].split('#')[-1]), "") for k in app_object.raw['inputs'])
    if print_opt:
        pp.pprint(input_object)
    return input_object

myapp_inputs = generate_input_object(myapp, required=True, print_opt=True)
@gaurav-kaushik Thanks for the snippet. Yes, this works fine. We won't expose any CWL-specific stuff yet.
@gaurav-kaushik Thanks, this is awesome!
Given an app, it would be very useful to get a dictionary with the default inputs for that app. The use case for this is when trying to check if an app has previously been run with a given set of inputs. I can get all tasks for a given app, and then compare the inputs. However, since the inputs on the task objects contain all the default values, I cannot simply compare the dictionaries (also, see #6 for problems with comparing file inputs).
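As a rough illustration of what such a dictionary could look like, the defaults declared in the schema can be collected from the 'sbg:toolDefaultValue' key that appears in app.raw['inputs']. This is a sketch only: `declared_defaults` is a hypothetical helper, and the values in the output above are display strings such as 'FALSE' or 'No value', so they may not equal the typed values stored on a task.

```python
def declared_defaults(raw_inputs):
    """Map each input id (without '#') to its declared 'sbg:toolDefaultValue'.

    Inputs that do not declare a default are left out.
    """
    defaults = {}
    for port in raw_inputs:
        if 'sbg:toolDefaultValue' in port:
            defaults[port['id'].lstrip('#')] = port['sbg:toolDefaultValue']
    return defaults


# Entries trimmed from the schema output shown earlier in the thread.
sample_schema = [
    {'id': '#sra_file', 'type': ['null', 'File']},
    {'id': '#primary', 'sbg:toolDefaultValue': 'FALSE', 'type': ['null', 'boolean']},
    {'id': '#sra_number', 'sbg:toolDefaultValue': 'No value', 'type': ['null', 'string']},
]
print(declared_defaults(sample_schema))
```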