phnmnl / galaxy

Data intensive science for everyone.
https://galaxyproject.org/

Failed to upload MTBLS520 #37

Closed pkrog closed 6 years ago

pkrog commented 6 years ago

Using branch release_18.01_plus_isa_runnerRestartJobs, I tried to upload study MTBLS520.zip (metadata only). The ISA dataset creation failed with the message "Unable to finish job". The error in the Galaxy log was:

galaxy.model.metadata DEBUG 2018-07-12 10:03:30,674 [p:1897,w:1,m:0] [LocalRunner.work_thread-0] setting metadata externally failed for HistoryDatasetAssociation 1: 'NoneType' object has no attribute 'datatype'
galaxy.jobs.runners.local ERROR 2018-07-12 10:03:30,679 [p:1897,w:1,m:0] [LocalRunner.work_thread-0] Job wrapper finish method failed
Traceback (most recent call last):
  File "lib/galaxy/jobs/runners/local.py", line 152, in queue_job
    self._finish_or_resubmit_job(job_state, stdout, stderr, exit_code)
  File "lib/galaxy/jobs/runners/__init__.py", line 432, in _finish_or_resubmit_job
    job_state.job_wrapper.finish(stdout, stderr, exit_code, check_output_detected_state=check_output_detected_state)
  File "lib/galaxy/jobs/__init__.py", line 1288, in finish
    dataset.datatype.set_meta(dataset, overwrite=False)
  File "lib/galaxy/datatypes/isa.py", line 419, in set_meta
    self._set_dataset_name(dataset)
  File "lib/galaxy/datatypes/isa.py", line 426, in _set_dataset_name
    investigation = self._get_investigation(dataset)
  File "lib/galaxy/datatypes/isa.py", line 160, in _get_investigation
    investigation = self._make_investigation_instance(main_file)
  File "lib/galaxy/datatypes/isa.py", line 519, in _make_investigation_instance
    parser.parse(fp)
  File "/Users/pierrick/dev/galaxy/.venv/lib/python2.7/site-packages/isatools/isatab_meta.py", line 88, in parse
    self._parse(filepath_or_buffer)
  File "/Users/pierrick/dev/galaxy/.venv/lib/python2.7/site-packages/isatools/isatab_meta.py", line 783, in _parse
    self._parse_publications_section(section_label, section)
  File "/Users/pierrick/dev/galaxy/.venv/lib/python2.7/site-packages/isatools/isatab_meta.py", line 676, in _parse_publications_section
    status_accession, status_term_source, fillvalue=''):
TypeError: izip_longest argument #5 must support iteration
galaxy.tools.error_reports DEBUG 2018-07-12 10:03:31,413 [p:1897,w:1,m:0] [LocalRunner.work_thread-0] Bug report plugin <galaxy.tools.error_reports.plugins.sentry.SentryPlugin object at 0x124cb51d0> generated response None
galaxy.model.metadata DEBUG 2018-07-12 10:03:31,421 [p:1897,w:1,m:0] [LocalRunner.work_thread-0] Cleaning up external metadata files
galaxy.model.metadata DEBUG 2018-07-12 10:03:31,453 [p:1897,w:1,m:0] [LocalRunner.work_thread-0] Failed to cleanup MetadataTempFile temp files from /Users/pierrick/dev/galaxy/database/jobs_directory/000/1/metadata_out_HistoryDatasetAssociation_1__sJSY9: No JSON object could be decoded

pkrog commented 6 years ago

The error was first detected by @korseby (see https://phenomenal-h2020.slack.com/archives/C0R5FKERL/p1531301810000299 on Slack).

djcomlab commented 6 years ago

Fixes found upstream in https://github.com/ISA-tools/isa-rwval/pull/10

pkrog commented 6 years ago

We just need a new package version number for isa-rwval now, @djcomlab.

djcomlab commented 6 years ago

Yes, see last commit.

Just testing it now.

djcomlab commented 6 years ago

For some reason, when loading MTBLS520, no factors, assays, or data files are listed:

[Screenshot: 2018-07-16 at 13:53:06]

pkrog commented 6 years ago

Great, thanks! Tell me if you want me to test that.

pkrog commented 6 years ago

Yes, that is the effect, I guess. Look at my patch: I've just transformed None values into empty lists. I don't know why those three variables were set to None.
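
A minimal sketch of that kind of workaround, for illustration only; it is not the actual isa-rwval patch. The helper names `as_list` and `publication_rows` and the sample values are hypothetical. It assumes Python 3, where `izip_longest` from the traceback is called `zip_longest`.

```python
from itertools import zip_longest  # izip_longest on Python 2.7, as in the traceback


def as_list(value):
    """Return value unchanged, or an empty list if it is None."""
    return [] if value is None else value


def publication_rows(pubmed_ids, dois, statuses):
    # Hypothetical helper: zip parallel columns, padding short ones with ''.
    # Without as_list(), a None column raises TypeError inside zip_longest.
    return list(zip_longest(as_list(pubmed_ids),
                            as_list(dois),
                            as_list(statuses),
                            fillvalue=""))


# A column that was never populated (None) no longer aborts the parse,
# but its values are simply missing from the result.
rows = publication_rows(["12345"], ["10.1000/example"], None)
```

The guard hides the symptom: the parse completes, but whatever should have filled the None columns is still absent, which matches the missing factors, assays, and data files.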

pkrog commented 6 years ago

So maybe the real issue is somewhere further up in the code, but I don't know how to identify it. I suggest we leave it like this for the Dalcotidine release and open an issue in rwval, so that you can maybe find the real problem later.

djcomlab commented 6 years ago

OK sure. The real problem is likely that MTBLS520 is malformed. @proccaserra has reported various problems in MetaboLights ISA-Tabs to them already.

But if @korseby needs MTBLS520 to work urgently, I can prioritise fixing this.

pkrog commented 6 years ago

Maybe at some point we could add a warning message in red inside the HTML, saying something like "The input ISA-Tab archive was malformed, some information about the internal data cannot be displayed.".

korseby commented 6 years ago

MetaboLights lists several experimental factors for this study, see screenshot:

[Screenshot: 2018-07-16 at 16:00:51]

I don't quite understand. Can you forward me the list of potential errors?

The challenge with the MTBLS520 data set is that its experimental design differs from that of the traditional biomedical studies MetaboLights was designed for.

djcomlab commented 6 years ago

@korseby The ISA-Tabs that MetaboLights stores in its database contain a range of errors that sometimes cause loading problems. I'll run MTBLS520 through the validator myself and also try to see why some of the metadata is not displayed, as noted above.

djcomlab commented 6 years ago

MTBLS520 seems to load OK with the full isatools ISA-Tab parser (Galaxy datatype uses isa-rwval that has a stripped-back parser).

korseby commented 6 years ago

That sounds great. Are there any showstoppers left?

djcomlab commented 6 years ago

Yes, the Galaxy datatype can't use the full isatools ISA-Tab parser...

djcomlab commented 6 years ago

I found the problem. It was indeed caused by a data issue! In the investigation file:

STUDY PUBLICATIONS
Study PubMed ID ""
Study Publication DOI   ""
Study Publication Author List   "Kristian Peters
Karin Gorzolka
Steffen Neumann
Helge Bruelheide"
Study Publication Title "Computational workflow to study the seasonal variation of secondary metabolites in 9 different bryophytes"
Study Publication Status    ""
Study Publication Status Term Accession Number  ""
Study Publication Status Term Source REF    ""

We can see that the Study Publication Author List value contains line breaks inside the quoted field. The investigation file is parsed like a CSV table, so the parser treats each of those line breaks as the end of a record: for the author list, only "Kristian Peters is picked up, and the following lines are then misread as new rows, which causes the parser to behave incorrectly.
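
The difference can be sketched with Python's standard `csv` module. This only illustrates the failure mode and is not the isa-rwval parser itself. It assumes the investigation file is tab-delimited, and the helper names `parse_line_by_line` and `parse_quote_aware` are hypothetical.

```python
import csv
import io

# Shortened fragment of the section above, assumed to be tab-delimited.
INVESTIGATION = (
    'STUDY PUBLICATIONS\n'
    'Study PubMed ID\t""\n'
    'Study Publication Author List\t"Kristian Peters\n'
    'Karin Gorzolka\n'
    'Steffen Neumann\n'
    'Helge Bruelheide"\n'
    'Study Publication Status\t""\n'
)


def parse_line_by_line(text):
    """Naive parse: every physical line is treated as one record."""
    return [line.split("\t") for line in text.splitlines()]


def parse_quote_aware(text):
    """csv.reader keeps a quoted field together even when it spans lines."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))


naive = parse_line_by_line(INVESTIGATION)
aware = parse_quote_aware(INVESTIGATION)

# The naive parse cuts the author list short and yields three stray rows;
# the quote-aware parse returns one row holding all four authors.
print(naive[2])
print(aware[2])
```

In the naive result the three remaining author names appear as separate single-column rows with no recognised label, which is the kind of input that can leave later fields unset.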

djcomlab commented 6 years ago

This should now be fixed by changes in isa-rwval; see https://github.com/ISA-tools/isa-rwval/issues/11

djcomlab commented 6 years ago

Note: fix is in https://github.com/phnmnl/galaxy/tree/release_18.01_plus_isa_runnerRJ_clean