nfdi4plants / ARCCommander

Tool to manage your ARCs
MIT License

[BUG] assay outputs format of ARC.json file incorrect #154

Closed: Hannah-Doerpholz closed this issue 1 year ago

Hannah-Doerpholz commented 1 year ago

Describe the bug

In the JSON output file from Dominik's RNA-seq sample ARC, the format of the outputs (in the assays section, last assay, outputs) does not comply with the ISA-JSON specification. After each output description there is an additional list entry that repeats the name of the output file and adds the output type (Raw Data File in this case).

To Reproduce

Steps to reproduce the behavior:

  1. Clone the public sample ARC: https://git.nfdi4plants.org/brilator/samplearc_rnaseq
  2. execute "arc export -o
  3. Go to line 2604 in the JSON-formatted output file to reach the inputs section.

Expected behavior

I only expected the first part of each "two-part output", i.e. the object containing "name", "factor values" and "derives from", and not an additional list object with "name" and "type". The ISA-JSON specification does not describe the latter in the outputs section but in a separate "dataFiles" section, which is missing from the JSON output. As a result, I got 12 outputs (in pairs of objects belonging together by name) instead of the expected 6 for the 6 rows of the .xlsx file.
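To make the 12-vs-6 count concrete, here is a minimal Python sketch that splits a process's outputs into sample-like and data-file-like entries. It assumes that sample entries carry "factorValues"/"derivesFrom" and data-file entries carry a "type" key, mirroring the ISA-JSON sample and data schemas. The file names and the `classify_outputs` helper are hypothetical and not part of ARCCommander.

```python
from collections import Counter


def classify_outputs(outputs):
    """Split outputs into sample-like and data-file-like entries.

    Assumption: data-file entries are the ones with a "type" key
    (ISA-JSON data schema); everything else is treated as a sample.
    """
    samples = [o for o in outputs if "type" not in o]
    data_files = [o for o in outputs if "type" in o]
    return samples, data_files


# Hypothetical outputs mimicking the reported shape: each of the 6 rows
# yields a sample entry followed by a data-file entry with the same name.
outputs = []
for i in range(1, 7):
    name = f"sample_{i}.fastq.gz"  # hypothetical file names
    outputs.append({"name": name, "factorValues": [], "derivesFrom": []})
    outputs.append({"name": name, "type": "Raw Data File"})

samples, data_files = classify_outputs(outputs)
name_counts = Counter(o["name"] for o in outputs)
print(len(outputs), len(samples), len(data_files), len(name_counts))
# prints: 12 6 6 6
```

Counting distinct names, rather than list entries, recovers the 6 outputs expected from the 6 spreadsheet rows.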

Screenshots

[Screenshot from 2022-10-17 11-58-55]


Additional context

I don't recall the ISA-JSON specification defining the output type anywhere. Is this specific to the ARCCommander, or is it unintended? The isatools example files contain an additional section in assays that describes the data output (name + type): https://github.com/ISA-tools/ISAdatasets/blob/tests/json/ISA-1/isa-test1.json (line 43). This section is at least listed in the assay schema provided by isatools (https://github.com/ISA-tools/isa-api/tree/master/isatools/resources/schemas/isa_model_version_1_0_schemas/core). Which ISA specification exactly is the ARCCommander based on?
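The layout the reporter expected, with data files listed once under an assay-level "dataFiles" section instead of inside each process's outputs, can be sketched as follows. This is a hypothetical restructuring for illustration only, not what ARCCommander does. It assumes processes live under `assay["processSequence"]` and that data-file entries are the ones with a "type" key. `move_data_files_to_assay` is an invented helper name.

```python
def move_data_files_to_assay(assay):
    """Hypothetical restructuring: pull data-file entries out of each
    process's outputs and list them once under assay["dataFiles"].

    Assumes processes are stored in assay["processSequence"] and that
    data-file entries are identified by a "type" key.
    """
    data_files = {}
    for process in assay.get("processSequence", []):
        kept = []
        for out in process.get("outputs", []):
            if "type" in out:
                # setdefault keeps only the first entry seen for each name
                data_files.setdefault(out["name"], out)
            else:
                kept.append(out)
        process["outputs"] = kept
    assay["dataFiles"] = list(data_files.values())
    return assay


assay = {
    "processSequence": [
        {
            "outputs": [
                {"name": "s1.fastq.gz", "factorValues": []},
                {"name": "s1.fastq.gz", "type": "Raw Data File"},
            ]
        }
    ]
}
move_data_files_to_assay(assay)
print(assay["dataFiles"])
print(assay["processSequence"][0]["outputs"])
```

The function mutates the assay in place and deduplicates data files by name, so a file produced in several processes is listed only once.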

HLWeil commented 1 year ago

Hey, this is actually not a bug but expected behaviour. It's a workaround for a problem arising when trying to convert ISA-Tab to ISA-JSON:

ISA-Tab supports the Data File Name column, and this also exists in the JSON: https://isa-specs.readthedocs.io/en/latest/isajson.html#data-schema-json. This is where the type field comes from. Unfortunately, this object has no factors field, which is essential for describing the experiment. Therefore, when a line with a data file output is parsed, an additional sample object with the same name gets created, and this sample object is used to store the factors.
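The workaround can be sketched in Python for a single table row. This is not ARCCommander's actual code. It assumes a row is a dict from column header to cell value, that factor columns use the ISA-Tab `Factor Value[...]` header form, and that the data file column header (e.g. "Raw Data File") names the file type. `DATA_FILE_COLUMNS` is a deliberately incomplete, hypothetical list.

```python
# Hypothetical, incomplete list of ISA-Tab data file column headers.
DATA_FILE_COLUMNS = ("Raw Data File", "Derived Data File", "Image File")


def outputs_for_row(row):
    """Sketch of the described workaround for one parsed row.

    If the row's output is a data file, emit both a sample object (which
    carries the factor values) and a data object (name + type only) that
    share the same name, because the data schema has no factor field.
    """
    factors = [
        {"category": header[len("Factor Value["):-1], "value": value}
        for header, value in row.items()
        if header.startswith("Factor Value[") and header.endswith("]")
    ]
    for header in DATA_FILE_COLUMNS:
        if row.get(header):
            name = row[header]
            return [
                {"name": name, "factorValues": factors},  # sample copy holds the factors
                {"name": name, "type": header},           # data object: name + type only
            ]
    # No data file in this row: a plain sample output is enough.
    return [{"name": row["Sample Name"], "factorValues": factors}]


row = {
    "Sample Name": "s1",
    "Raw Data File": "s1.fastq.gz",
    "Factor Value[temperature]": "20",
}
for obj in outputs_for_row(row):
    print(obj)
```

Each data-file row therefore contributes two output entries with the same name, which matches the paired objects reported in the issue.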

As the mapping from inputs to outputs in a process is handled strictly n to n (each input is paired with exactly one output by position), this also requires duplicating the input objects, which is what you see in https://github.com/nfdi4plants/arcCommander/issues/153.
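Here is a minimal sketch of why a positional n-to-n mapping forces the input duplication. It assumes inputs and outputs are parallel lists that are paired by index. `pair_inputs_outputs` and the object names are hypothetical.

```python
def pair_inputs_outputs(input_obj, outputs):
    """Repeat the row's single input once per output so that inputs[i]
    pairs with outputs[i] (positional n-to-n mapping).

    The list holds the same object several times; serialising it to JSON
    writes that object out repeatedly, which shows up as duplicated inputs.
    """
    inputs = [input_obj] * len(outputs)
    return inputs, outputs


inputs, outputs = pair_inputs_outputs(
    {"name": "extract_1"},  # hypothetical input
    [
        {"name": "s1.fastq.gz", "factorValues": []},
        {"name": "s1.fastq.gz", "type": "Raw Data File"},
    ],
)
for i, o in zip(inputs, outputs):
    print(i["name"], "->", o["name"], "(data)" if "type" in o else "(sample)")
# prints:
# extract_1 -> s1.fastq.gz (sample)
# extract_1 -> s1.fastq.gz (data)
```

Because the sample/data split doubles the outputs, the same input has to appear twice for the index-based pairing to stay consistent.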

I said above that this is expected behaviour, but of course, given the lack of documentation, only a few people would expect it. I will check where I can add information like this.

Hannah-Doerpholz commented 1 year ago

I understand. Sorry for the confusion; I wasn't aware of this conversion issue. Thank you for the clarification!