nasa / GeneLab_Data_Processing

[Microarray] Assertion fails in GENERATE_SOFTWARE_TABLE when data files are not compressed #99

Open cyouh95 opened 1 month ago

cyouh95 commented 1 month ago

Description

R.utils is used only to unzip compressed data files (e.g., .CEL.gz). With uncompressed data files (e.g., .CEL), R.utils is never used, so the following assertion fails:

https://github.com/nasa/GeneLab_Data_Processing/blob/90d6bb5d6a20d817fa17ac5cb0763d4f8f75966b/Microarray/Affymetrix/Workflow_Documentation/NF_MAAffymetrix/workflow_code/modules/GENERATE_SOFTWARE_TABLE/resources/usr/bin/SoftwareYamlToMarkdownTable.py#L57

Solution

Modify AFFYMETRIX_SOFTWARE_DPPD to exclude R.utils if data files are not compressed. The same can be done for AGILENT_SOFTWARE_DPPD in the Agilent pipeline.
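
A minimal Python sketch of the idea, not the workflow's actual code: the function names, the `.gz` suffix check, and the placeholder software list are all assumptions for illustration. The real contents of AFFYMETRIX_SOFTWARE_DPPD are not shown in this issue.

```python
from typing import Iterable, List

# Assumption: packages that are only loaded when inputs are compressed.
COMPRESSION_ONLY = ("R.utils",)


def is_compressed(file_name: str) -> bool:
    """Treat a data file as compressed if its name ends in .gz (assumed convention)."""
    return file_name.strip().lower().endswith(".gz")


def expected_software(base: Iterable[str], data_files: Iterable[str]) -> List[str]:
    """Return the software names the version table should be asserted against.

    If any data file is compressed, the full list is expected; otherwise the
    compression-only packages are dropped so the assertion does not fail.
    """
    base = list(base)
    if any(is_compressed(f) for f in data_files):
        return base
    return [name for name in base if name not in COMPRESSION_ONLY]


# Placeholder list for illustration only (hypothetical, not the real DPPD list).
SOFTWARE = ["R", "oligo", "R.utils"]

print(expected_software(SOFTWARE, ["GSM1.CEL", "GSM2.CEL"]))
print(expected_software(SOFTWARE, ["GSM1.CEL.gz", "GSM2.CEL.gz"]))
```

With uncompressed inputs the first call omits R.utils, while the second keeps it, so the expected list matches what the processing script actually loaded.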

cyouh95 commented 2 weeks ago

The Array Data File Name field in the runsheet is used to determine whether data files are compressed. Quoted commas cause an issue in splitCsv(), as described here, but this can be resolved by specifying the quote parameter.
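
The quoted-comma problem can be illustrated outside Nextflow. The sketch below uses Python's standard `csv` module as an analogy for splitCsv() with a quote character set. The sample runsheet text, the `Comment` column, and the function names are hypothetical; only the Array Data File Name column comes from this issue.

```python
import csv
import io
from typing import Dict, List

# Hypothetical runsheet excerpt: the last field contains a comma inside quotes.
RUNSHEET = (
    "Sample Name,Array Data File Name,Comment\n"
    'S1,S1.CEL.gz,"liver, left lobe"\n'
)


def naive_fields(line: str) -> List[str]:
    """Split on every comma, ignoring quotes (the failure mode)."""
    return line.split(",")


def parse_runsheet(text: str, quote: str = '"') -> List[Dict[str, str]]:
    """Parse rows with a quote character so quoted commas stay inside one field."""
    return list(csv.DictReader(io.StringIO(text), quotechar=quote))


data_line = RUNSHEET.splitlines()[1]
print(len(naive_fields(data_line)))  # one field too many

rows = parse_runsheet(RUNSHEET)
print(rows[0]["Array Data File Name"], "|", rows[0]["Comment"])
```

Without quote handling the row splits into four fields for three columns. With a quote character specified, the quoted comma stays inside one field and Array Data File Name is read from the correct column.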