hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0

ACTIN-322: Implement safe handling of missing linx plot directory #473

Closed kzuberihmf closed 11 months ago

kzuberihmf commented 11 months ago

This makes the linx plot directory config setting Nullable. When it's null, linx plots are ignored. When it's not null, we work out how many plots are expected, compare that with the number of plots found, and only error if the two don't match.

So now if no plots are produced, it's OK for the plots dir to be missing (it can still be specified in the orange config). This should make it possible to clean up the manual creation of this directory in pipeline scripts.
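For reviewers, the behaviour described above can be sketched roughly as follows. This is not the code in this PR, just a minimal illustration: the class and method names are made up, the expected count is passed in by the caller rather than computed, and treating plots as `.png` files is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not the actual orange implementation.
public class LinxPlotCheck {

    // plotDir may be null, meaning the setting is not configured and plots are ignored.
    // expectedPlotCount is supplied by the caller; how it is computed is out of scope here.
    public static List<Path> resolvePlots(Path plotDir, int expectedPlotCount) {
        if (plotDir == null) {
            return List.of();
        }

        List<Path> found = listPlotFiles(plotDir);
        if (found.size() != expectedPlotCount) {
            throw new IllegalStateException(
                    "Expected " + expectedPlotCount + " linx plots but found " + found.size() + " in " + plotDir);
        }
        return found;
    }

    // A missing directory counts as zero plots, so it only fails when plots were expected.
    private static List<Path> listPlotFiles(Path plotDir) {
        if (!Files.isDirectory(plotDir)) {
            return List.of();
        }
        try (Stream<Path> files = Files.list(plotDir)) {
            return files
                    .filter(p -> p.getFileName().toString().endsWith(".png")) // assumed plot extension
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this shape, a configured but missing directory passes when zero plots are expected, and fails with a mismatch error when one or more plots are expected.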

To compute the expected counts, this uses the same algorithm as linx visualizer (which unfortunately is not in common code), so the results should be the same.

Note that this adds a dependency on two additional files in the linx directory that orange did not previously read: sample.linx.vis_fusions.tsv and sample.linx.vis_sv_data.tsv. To keep things simple, these data files are not modelled in full as they are in linx visualizer; only the minimal fields required for the check are loaded.
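To illustrate what "minimal fields" means in practice, here is a small sketch of pulling a single named column out of a tab-separated file with a header row. The class name is hypothetical, and the column name used in the example below (`ClusterId`) is an assumption for illustration, not a statement about which fields the PR actually reads.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not the actual orange implementation.
public class MinimalTsvColumn {

    // Returns the values of one column, looked up by header name; the first line is the header.
    public static List<String> readColumn(List<String> lines, String columnName) {
        if (lines.isEmpty()) {
            return List.of();
        }

        List<String> header = Arrays.asList(lines.get(0).split("\t", -1));
        int index = header.indexOf(columnName);
        if (index < 0) {
            throw new IllegalArgumentException("Missing column: " + columnName);
        }

        List<String> values = new ArrayList<>();
        for (String line : lines.subList(1, lines.size())) {
            if (line.isEmpty()) {
                continue; // tolerate blank or trailing empty lines
            }
            String[] fields = line.split("\t", -1);
            if (index >= fields.length) {
                throw new IllegalArgumentException("Row has too few fields: " + line);
            }
            values.add(fields[index]);
        }
        return values;
    }
}
```

In real use the lines would come from something like `Files.readAllLines(path)`, e.g. `readColumn(lines, "ClusterId")`. Looking columns up by header name keeps the loader independent of column order and of any fields it does not need.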

This seemed to hold up in some manual testing. However, as I've never encountered the actual error modes of the pipeline process, I'm not sure whether there's still a way for this check to fail (e.g. if the vis files don't exist).