nf-core / differentialabundance

Differential abundance analysis for feature/observation matrices from platforms such as RNA-seq
https://nf-co.re/differentialabundance
MIT License

Auto-format sample names #262

Open nick-youngblut opened 2 months ago

nick-youngblut commented 2 months ago

Description of feature

The nf-core/rnaseq pipeline generates a salmon.merged.gene_lengths.tsv file in which any dashes in sample names are replaced with dots ('-' => '.').

It would be helpful if the differentialabundance pipeline automatically did the same in order to prevent the DIFFERENTIALABUNDANCE:VALIDATOR process from failing.

pinin4fjords commented 2 months ago

Could you expand on this a little please? Is differentialabundance correcting the sample names in some files and not others, causing the validation to fail?

nick-youngblut commented 2 months ago

> Is differentialabundance correcting the sample names in some files and not others, causing the validation to fail?

Yes, at least the sample names in salmon.merged.gene_lengths.tsv are modified so that dashes are converted to dots ('-' => '.'). It would be helpful if sample naming were standardized across all outputs (e.g., by adding a sample-renaming process at the start of the pipeline).
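
As a rough illustration of the renaming step suggested above, here is a minimal Python sketch. It applies the same '-' => '.' substitution to the ID column of a sample sheet and reports IDs that still have no matching column in a matrix. The function names, the `sample` column name, and the example IDs are all hypothetical; this is not how the pipeline or its validator is implemented.

```python
import csv
import io


def normalize_sample_name(name: str) -> str:
    """Replace dashes with dots, mirroring the '-' => '.' change described above."""
    return name.replace("-", ".")


def normalize_samplesheet(text: str, id_column: str = "sample") -> str:
    """Return CSV text with the ID column normalized.

    `id_column` is an assumed column name; adjust it to the real sheet.
    """
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[id_column] = normalize_sample_name(row[id_column])
        writer.writerow(row)
    return out.getvalue()


def unmatched_samples(sheet_ids, matrix_columns):
    """Return sheet IDs whose normalized form is absent from the matrix columns."""
    columns = set(matrix_columns)
    return [s for s in sheet_ids if normalize_sample_name(s) not in columns]
```

For example, `unmatched_samples(["ctrl-1", "trt-1"], ["gene_id", "ctrl.1"])` would flag only `trt-1`. Note that a blanket substitution like this can make two distinct IDs collide (e.g. `a-b` and `a.b`), so a real renaming step would probably need to check for duplicates as well.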