nf-core / taxprofiler

Highly parallelised multi-taxonomic profiling of shotgun short- and long-read metagenomic data
https://nf-co.re/taxprofiler
MIT License

Current UNTAR scheme is inefficient and can cause file overwriting with database sheet input #462

Closed: jfy133 closed this 6 months ago

jfy133 commented 6 months ago

Description of the bug

Currently we 'blindly' untar every row in the database samplesheet. If someone has supplied the same database file with different parameters, we untar that database repeatedly. When we then publish the results, we repeatedly overwrite the same files, and this overwriting causes an error when Nextflow strict mode is enabled.

We should make untarring more efficient by doing it only once per file. Most likely we have to group rows by file name, untar each file, then spread the result back out with the metas so that the different parameter sets stay separate.
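The group, untar, spread approach proposed above can be sketched outside Nextflow in Python. This is a minimal illustration, not the code from the fix: the function name `untar_once_per_file`, the `(meta, path)` row layout and the injected `untar` callable are hypothetical stand-ins for the database samplesheet rows and the untar step.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical per-row metadata (e.g. tool, db_name, db_params); assumed layout.
Meta = Dict[str, str]


def untar_once_per_file(
    rows: Iterable[Tuple[Meta, str]],
    untar: Callable[[str], str],
) -> List[Tuple[Meta, str]]:
    """Hypothetical helper: extract each distinct archive once, then re-attach every row's meta."""
    # 1. Group: collect all metas that point at the same archive path.
    metas_by_path: Dict[str, List[Meta]] = defaultdict(list)
    for meta, path in rows:
        metas_by_path[path].append(meta)

    # 2. Untar: call the extractor exactly once per distinct archive.
    # 3. Spread: pair the single extracted directory with each original meta,
    #    so rows that differ only in their parameters remain separate entries.
    results: List[Tuple[Meta, str]] = []
    for path, metas in metas_by_path.items():
        extracted_dir = untar(path)
        for meta in metas:
            results.append((meta, extracted_dir))
    return results
```

With two rows pointing at the same MALT archive but different `db_params`, `untar` runs once for that archive, and both rows still come back, each with its own meta and the shared extracted directory. In the pipeline itself the same idea would be expressed on channels rather than Python lists.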

Picked up by setting nextflow.enable.strict = true in the test config and running the test_malt profile.

Command used and terminal output

No response

Relevant files

No response

System information

No response

jfy133 commented 6 months ago

Done in https://github.com/nf-core/taxprofiler/pull/461