nf-core / taxprofiler

Highly parallelised multi-taxonomic profiling of shotgun short- and long-read metagenomic data
https://nf-co.re/taxprofiler
MIT License

KrakenUniq SE overwrites the output when there is an early dot in the sample name #533

Open SannaAb opened 2 months ago

SannaAb commented 2 months ago

Description of the bug

Hi,

I noticed that KrakenUniq overwrites the outputs for single-end (SE) samples if there is a dot in the sample name. This is not an issue for paired-end (PE) samples. I have attached the command.sh for both SE and PE.

Thank you so much for the help :)

Command used and terminal output

No response

Relevant files

commands.zip

System information

version 1.1.8

jfy133 commented 2 months ago

Eek! Good eye, @SannaAb! I will try to address this ASAP.

jfy133 commented 2 months ago

@SannaAb, can you supply an example samplesheet where this happens?

jfy133 commented 2 months ago

OK, I think I've replicated it by duplicating the 2612 SE entry in test.config and updating all the sample names in the samplesheet to include a dot. Note to self:

SannaAb commented 2 months ago

SampleSheeet_Taxprofiler.csv

So fast! :D Thanks. If you still need an example sheet, this should work.

jfy133 commented 2 months ago

Basically this:

        strip_suffix() {
            local result=\$1
            # Strip any file extensions.
            echo "\${result%%.*}"
        }

In the single-end condition of the PRELOADED_KRAKENUNIQ module, this is too aggressive because %%.* removes the longest match, i.e. everything from the first . onwards. If we used the single % expansion (%.*) instead, it would only strip from the very last . (as used in the paired-end condition, but that is more complex).

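However, the more aggressive %% expansion is there for a reason: input files may end in .fastq.gz or .fastq, and the single %.* would only strip the .gz from a .fastq.gz file, leaving .fastq in the name.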
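To make the conflict concrete, here is a small bash sketch comparing the two expansions on made-up file names for a sample called sample.1 (the names are hypothetical; inside the Nextflow module each $ would additionally need escaping as \$):

```shell
#!/usr/bin/env bash
# Compare the two suffix-removal expansions on hypothetical file names
# for a sample whose name itself contains a dot ("sample.1").
for f in sample.1.fastq sample.1.fastq.gz; do
    # %%.* removes the LONGEST match of ".*": everything from the FIRST dot.
    longest="${f%%.*}"
    # %.* removes the SHORTEST match of ".*": only from the LAST dot.
    shortest="${f%.*}"
    echo "$f: %%.* -> $longest, %.* -> $shortest"
done
# Output:
# sample.1.fastq: %%.* -> sample, %.* -> sample.1
# sample.1.fastq.gz: %%.* -> sample, %.* -> sample.1.fastq
```

Neither operator alone yields sample.1 for both inputs: %% truncates the dotted sample name, while % leaves .fastq behind on gzipped files.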

So we need to find a way to account for both cases: dots within the sample name, and multi-part extensions such as .fastq.gz.
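One possible approach, sketched here as an assumption rather than the module's actual fix, is to strip only a known list of FASTQ extensions so that any dots in the sample name survive. The extension list (.fastq.gz, .fq.gz, .fastq, .fq) and the fallback to %.* for anything else are hypothetical choices; inside the Nextflow module every $ would need escaping as \$.

```shell
#!/usr/bin/env bash
# Hypothetical replacement for strip_suffix: remove only a known FASTQ
# extension, so dots inside the sample name itself are preserved.
# The extension list is an assumption; adjust it to what the pipeline accepts.
# Longer extensions are listed first so .fastq.gz is not treated as .gz alone.
# Any other name falls back to removing only the last extension.
strip_suffix() {
    local result=$1
    case "$result" in
        *.fastq.gz) echo "${result%.fastq.gz}" ;;
        *.fq.gz)    echo "${result%.fq.gz}" ;;
        *.fastq)    echo "${result%.fastq}" ;;
        *.fq)       echo "${result%.fq}" ;;
        *)          echo "${result%.*}" ;;
    esac
}

strip_suffix sample.1.fastq.gz   # prints sample.1
strip_suffix sample.1.fastq      # prints sample.1
strip_suffix my.sample.fq.gz     # prints my.sample
```

Listing the extensions explicitly trades generality for predictability: an unexpected extension only loses its last component rather than everything after the first dot.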

jfy133 commented 2 months ago

I've run out of time, and I'm away from this afternoon until Monday.

If @Midnighter happens to have time (as he wrote this module), maybe he can fix it in the meantime.

jfy133 commented 1 month ago
            if [[ \$result =~ ".gz^"]]; then
                resulttmp=\${result%.gz}
                result="\${resulttmp%.*}"
            else;
                echo "\${result%.*}"
            fi

I was trying this, but it didn't work.
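For reference, the attempt above has several bash problems: a quoted right-hand side of =~ is matched as a literal string rather than a regex; ^ anchors the start of the string, whereas the end-of-string anchor $ was presumably intended; the missing space before ]] and the semicolon in else; are both syntax errors; and the .gz branch only reassigns result without ever echoing it. A corrected sketch of the same idea (strip .gz if present, then exactly one more extension) could look like the following. It is written as plain bash without the Nextflow \$ escaping, and uses a glob match (== *.gz) instead of a regex, which avoids escaping the dot.

```shell
#!/usr/bin/env bash
# Corrected sketch of the attempted approach: if the name ends in .gz,
# drop that first, then drop exactly one more extension (e.g. .fastq).
strip_suffix() {
    local result=$1
    # Glob match, equivalent to the regex \.gz$ (note the space before ]]).
    if [[ $result == *.gz ]]; then
        result="${result%.gz}"
    fi
    # Both branches fall through to a single echo, so output is always printed.
    # %.* removes only the last remaining extension, keeping dots in the sample name.
    echo "${result%.*}"
}

strip_suffix sample.1.fastq.gz   # prints sample.1
strip_suffix sample.1.fastq      # prints sample.1
```

This still assumes exactly one extension (such as .fastq or .fq) before an optional .gz.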