nf-core / rnaseq

RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.
https://nf-co.re/rnaseq
MIT License

RNAseq pipeline changing my sample names #1077

Closed AnnaDvdH closed 1 year ago

AnnaDvdH commented 1 year ago

Description of the bug

Hello,

I've been having some issues when running the rnaseq pipeline on UPPMAX. The pipeline appears to rename my samples automatically, which makes it really difficult to trace which sample is which.

I am working with tumour data, so my sample names contain a T for tumour followed by a number, for example 07TDOXTAK_NC_T3. However, in my results every sample is renamed to something like 07TDOXTAK_NC_T1. Since I have several tumours from the same individual, I really need to keep the number after the T.

I already got some help through the Slack channel, where I was advised to change my sample names. The reason given was that the pipeline uses _T<DIGIT> to label technical replicates, so a sample name ending in that pattern breaks the pipeline's assumptions. I did this, so my sample names now end with NC rather than T<DIGIT> (something like 07TDOXTAK_T3_NC). However, the problem persists: all my samples are still renamed to 07TDOXTAK_NC_T1, ending in _T1 rather than NC. I am a bit lost and not sure what to test next.
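
One reading that is consistent with the renaming described above can be sketched in shell. This is only an illustration under two assumptions that are not confirmed against the pipeline source: that a technical-replicate suffix (_T1 here) is appended to each sample name, and that the first _T<DIGIT> match is later removed wherever it occurs in the name. The function name simulate_rename is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: mimics the ASSUMED renaming behaviour,
# it is not taken from the pipeline's code.
simulate_rename() {
    local sample="$1"
    # Assumed step 1: a technical-replicate suffix is appended to the name.
    local with_suffix="${sample}_T1"
    # Assumed step 2: the first _T<DIGIT> match is removed, wherever it sits.
    printf '%s\n' "$with_suffix" | sed -E 's/_T[0-9]+//'
}

simulate_rename 07TDOXTAK_NC_T3   # original name, prints 07TDOXTAK_NC_T1
simulate_rename 07TDOXTAK_T3_NC   # renamed sample, prints 07TDOXTAK_NC_T1
```

Under these assumptions both naming schemes collapse to the same result, which would explain why moving T3 away from the end of the name did not help. Avoiding the _T<DIGIT> token anywhere in the sample name (for example, Tumour3 instead of T3) would sidestep the pattern entirely.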

Thank you in advance for all the help!

Command used and terminal output

# Script (.sh) used to run the pipeline; the samplesheet (inputfile.csv) is passed as the first argument

module purge
module load uppmax bioinfo-tools
module load Nextflow/22.10.1
module load nf-core-pipelines/latest

# Don't let Java get carried away and use huge amounts of memory
export NXF_OPTS='-Xms1g -Xmx4g'

# Don't fill up your home directory with cache files
export NXF_HOME=/absolutepath/5_RNAseq/arm5_6_rnaseq/
export NXF_TEMP=$SNIC_TMP
export NXF_SINGULARITY_CACHEDIR=/absolutepath/5_RNAseq/cache_rnaseq

# Run RNAseq pipeline

nextflow run $NF_CORE_PIPELINES/rnaseq/3.12.0/workflow \
    --project snic2022-5-620 -profile uppmax \
    --email anna.vd.heiden@imbim.uu.se \
    --fasta genome/cf4.b6.14.fa \
    --gff GCF_011100685.1_UU_Cfam_GSD_1.0_genomic.NameB614.gff \
    --skip_biotype_qc \
    --input "$1" \
    --outdir results
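
Since the renaming is tied to the _T<DIGIT> pattern, it may help to list which sample names in the samplesheet contain that token before launching a run. The following is a minimal sketch, assuming the samplesheet is a comma-separated file with a header row and the sample name in the first column; the function name list_risky_samples is hypothetical and not part of the pipeline.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of nf-core/rnaseq.
# Assumes: comma-separated samplesheet, header on line 1, sample name in column 1.
list_risky_samples() {
    local samplesheet="$1"
    # Skip the header, take the first column, and keep each unique
    # sample name that contains a _T<DIGIT> token anywhere.
    tail -n +2 "$samplesheet" | cut -d, -f1 | grep -E '_T[0-9]+' | sort -u
}

# Usage (with the samplesheet passed to the run script):
#   list_risky_samples inputfile.csv
```

An empty result would suggest that no sample name in the samplesheet contains the pattern.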

Relevant files

No response

System information

No response

pinin4fjords commented 1 year ago

@AnnaDvdH Having done some testing, I believe this issue is now resolved in dev. If I change the sample names in the test profile to have suffixes such as _T2 and _T3, those suffixes are retained in the results in the output directory (see the attached MultiQC report).

@drpatelh may be able to explain the detail of the fix, but I believe it's a consequence of https://github.com/nf-core/rnaseq/pull/1058.

Closing the issue for now. If you feel I've misunderstood, or determine that the fix does not apply to what you're doing, feel free to reopen.

multiqc_report (4).html.zip