nf-core / rnaseq

RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.
https://nf-co.re/rnaseq
MIT License

Error in findColumnWithAllEntries(ids, metadata) : No column contains all vector entries #1445

Status: Open. RaverJay opened this issue 3 weeks ago.

RaverJay commented 3 weeks ago

Description of the bug

Hey, on the current version 3.17.0, I get:

Error in findColumnWithAllEntries(ids, metadata) : No column contains all vector entries

from:

Error executing process > 'NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SE_GENE_SCALED (all_samples)'

Command executed [/home/sebastian/.nextflow/assets/nf-core/rnaseq/./workflows/rnaseq/../../subworkflows/nf-core/quantify_pseudo_alignment/../../../modules/nf-core/summarizedexperiment/summarizedexperiment/templates/summarizedexperiment.r]:

This summarizedexperiment.r script seems to be looking for the gene IDs in the samplesheet?! Not sure what goes wrong here.

Version 3.14.0 ran this step successfully just now, and also ran successfully on this fasta/gff before with other read data.

Best

Command used

nextflow run \
    nf-core/rnaseq \
    -profile singularity \
    -resume \
    --input samplesheet.csv \
    --outdir results_rnaseq \
    --gff genome/VCh_C6706_renamed_txfix_agat.gff \
    --fasta genome/VCh_C6706_renamed.fa \
    --igenomes_ignore \
    --genome null \
    --skip_biotype_qc \
    --skip_rseqc

samplesheet.csv

sample,fastq_1,fastq_2,strandedness,condition,replicate
1A2,reads/1A2_S4_R1_001.fastq.gz,reads/1A2_S4_R2_001.fastq.gz,auto,1,A2
1B2,reads/1B2_S14_R1_001.fastq.gz,reads/1B2_S14_R2_001.fastq.gz,auto,1,B2
1C1,reads/1C1_S24_R1_001.fastq.gz,reads/1C1_S24_R2_001.fastq.gz,auto,1,C1
5A2,reads/5A2_S7_R1_001.fastq.gz,reads/5A2_S7_R2_001.fastq.gz,auto,5,A2
5B2,reads/5B2_S17_R1_001.fastq.gz,reads/5B2_S17_R2_001.fastq.gz,auto,5,B2
5C1,reads/5C1_S27_R1_001.fastq.gz,reads/5C1_S27_R2_001.fastq.gz,auto,5,C1

Output


Command executed [/home/sebastian/.nextflow/assets/nf-core/rnaseq/./workflows/rnaseq/../../subworkflows/nf-core/quantify_pseudo_alignment/../../../modules/nf-core/summarizedexperiment/summarizedexperiment/templates/summarizedexperiment.r]:

Command exit status:
  1

Command output:
  (empty)

Command error:
  3815             VCA1068.1             VCA1068             VCA1068
  3816             VCA1069.1             VCA1069             VCA1069
  3817             VCA1070.1             VCA1070             VCA1070
  3818             VCA1071.1             VCA1071             VCA1071
  3819             VCA1072.1             VCA1072             VCA1072
  3820             VCA1073.1             VCA1073             VCA1073
  3821             VCA1074.1             VCA1074             VCA1074
  3822             VCA1075.1             VCA1075             VCA1075
  3823             VCA1077.1             VCA1077             VCA1077
  3824                VqmR.1                VqmR                VqmR
  3825             VCA1078.1             VCA1078             VCA1078
  3826             VCA1079.1             VCA1079             VCA1079
  3827             VCA1080.1             VCA1080             VCA1080
  3828             VCA1081.1             VCA1081             VCA1081
  3829             VCA1082.1             VCA1082             VCA1082
  3830             VCA1084.1             VCA1084             VCA1084
  3831             VCA1085.1             VCA1085             VCA1085
  3832             VCA1086.1             VCA1086             VCA1086
  3833             VCA1087.1             VCA1087             VCA1087
  3834             VCA1088.1             VCA1088             VCA1088
  3835             VCA1089.1             VCA1089             VCA1089
  3836             VCA1090.1             VCA1090             VCA1090
  3837             VCA1091.1             VCA1091             VCA1091
  3838             VCA1092.1             VCA1092             VCA1092
  3839             VCA1093.1             VCA1093             VCA1093
  3840             VCA1094.1             VCA1094             VCA1094
  3841             VCA1095.1             VCA1095             VCA1095
  3842             VCA1096.1             VCA1096             VCA1096
  3843             VCA1097.1             VCA1097             VCA1097
  3844             VCA1098.1             VCA1098             VCA1098
  3845             VCA1099.1             VCA1099             VCA1099
  3846             VCA1100.1             VCA1100             VCA1100
  3847             VCA1101.1             VCA1101             VCA1101
  3848             VCA1102.1             VCA1102             VCA1102
  3849             VCA1104.1             VCA1104             VCA1104
  3850             VCA1105.1             VCA1105             VCA1105
  3851             VCA1106.1             VCA1106             VCA1106
  3852             VCA1108.1             VCA1108             VCA1108
  3853             VCA1109.1             VCA1109             VCA1109
  3854             VCA1110.1             VCA1110             VCA1110
  3855             VCA1111.1             VCA1111             VCA1111
  3856             VCA1112.1             VCA1112             VCA1112
  3857             VCA1113.1             VCA1113             VCA1113
  3858             VCA1114.1             VCA1114             VCA1114
  3859             VCA1115.1             VCA1115             VCA1115
  Error in findColumnWithAllEntries(ids, metadata) : 
    No column contains all vector entries  16Sa, 16Sb, 16Sc, 16Sd, 16Se, 16Sf, 16Sg, 16Sh, 23Sa, 23Sb, 23Sc, 23Sd, 23Se, 23Sf, 23Sg, 23Sh, 4.5s, 5Sa, 5Sb, 5Sc, 5Sd, 5Se, 5Sf, 5Sg, 5Sh, 6S, aat, aceE, aceF, acpP, acpS, adk, alaS, alr, anmK, ansA, apaG, apaH, araD, argC, argD, argS, aroB, aroE, aroK, artM, artP, asnB, asnC, aspA, aspS, astD, atpC, avtA, b12, bcp, bioD, cadB, carB, cca, ccrB, cdiGMP, clpA, clpP, clpS, clpX, cls, cmk, coaD, coaE, cobS, cobT, cobU, cpdB, cpxA, cpxP, CsrB, CsrC, CsrD, cyaA, cyaY, cysB, cysE, cysM, cysN, cysS, dapA, dapF, dcuC, ddl, def, deoA, deoD, dinG, dipZ, djlA, dnaA, dnaE, dnaG, dnaK, ectC, emrD, engA, engB, eno, envZ, era, fabG, fabZ, fadA, fadB, fadE, fadI, fadJ, fbpC, fieF, fis, FlaX, flgA, flgB, flgC, flgD, flgE, flgF, flgG, flgH, flgI, flgJ, flgK, flgL, flhA, flhB, flhF, fliA, fliD, fliE, fliF, fliG, fliH, fliI, fliJ, fliL, fliM, fliN, fliP, fliQ, fliR, fliS, fmt, folE, fre, frr, frsA, ftsA, ftsB, fumC, fur, fxsA, galM, gcp, GcvB, gidB, glgA, 
  Calls: parse_metadata -> findColumnWithAllEntries
  Execution halted
  INFO:    Cleaning up image...

Relevant files

No response

System information

Nextflow version: v24.10.0
Hardware: 64-core desktop server
Executor: local
Container engine: Singularity
nf-core/rnaseq version: v3.17.0

pbieberstein2 commented 3 weeks ago

I think the issue is that the sample names start with a number, so the quick fix is to make sure your sample names do not start with a numeric character. This seems somewhat related to issue #1364, which was solved (R no longer changes - into .), BUT the case of a leading numeric character seems to have a slightly different root cause.
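Following that quick fix, a minimal sketch of renaming the affected samples before rerunning. `prefix_numeric_sample_names` and its default prefix `S` are hypothetical (not part of nf-core/rnaseq); it assumes the sample name is the first column and that no field contains a quoted comma.

```shell
# Hypothetical helper: copy a samplesheet, prefixing every sample name
# (first column, header row skipped) that starts with a digit.
# The default prefix "S" is an arbitrary choice, not a pipeline convention.
prefix_numeric_sample_names() {
    in="$1"
    out="$2"
    prefix="${3:-S}"
    # Assigning to $1 rebuilds only the modified records, joined with OFS=",";
    # all other lines (including the header) are printed unchanged.
    awk -F',' -v OFS=',' -v p="$prefix" \
        'NR > 1 && $1 ~ /^[0-9]/ { $1 = p $1 } { print }' "$in" > "$out"
}
```

For example, `prefix_numeric_sample_names samplesheet.csv samplesheet_renamed.csv` followed by `--input samplesheet_renamed.csv` would turn `1A2` into `S1A2` and so on; only the sample column is rewritten, so the fastq paths stay as they are.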

RaverJay commented 2 weeks ago

You are correct: the output tables have an 'X' prepended to the column names. Thanks R, very cool -_-
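The prepended 'X' matches what R's `make.names()` does to names starting with a digit (`read.csv()` and `data.frame()` apply it to column names by default via `check.names = TRUE`); the same function also turns `-` into `.`, the symptom behind #1364. Whether this is the exact call path inside summarizedexperiment.r is an assumption. Below is a rough shell approximation of the rule for predicting renamed columns; `r_make_names_approx` is a hypothetical helper that handles ASCII names only and ignores R's reserved words.

```shell
# Rough, ASCII-only approximation of R's make.names() for a single name.
# Reserved words (TRUE, if, ...) and locale-specific letters are not handled.
r_make_names_approx() {
    # 1) Prepend "X" if the name does not start with a letter or a dot,
    #    starts with a dot followed by a digit, or is empty.
    # 2) Replace every character other than letters, digits, "." and "_" with ".".
    printf '%s\n' "$1" | LC_ALL=C sed -E \
        -e 's/^([^A-Za-z.]|\.[0-9]|$)/X\1/' \
        -e 's/[^A-Za-z0-9._]/./g'
}
```

For example, `for s in 1A2 1B2 5C1; do r_make_names_approx "$s"; done` prints `X1A2`, `X1B2` and `X5C1`, while `r_make_names_approx wt-1` prints `wt.1`.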

I guess the old version works because the failing script is new?

Maybe the pipeline should then issue a warning when the metadata is first read.
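A sketch of what such a warning could look like as a pre-flight check on the samplesheet, assuming the sample name is the first column. `warn_unsafe_sample_names` is hypothetical, not an existing nf-core/rnaseq feature; it flags names that are not syntactically valid in R (ASCII-only check, reserved words not covered) and returns a non-zero status if any are found.

```shell
# Hypothetical pre-flight check: warn about sample names that R would rewrite
# to make them syntactically valid (e.g. "1A2" -> "X1A2", "wt-1" -> "wt.1").
warn_unsafe_sample_names() {
    awk -F',' '
        # Skip the header; flag names that do not start with a letter or dot,
        # start with a dot followed by a digit, or contain other characters.
        NR > 1 && ($1 !~ /^[A-Za-z.]/ || $1 ~ /^\.[0-9]/ || $1 ~ /[^A-Za-z0-9._]/) {
            printf "WARNING: sample name \"%s\" is not a syntactically valid R name\n", $1 > "/dev/stderr"
            bad = 1
        }
        # Exit status 1 if anything was flagged, 0 otherwise.
        END { exit bad }
    ' "$1"
}
```

Running `warn_unsafe_sample_names samplesheet.csv || echo "rename samples before running"` on the samplesheet from this issue would flag all six sample names.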