nf-core / taxprofiler

Highly parallelised multi-taxonomic profiling of shotgun short- and long-read metagenomic data
https://nf-co.re/taxprofiler
MIT License

error running test profile with conda #551

Open AnotherSimon opened 2 days ago

AnotherSimon commented 2 days ago

Description of the bug

While setting up taxprofiler on my local machine, I encountered an issue when using conda as the executor. The process "NFCORE_TAXPROFILER:TAXPROFILER:STANDARDISATION_PROFILES:TAXPASTA_MERGE" fails reproducibly, apparently because of an upstream error in the sample "2613_db3.centrifuge".

This error does not occur when using singularity as the executor on the same machine. Not sure if this is a Nextflow, conda or taxprofiler issue.

Command used and terminal output

nextflow run nf-core/taxprofiler -r 1.2.0 -profile test,conda --outdir ./taxprofiler_test_conda

Relevant files

nextflow.conda.log nextflow.singularity.log

System information

Nextflow version 24.10.0 build 5928
conda 24.9.2
Hardware: 21 CPU threads, 30 GB RAM
Ubuntu 22.04 LTS
Taxprofiler release 1.2.0

jfy133 commented 1 day ago

The corresponding error

Nov-20 11:53:58.454 [TaskFinalizer-2] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_TAXPROFILER:TAXPROFILER:STANDARDISATION_PROFILES:TAXPASTA_MERGE (centrifuge|db3)'

Caused by:
  Process `NFCORE_TAXPROFILER:TAXPROFILER:STANDARDISATION_PROFILES:TAXPASTA_MERGE (centrifuge|db3)` terminated with an error exit status (1)

Command executed:

  taxpasta merge \
      --profiler centrifuge \
      --output centrifuge_db3.tsv \
       \
       \
       \
      2613_db3.centrifuge.txt 2612_db3.centrifuge.txt

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_TAXPROFILER:TAXPROFILER:STANDARDISATION_PROFILES:TAXPASTA_MERGE":
      taxpasta: $(taxpasta --version)
  END_VERSIONS

Command exit status:
  1

Command output:
  [11:53:57] CRITICAL Error in sample '2613_db3.centrifuge' with profile '2613_db3.centrifuge.txt'.                                                                                               merge.py:419
             CRITICAL   schema_context column         check check_number failure_case  index                                                                                                      merge.py:424
                      0         Column   name  not_nullable         None          NaN      1                                                                                                                  

Command error:
  [11:53:57] CRITICAL Error in sample '2613_db3.centrifuge' with profile '2613_db3.centrifuge.txt'.                                                                                               merge.py:419
             CRITICAL   schema_context column         check check_number failure_case  index                                                                                                      merge.py:424
                      0         Column   name  not_nullable         None          NaN      1                                                                                                                  

Work dir:
  /home/simon/Documents/work/0a/975f9fc06ff33f526fa0cc153fb477

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

I also see other errors such as:

Nov-20 11:53:23.754 [TaskFinalizer-1] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=NFCORE_TAXPROFILER:TAXPROFILER:STANDARDISATION_PROFILES:KRAKENTOOLS_COMBINEKREPORTS_CENTRIFUGE (1); work-dir=/home/simon/Documents/work/e5/a5db9df68893383c4711a60b64b39d
  error [nextflow.exception.ProcessFailedException]: Process `NFCORE_TAXPROFILER:TAXPROFILER:STANDARDISATION_PROFILES:KRAKENTOOLS_COMBINEKREPORTS_CENTRIFUGE (1)` terminated with an error exit status (1)

Could you maybe go into

cd  /home/simon/Documents/work/c4/978646<...autocomplete>

and send the contents of .command.log? I have a feeling centrifuge failed and produced an empty file, and that the failure wasn't picked up by the pipeline.
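
One way to test the "empty file" hypothesis without opening every task directory by hand is to scan a work directory for zero-byte files. The following is only a minimal sketch: the helper name `empty_files` is hypothetical, and the demonstration runs on a temporary directory that stands in for a real Nextflow task work dir.

```python
import tempfile
from pathlib import Path


def empty_files(work_dir):
    """Return the zero-byte regular files found anywhere below work_dir."""
    return sorted(
        path
        for path in Path(work_dir).rglob("*")
        if path.is_file() and path.stat().st_size == 0
    )


# Demonstration on a throwaway directory standing in for a task work dir.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "sample.centrifuge.txt").write_text("")
    Path(tmp, ".command.log").write_text("Loading taxonomy ...\n")
    found = [path.name for path in empty_files(tmp)]
    print(found)
```

Pointing `empty_files` at the real work dir would list any profile that was written out empty; a report that is present but truncated would not show up here.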

jfy133 commented 1 day ago

OK actually I was able to replicate the conda error, I'm going to look into this now :)

The two centrifuge report files that get generated are:

$ cat *
 99.49  1801191 1801191 U       0       unclassified
  0.51  9313    9313    -       1
 99.90  434053  434053  U       0       unclassified
  0.10  440     440     -       1

So I guess this might be a taxprofiler bug - @Midnighter, is this because there is no taxon name there?
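
To make the missing name concrete, here is a rough sketch that flags report rows lacking a taxon name. It assumes the six-column layout visible in the output above (percentage, clade reads, taxon reads, rank code, taxonomy ID, name) separated by whitespace; the function name `rows_missing_name` is hypothetical and this is not how taxpasta itself validates its input.

```python
def rows_missing_name(report_text):
    """Return (line_number, taxonomy_id) for each row without a taxon name."""
    missing = []
    for line_number, line in enumerate(report_text.splitlines(), start=1):
        # Split into at most six fields so names containing spaces stay whole.
        fields = line.strip().split(None, 5)
        if not fields:
            continue  # skip blank lines
        if len(fields) < 6:
            taxonomy_id = fields[4] if len(fields) > 4 else None
            missing.append((line_number, taxonomy_id))
    return missing


# The first two rows of the conda output shown above.
conda_report = """\
 99.49  1801191 1801191 U       0       unclassified
  0.51  9313    9313    -       1
"""
print(rows_missing_name(conda_report))  # → [(2, '1')]
```

The flagged row is the second one, which appears to match the `not_nullable` failure on column `name` at index 1 in the taxpasta error, assuming that index is zero-based.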

Uhh strange, tax id 9313 is 😅

Image

jfy133 commented 1 day ago

Docker produces:

 cat *.txt
 99.49  1801191 1801191 U       0       unclassified
  0.51  9313    0       -       1       root
  0.51  9313    0       -       131567    cellular organisms
  0.51  9313    0       D       2759        Eukaryota
  0.50  8972    0       K       33090         Viridiplantae
  0.50  8972    0       P       35493           Streptophyta
  0.50  8972    0       -       131221            Streptophytina
  0.50  8972    0       -       3193                Embryophyta
  0.50  8972    0       -       58023                 Tracheophyta
  0.50  8972    0       -       78536                   Euphyllophyta
  0.50  8972    0       -       58024                     Spermatophyta
  0.50  8972    0       C       3398                        Magnoliopsida
  0.50  8972    0       -       1437183                       Mesangiospermae
  0.50  8972    0       -       71240                           eudicotyledons
  0.50  8972    0       -       91827                             Gunneridae
  0.50  8972    0       -       1437201                             Pentapetalae
  0.50  8972    0       -       71274                                 asterids
  0.50  8972    0       -       91888                                   lamiids
  0.50  8972    0       O       91889                                     Garryales
  0.50  8972    0       F       4390                                        Eucommiaceae
  0.50  8972    0       G       4391                                          Eucommia
  0.50  8972    8972    S       4392                                            Eucommia ulmoides
  0.02  341     0       -       33154         Opisthokonta
  0.02  341     0       K       33208           Metazoa
  0.02  341     0       -       6072              Eumetazoa
  0.02  341     0       -       33213               Bilateria
  0.02  341     0       -       33511                 Deuterostomia
  0.02  341     0       P       7711                    Chordata
  0.02  341     0       -       89593                     Craniata
  0.02  341     0       -       7742                        Vertebrata
  0.02  341     0       -       7776                          Gnathostomata
  0.02  341     0       -       117570                          Teleostomi
  0.02  341     0       -       117571                            Euteleostomi
  0.02  341     0       -       8287                                Sarcopterygii
  0.02  341     0       -       1338369                               Dipnotetrapodomorpha
  0.02  341     0       -       32523                                   Tetrapoda
  0.02  341     0       -       32524                                     Amniota
  0.02  341     0       C       40674                                       Mammalia
  0.02  341     0       -       32525                                         Theria
  0.02  341     0       -       9347                                            Eutheria
  0.02  341     0       -       1437010                                           Boreoeutheria
  0.02  341     0       -       314146                                              Euarchontoglires
  0.02  341     0       O       9443                                                  Primates
  0.02  341     0       -       376913                                                  Haplorrhini
  0.02  341     0       -       314293                                                    Simiiformes
  0.02  341     0       -       9526                                                        Catarrhini
  0.02  341     0       -       314295                                                        Hominoidea
  0.02  341     0       F       9604                                                            Hominidae
  0.02  341     0       -       207598                                                            Homininae
  0.02  341     0       G       9605                                                                Homo
  0.02  341     341     S       9606                                                                  Homo sapiens
 99.90  434053  434053  U       0       unclassified
  0.10  440     0       -       1       root
  0.10  440     0       -       131567    cellular organisms
  0.10  440     0       D       2759        Eukaryota
  0.08  364     0       K       33090         Viridiplantae
  0.08  364     0       P       35493           Streptophyta
  0.08  364     0       -       131221            Streptophytina
  0.08  364     0       -       3193                Embryophyta
  0.08  364     0       -       58023                 Tracheophyta
  0.08  364     0       -       78536                   Euphyllophyta
  0.08  364     0       -       58024                     Spermatophyta
  0.08  364     0       C       3398                        Magnoliopsida
  0.08  364     0       -       1437183                       Mesangiospermae
  0.08  364     0       -       71240                           eudicotyledons
  0.08  364     0       -       91827                             Gunneridae
  0.08  364     0       -       1437201                             Pentapetalae
  0.08  364     0       -       71274                                 asterids
  0.08  364     0       -       91888                                   lamiids
  0.08  364     0       O       91889                                     Garryales
  0.08  364     0       F       4390                                        Eucommiaceae
  0.08  364     0       G       4391                                          Eucommia
  0.08  364     364     S       4392                                            Eucommia ulmoides
  0.02  76      0       -       33154         Opisthokonta
  0.02  76      0       K       33208           Metazoa
  0.02  76      0       -       6072              Eumetazoa
  0.02  76      0       -       33213               Bilateria
  0.02  76      0       -       33511                 Deuterostomia
  0.02  76      0       P       7711                    Chordata
  0.02  76      0       -       89593                     Craniata
  0.02  76      0       -       7742                        Vertebrata
  0.02  76      0       -       7776                          Gnathostomata
  0.02  76      0       -       117570                          Teleostomi
  0.02  76      0       -       117571                            Euteleostomi
  0.02  76      0       -       8287                                Sarcopterygii
  0.02  76      0       -       1338369                               Dipnotetrapodomorpha
  0.02  76      0       -       32523                                   Tetrapoda
  0.02  76      0       -       32524                                     Amniota
  0.02  76      0       C       40674                                       Mammalia
  0.02  76      0       -       32525                                         Theria
  0.02  76      0       -       9347                                            Eutheria
  0.02  76      0       -       1437010                                           Boreoeutheria
  0.02  76      0       -       314146                                              Euarchontoglires
  0.02  76      0       O       9443                                                  Primates
  0.02  76      0       -       376913                                                  Haplorrhini
  0.02  76      0       -       314293                                                    Simiiformes
  0.02  76      0       -       9526                                                        Catarrhini
  0.02  76      0       -       314295                                                        Hominoidea
  0.02  76      0       F       9604                                                            Hominidae
  0.02  76      0       -       207598                                                            Homininae
  0.02  76      0       G       9605                                                                Homo
  0.02  76      76      S       9606                                                                  Homo sapiens

The upstream step has the following script and log:

$ cat .command.sh 
#!/bin/bash -euo pipefail
db_name=`find -L test-db-centrifuge -name "*.1.cf" -not -name "._*"  | sed 's/\.1.cf$//'`
centrifuge-kreport -x $db_name 2612_db3.centrifuge.results.txt > 2612_db3.centrifuge.txt

cat <<-END_VERSIONS > versions.yml
"NFCORE_TAXPROFILER:TAXPROFILER:PROFILING:CENTRIFUGE_KREPORT":
    centrifuge: $( centrifuge --version  | sed -n 1p | sed 's/^.*centrifuge-class version //')
END_VERSIONS
gitpod /workspace/taxprofiler/testing/work/e2/f913bcb9291d065de0883c86f03299 (master) $ cat .command.log 
Loading taxonomy ...
Loading names file ...
Loading nodes file ...

jfy133 commented 22 hours ago

I've run out of time today unfortunately, but I need to run the two test profiles and compare what the centrifuge process itself reports.
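
A comparison of that kind could start from the reports already pasted in this thread. The sketch below lists taxonomy IDs that appear in one report but not in the other; the function names are hypothetical, the column layout is assumed from the output above, and the inputs are short excerpts of the conda and Docker reports rather than the full files.

```python
def taxonomy_ids(report_text):
    """Collect the taxonomy IDs (fifth column) from a kreport-style table."""
    ids = set()
    for line in report_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            ids.add(int(fields[4]))
    return ids


def only_in_second(first_report, second_report):
    """Return the sorted taxonomy IDs present only in the second report."""
    return sorted(taxonomy_ids(second_report) - taxonomy_ids(first_report))


conda_excerpt = """\
 99.49  1801191 1801191 U       0       unclassified
  0.51  9313    9313    -       1
"""
docker_excerpt = """\
 99.49  1801191 1801191 U       0       unclassified
  0.51  9313    0       -       1       root
  0.51  9313    0       -       131567    cellular organisms
  0.51  9313    0       D       2759        Eukaryota
"""
print(only_in_second(conda_excerpt, docker_excerpt))  # → [2759, 131567]
```

On the full files, the same call would show every lineage that the Docker run resolves below the root and that is absent from the conda report.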