nf-core / coproid

Coprolite host Identification pipeline
https://nf-co.re/coproid
MIT License

an error on sourcepredict (test command) #37

Closed: cae803 closed this issue 2 years ago

cae803 commented 3 years ago

Hi. Thank you for distributing this fantastic tool! I got the following error message from sourcepredict when running the test command `nextflow run nf-core/coproid -profile test,docker`:

executor >  local (34)
[24/bfa148] process > decomp_kraken                          [100%] 1 of 1 ✔
[58/8fb11a] process > fastqc (metagenomebis)                 [100%] 2 of 2 ✔
[4d/86e1a5] process > renameGenome1                          [100%] 1 of 1 ✔
[a2/c13f47] process > renameGenome2                          [100%] 1 of 1 ✔
[c6/f2701d] process > AdapterRemovalCollapse (metagenomebis) [100%] 2 of 2 ✔
[05/113fd2] process > BowtieIndexGenome1 (Bacillus_subtilis) [100%] 1 of 1 ✔
[27/80775b] process > AlignToGenome1 (metagenomebis)         [100%] 2 of 2 ✔
[42/d4ecff] process > bam2fq (metagenomebis)                 [100%] 2 of 2 ✔
[78/b187fb] process > BowtieIndexGenome2 (Escherichia_coli)  [100%] 1 of 1 ✔
[1c/f88ba9] process > AlignToGenome2 (metagenomebis)         [100%] 2 of 2 ✔
[c6/464bf6] process > pmdtoolsgenome1 (metagenomebis)        [100%] 2 of 2 ✔
[95/56f7a7] process > pmdtoolsgenome2 (metagenome)           [100%] 2 of 2 ✔
[f8/b912b0] process > kraken2 (metagenomebis)                [100%] 2 of 2 ✔
[e0/a30a0f] process > kraken_parse (metagenomebis)           [100%] 2 of 2 ✔
[c1/a5ef38] process > kraken_merge                           [100%] 1 of 1 ✔
[bb/135ba3] process > sourcepredict                          [100%] 1 of 1, failed: 1 ✘
[5b/0308e5] process > countBp2genomes (metagenomebis)        [100%] 2 of 2 ✔
[0a/b8525d] process > damageprofilerGenome1 (metagenomebis)  [100%] 2 of 2 ✔
[99/325890] process > damageprofilerGenome2 (metagenomebis)  [100%] 2 of 2 ✔
[-        ] process > concatenateRatios                      -
[-        ] process > generate_report_adna_2_genomes         -
[8a/d9c396] process > get_software_versions                  [100%] 1 of 1 ✔
[3d/f04710] process > multiqc                                [100%] 1 of 1 ✔
[45/bf327f] process > output_documentation                   [100%] 1 of 1 ✔
Execution cancelled -- Finishing pending tasks before exit
-[nf-core/coproid] Pipeline completed with errors-
Error executing process > 'sourcepredict'

Caused by:
  Process `sourcepredict` terminated with an error exit status (1)

Command executed:

  sourcepredict -di 2 \
                -kne 0 \
                -me tsne \
                -n subsample \
                -l test_labels.csv \
                -s test_sources.csv \
                -t 2 \
                -o prediction.sourcepredict.csv \
                -e sourcepredict_embedding.csv kraken_merged.csv

Command exit status:
  1

Command output:
   2290000 generating entries... 
   2291000 generating entries... 
   2292000 generating entries... 
   2293000 generating entries... 
   2294000 generating entries... 
   2295000 generating entries... 
   2296000 generating entries... 
   2297000 generating entries... 
   2298000 generating entries... 
   2299000 generating entries... 
   2300000 generating entries... 
   2301000 generating entries... 
   2302000 generating entries... 
   2303000 generating entries... 
   2304000 generating entries... 
   2305000 generating entries... 
   2306000 generating entries... 
   2307000 generating entries... 
   2308000 generating entries... 
   2309000 generating entries... 
   2310000 generating entries... 
   2311000 generating entries... 
   2312000 generating entries... 
   2313000 generating entries... 
   2314000 generating entries... 
   2315000 generating entries... 
   2316000 generating entries... 
   2317000 generating entries... 
   2318000 generating entries... 
   2319000 generating entries... 
   2320000 generating entries... 
   2321000 generating entries... 
   2322000 generating entries... 
   2323000 generating entries... 
   2324000 generating entries... 
   2325000 generating entries... 
   2326000 generating entries... 
   2327000 generating entries... 
   2328000 generating entries... 
   2329000 generating entries... 
   2330000 generating entries... 
   2331000 generating entries... 
   2332000 generating entries... 
   2333000 generating entries... 
   2334000 generating entries... 
   2335000 generating entries... 
   2336000 generating entries... 
   2337000 generating entries... 
  Uploading to /tmp/.etetoolkit/taxa.sqlite

Command error:
  NCBI database not present yet (first time used?)
  Downloading taxdump.tar.gz from NCBI FTP site (via HTTP)...
  Done. Parsing...

  Inserting synonyms:          0 
  Inserting synonyms:       5000 
  Inserting synonyms:      10000 
  Inserting synonyms:      15000 
  Inserting synonyms:      20000 
  Inserting synonyms:      25000 
  Inserting synonyms:      30000 
  Inserting synonyms:      35000 Traceback (most recent call last):
    File "/opt/conda/envs/nf-core-coproid-1.1/bin/sourcepredict", line 10, in <module>
      sys.exit(main())
    File "/opt/conda/envs/nf-core-coproid-1.1/lib/python3.7/site-packages/sourcepredict/__main__.py", line 172, in main
      sm.compute_distance(distance_method=distance_method, rank=RANK)
    File "/opt/conda/envs/nf-core-coproid-1.1/lib/python3.7/site-packages/sourcepredict/sourcepredictlib/ml.py", line 251, in compute_distance
      ncbi = NCBITaxa()
    File "/opt/conda/envs/nf-core-coproid-1.1/lib/python3.7/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 110, in __init__
      self.update_taxonomy_database(taxdump_file)
    File "/opt/conda/envs/nf-core-coproid-1.1/lib/python3.7/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 129, in update_taxonomy_database
      update_db(self.dbfile)
    File "/opt/conda/envs/nf-core-coproid-1.1/lib/python3.7/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 760, in update_db
      upload_data(dbfile)
    File "/opt/conda/envs/nf-core-coproid-1.1/lib/python3.7/site-packages/ete3/ncbi_taxonomy/ncbiquery.py", line 802, in upload_data
      db.execute("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname))
  sqlite3.IntegrityError: UNIQUE constraint failed: synonym.spname, synonym.taxid

Work dir:
  /home/cae803/opt/coproid/work/bb/135ba36e56db6c94bd9694208e094d

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

How do I fix it? I don't have the directory /opt/conda/ on my workstation. I am looking forward to using coproid!
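The last line of the traceback is an SQLite integrity error raised while ete3 builds its local NCBI taxonomy database (`/tmp/.etetoolkit/taxa.sqlite` in the log): a row inserted into the `synonym` table had the same key as a row already there. The sketch below reproduces that exact message with Python's built-in `sqlite3` module. The table layout (a composite primary key on `spname` and `taxid`, with `spname` compared case-insensitively) is an assumption modeled on the column names in the error, not copied from ete3, and the taxid and names are hypothetical; the log does not show which real NCBI entries collided.

```python
import sqlite3


def build_synonym_table(rows):
    """Create an in-memory `synonym` table and insert (taxid, spname) rows.

    Assumed schema, modeled on the traceback's column names: a composite
    primary key on (spname, taxid), with spname compared case-insensitively.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE synonym ("
        "taxid INT, spname VARCHAR(50) COLLATE NOCASE, "
        "PRIMARY KEY (spname, taxid));"
    )
    for taxid, spname in rows:
        # Same statement shape as the failing line in the traceback.
        db.execute("INSERT INTO synonym (taxid, spname) VALUES (?, ?);", (taxid, spname))
    return db


# Hypothetical data: two synonyms for one taxid that differ only in case.
# Under NOCASE they map to the same (spname, taxid) key and collide.
rows = [(1234, "Example bacterium"), (1234, "example bacterium")]

try:
    build_synonym_table(rows)
except sqlite3.IntegrityError as err:
    print(f"sqlite3.IntegrityError: {err}")
```

Any duplicate key aborts the insert loop, which is why a single colliding synonym is enough to stop the whole database build and, with it, the `sourcepredict` process.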

gabrielinnocenti commented 3 years ago

I just experienced exactly the same problem.

maxibor commented 3 years ago

Hey @cae803 and @gabrielinnocenti, thanks for the feedback! This seems to be a known issue in one of the libraries that sourcepredict uses, namely ete. It has already been fixed in their latest release, so I'll update the version of ete in the coproID pipeline, which should hopefully fix it!

Cheers
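Updating ete helps because the crash comes from the database loader aborting on a duplicate synonym. One way a loader can tolerate such duplicates is SQLite's `INSERT OR IGNORE`, which skips any row whose key already exists instead of raising. This is a minimal sketch under an assumed schema (composite primary key on `spname` and `taxid`, with `spname` compared case-insensitively) and hypothetical data; it is not necessarily how ete itself fixed the problem.

```python
import sqlite3

# Assumed schema, modeled on the traceback's column names; not copied from ete3.
SCHEMA = (
    "CREATE TABLE synonym ("
    "taxid INT, spname VARCHAR(50) COLLATE NOCASE, "
    "PRIMARY KEY (spname, taxid));"
)


def load_synonyms(rows):
    """Load (taxid, spname) rows, keeping the first row seen for each key."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    # OR IGNORE: a row whose (spname, taxid) key already exists is skipped
    # instead of aborting the load with sqlite3.IntegrityError.
    db.executemany(
        "INSERT OR IGNORE INTO synonym (taxid, spname) VALUES (?, ?);", rows
    )
    db.commit()
    return db


# Hypothetical data: the first two rows collide under NOCASE.
rows = [(1234, "Example bacterium"), (1234, "example bacterium"), (5678, "Other name")]
db = load_synonyms(rows)
print(db.execute("SELECT COUNT(*) FROM synonym").fetchone()[0])  # 2
```

Silently dropping the later duplicate is a design choice: it keeps the build going at the cost of losing case variants that are redundant under a case-insensitive lookup anyway.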

cae803 commented 3 years ago

Dear @maxibor, thank you for suggesting the solution! I tried to edit the Docker image used by this pipeline on my workstation, but I couldn't. Does the fix require changing the Docker image on the nf-core side? I would be very happy if you could upload the updated version!

Best

gabrielinnocenti commented 3 years ago

Dear @maxibor,

Any news on this issue? I'm still getting this error with the command `nextflow run nf-core/coproid -profile test,docker`.

Thank you for your attention

maxibor commented 2 years ago

Dear @gabrielinnocenti and @cae803, it took a while, but it's now fixed in v1.1.1 🙂