harta55 / EnTAP

Eukaryotic Non-Model Transcriptome Annotation Pipeline - Latest Release v1.3.0 - HGT Analysis Released! Revamped figures/graphics coming soon.
https://entap.readthedocs.io/en/latest/
GNU General Public License v3.0

DIAMOND database error #92

Open: pavlo888 opened this issue 1 month ago

pavlo888 commented 1 month ago

Dear @harta55

I am very excited to try the EnTAP pipeline with some transcriptome data.

I am using Singularity to run the pipeline. I installed it successfully and then ran the initial configuration step like this:

```
(base) usuari@WS0202:~/Downloads/Polystigma-transcriptome$ singularity exec entap.sif EnTAP --config --run-ini entap_run.params --entap-ini entap_config.ini
Parsing ini file at: entap_run.params
Parsing ini file at: entap_config.ini
ini files parsed, debug logging will continue at: entap_outfiles/debug_2024Y7M8D-13h28m51s.txt
Running EnTAP configuration…
Downloading EnTAP database…
Configuring EnTAP database... Success
Checking EggNOG database..,
Configuring EggNOG database...
EnTAP configuration complete
```

However, when I now try to do the test run, I get this error:

```
(base) usuari@WS0202:~/Downloads/Polystigma-transcriptome$ singularity exec entap.sif EnTAP --config --run-ini entap_run.params --entap-ini entap_config.ini
Parsing ini file at: entap_run.params
Parsing ini file at: entap_config.ini
ini files parsed, debug logging will continue at: test_data/debug_2024Y7M10D-12h20m55s.txt
Error code: 10

Databases have been selected for indexing. The test run of DIAMOND has failed!
```

Could you please help me solve this issue?

Thanks!

harta55 commented 1 month ago

Hi! What version of EnTAP are you running?

harta55 commented 1 month ago

Also, please send over the debug_2024Y7M10D-12h20m55s.txt file.

pavlo888 commented 1 month ago

Hi @harta55

I have managed to fix the previous issue, but now I am facing another one. It seems my EggNOG database is not in the correct format: I am getting error code 31. I have attached the debug file.

debug_2024Y7M8D-13h28m51s.txt

I am thinking of downloading the EggNOG database manually, but I am not sure it will be in the correct format.

harta55 commented 1 month ago

Is there another log? The attached one looks like everything worked normally and the EggNOG database was put into the entap_outfiles/databases directory.

pavlo888 commented 1 month ago

Hi @harta55

I got this other log. I am sending it to you here.

log_file_2024Y7M18D-8h11m22s.txt

pavlo888 commented 1 month ago

Hi @harta55

In the debug file I first sent, this was at the end of the log:

```
Thu Jul 25 09:26:11 2024: Executing command: /home/usuari/anaconda3/bin/diamond makedb --in /home/usuari/Downloads/Polystigma-transcriptome/entap_outfiles/databases/eggnog.db -d test_data//bin/eggnog -p 1
Thu Jul 25 09:26:11 2024: Std Err:
diamond v2.1.9.163
(C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

CPU threads: 1

Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Database input file: /home/usuari/Downloads/Polystigma-transcriptome/entap_outfiles/databases/eggnog.db
Opening the database file... Error: Error detecting input file format. First line must begin with '>' (FASTA) or '@' (FASTQ).

Thu Jul 25 09:26:11 2024: Printing to files:
Std Out: test_data//bin/eggnog_std.out
Std Err: test_data//bin/eggnog_std.err
Thu Jul 25 09:26:11 2024: Error code (31) not recognized
Thu Jul 25 09:26:11 2024: Error code: 31

Error indexing database at: /home/usuari/Downloads/Polystigma-transcriptome/entap_outfiles/databases/eggnog.db
DIAMOND Error: diamond v2.1.9.163
(C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

CPU threads: 1

Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Database input file: /home/usuari/Downloads/Polystigma-transcriptome/entap_outfiles/databases/eggnog.db
Opening the database file... Error: Error detecting input file format. First line must begin with '>' (FASTA) or '@' (FASTQ).

Thu Jul 25 09:26:11 2024: End - EnTAP
```

Any idea how to solve it?

harta55 commented 1 month ago

I think you may have attached a different debug file. You attached debug_2024Y7M8D-13h28m51s.txt, but it sounds like your issues are in debug_2024Y7M10D-12h20m55s.txt? Please let me know; I may be missing something. I can't find the snippet you just sent in the debug file that was attached.

pavlo888 commented 1 month ago

I have run the test again using the following command:

`singularity exec entap.sif EnTAP --config --run-ini entap_run.params --entap-ini entap_config.ini`

This is the message that appears in the terminal:

```
Parsing ini file at: entap_run.params
Parsing ini file at: entap_config.ini
ini files parsed, debug logging will continue at: test_data/debug_2024Y7M31D-8h24m20s.txt
Running EnTAP configuration…
Downloading EnTAP database…
Configuring EnTAP database... Success
Configuring DIAMOND databases...
Error code: 31

Error indexing database at: /home/usuari/Downloads/Polystigma-transcriptome/entap_outfiles/databases/eggnog.db
DIAMOND Error: diamond v2.1.9.163
(C) Max Planck Society for the Advancement of Science, Benjamin Buchfink, University of Tuebingen
Documentation, support and updates available at http://www.diamondsearch.org
Please cite: http://dx.doi.org/10.1038/s41592-021-01101-x Nature Methods (2021)

CPU threads: 1

Scoring parameters: (Matrix=BLOSUM62 Lambda=0.267 K=0.041 Penalties=11/1)
Database input file: /home/usuari/Downloads/Polystigma-transcriptome/entap_outfiles/databases/eggnog.db
Opening the database file... Error: Error detecting input file format. First line must begin with '>' (FASTA) or '@' (FASTQ).
```

And here are the two logs:

log_file_2024Y7M31D-8h24m20s.txt

debug_2024Y7M31D-8h24m20s.txt

Hopefully we can solve this now.

Cheers, Pablo

harta55 commented 1 month ago

Perfect, OK, I see the issue. For the `database` parameter in the entap_run.params file, you'll want to specify up to 8 FASTA-formatted references (such as Swiss-Prot or RefSeq Complete). It looks like you have the EggNOG database here instead: `database: /home/usuari/Downloads/Polystigma-transcriptome/entap_outfiles/databases/eggnog.db,`

That is causing the DIAMOND configuration to fail, since DIAMOND needs a FASTA-formatted database. There's some information in the docs on how to prepare a reference database (https://entap.readthedocs.io/en/latest/Getting_Started/Configuration/configuration.html), but feel free to let me know if you have any questions about generating one.
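DIAMOND's error message states what it checks: the first line of the input must begin with `>` (FASTA) or `@` (FASTQ). A quick pre-flight check along the same lines can catch a wrong `database` entry before a long configuration run. The Python sketch below is not part of EnTAP or DIAMOND, and the script name `sniff_format.py` is hypothetical. It classifies a file by its leading bytes, looks inside gzip-compressed files, and also recognizes the 16-byte SQLite 3 header, on the assumption (not confirmed in this thread) that `eggnog.db` is a SQLite database, as it is in eggNOG-mapper's distribution to my knowledge.

```python
import gzip
import sys

GZIP_MAGIC = b"\x1f\x8b"               # first two bytes of any gzip stream
SQLITE_MAGIC = b"SQLite format 3\x00"  # 16-byte header of a SQLite 3 database file


def sniff_sequence_format(path: str) -> str:
    """Classify a candidate DIAMOND input by its leading bytes.

    Returns 'fasta', 'fastq', 'sqlite', or 'unknown'. Gzip-compressed files
    are classified by their decompressed contents.
    """
    with open(path, "rb") as raw:
        head = raw.read(16)
    if head.startswith(GZIP_MAGIC):
        # Re-read the first bytes from the decompressed stream instead.
        with gzip.open(path, "rb") as unzipped:
            head = unzipped.read(16)
    if head.startswith(b">"):
        return "fasta"
    if head.startswith(b"@"):
        return "fastq"
    if head.startswith(SQLITE_MAGIC):
        return "sqlite"
    return "unknown"


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 sniff_format.py FILE [FILE ...]
    for candidate in sys.argv[1:]:
        print(f"{candidate}: {sniff_sequence_format(candidate)}")
```

For example, `python3 sniff_format.py entap_outfiles/databases/eggnog.db uniprot_sprot.fasta` would be expected to report the first file as something other than `fasta`; any such file would likely hit the same DIAMOND error when indexed.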

pavlo888 commented 1 month ago

I think I have managed to run the configuration successfully. In the entap_run.params file, I added the following line:

`database=/home/usuari/Downloads/Polystigma-transcriptome/uniprot_sprot.fasta`

It then ran for almost 2 days, and the last message I saw said it had completed successfully:

log_file_2024Y8M2D-8h18m39s.txt

Then I tried a real run with my own data, but again I get a database error:

```
(base) usuari@WS0202:~/Downloads/Polystigma-transcriptome$ singularity exec entap.sif EnTAP --runP --run-ini entap_run.params --entap-ini entap_config.ini
Parsing ini file at: entap_run.params
Parsing ini file at: entap_config.ini
ini files parsed, debug logging will continue at: run1/debug_2024Y8M6D-8h53m18s.txt
Error code: 72

Unable to open EnTAP database from paths given
EnTAP Database Error: Serialized EnTAP database does not exist at: /bin/entap_database.bin
```

I am not sure where /bin/entap_database.bin is, so I am not sure how to fix the issue. In my entap_run.params file I added the following line:

`database=/home/usuari/Downloads/Polystigma-transcriptome/test_data/bin/eggnog_proteins.dmnd, /home/usuari/Downloads/Polystigma-transcriptome/test_data/bin/uniprot_sprot.dmnd`

I am not sure how to proceed. I have checked the EnTAP Read the Docs site, but I could not find anything that solves my issue.

Thanks for your help
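Both errors so far come down to paths EnTAP could not use: the `database` list and the location of the serialized EnTAP database. A small pre-flight check can read an ini-style file and confirm that the path-valued settings you name point at existing files. The sketch below is a hypothetical helper, not an EnTAP feature. It assumes the plain `key=value` lines and comma-separated `database` list seen in this thread, skips `#` comments (an assumption about the format), applies the limit of 8 databases the maintainer mentioned, and resolves relative paths against the current directory, which may not match how EnTAP resolves them. It also strips spaces after commas; whether EnTAP's own parser tolerates those spaces is not confirmed here.

```python
import sys
from pathlib import Path

MAX_DATABASES = 8  # limit quoted by the maintainer in this thread


def read_settings(text: str) -> dict[str, str]:
    """Collect key=value pairs, skipping blank lines and '#' comments (assumed format)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def split_paths(value: str) -> list[str]:
    """Split a comma-separated path list, tolerating spaces after commas."""
    return [p.strip() for p in value.split(",") if p.strip()]


def problems(settings: dict[str, str], path_keys: list[str]) -> list[str]:
    """Report unset keys, too many databases, and paths that are not existing files."""
    found = []
    for key in path_keys:
        paths = split_paths(settings.get(key, ""))
        if not paths:
            found.append(f"{key}: not set")
        if key == "database" and len(paths) > MAX_DATABASES:
            found.append(f"{key}: {len(paths)} entries, at most {MAX_DATABASES} expected")
        for p in paths:
            if not Path(p).is_file():  # relative paths resolve against the current directory
                found.append(f"{key}: missing file {p}")
    return found


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 check_paths.py entap_run.params database
    if len(sys.argv) < 3:
        sys.exit("usage: check_paths.py INI_FILE KEY [KEY ...]")
    ini_file, *keys = sys.argv[1:]
    for message in problems(read_settings(Path(ini_file).read_text()), keys):
        print(message)
```

An empty result means every named setting points at a file that exists; it says nothing about whether the file's contents are what EnTAP expects.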

pavlo888 commented 1 month ago

So I fixed the previous issue by adding the following line to the entap_config.ini file at line 25: `entap-db-bin=entap_outfiles/bin/entap_database.bin`

However, now I have an issue with TransDecoder:

```
ini files parsed, debug logging will continue at: run1/debug_2024Y8M6D-13h55m26s.txt
Error code: 10

Could not execute a test run of Transdecoder, be sure it's properly installed and the executable is correct
```

I am not sure how to fix this, since I am using the EnTAP Singularity image and everything required should be included in the image, right?

log_file_2024Y8M6D-14h28m23s.txt

Any idea how to solve it?
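Error code 10 here comes from EnTAP's test run of TransDecoder, so one thing worth ruling out is whether the configured executable names resolve at all in the environment where EnTAP runs. The sketch below uses Python's `shutil.which` to report names that cannot be found on `PATH` (or, for names with a directory component, that are not executable files). The default names `TransDecoder.LongOrfs` and `TransDecoder.Predict` are TransDecoder's usual program names, but EnTAP calls whatever entap_config.ini specifies, so pass those values instead if they differ. Running it inside the image, e.g. `singularity exec entap.sif python3 check_exes.py` (hypothetical script name), assumes the image ships a Python 3 interpreter.

```python
import shutil
import sys

# TransDecoder's usual program names; EnTAP calls whatever entap_config.ini
# specifies, so pass those values on the command line instead if they differ.
DEFAULT_NAMES = ["TransDecoder.LongOrfs", "TransDecoder.Predict"]


def missing_executables(names: list[str]) -> list[str]:
    """Return the names that cannot be resolved to an executable.

    shutil.which searches PATH for bare names and checks the given path
    directly when a name contains a directory component.
    """
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    names = sys.argv[1:] or DEFAULT_NAMES
    missing = missing_executables(names)
    for name in names:
        status = "MISSING" if name in missing else "ok"
        print(f"{status:8} {name}")
    sys.exit(1 if missing else 0)
```

A name reported as `ok` only shows the program can be located; the test run could still fail for other reasons recorded in the log file.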