iquasere / reCOGnizer

A tool for domain-based annotation with databases from the Conserved Domains Database
BSD 3-Clause "New" or "Revised" License

Setup database bug #15

Closed: davidecrs closed this issue 8 months ago

davidecrs commented 1 year ago

Hi,

I was setting up the databases before running reCOGnizer on Prodigal-predicted ORFs (for multiple samples), so I ran the following command:

recognizer --download-resources --resources-directory .

but I encountered the following errors, which seem to prevent the next step: running reCOGnizer on the predicted ORFs of each sample.

cat: 'reCOGnizer_results/blast/KOG_*_aligned.blast': No such file or directory
2023-09-07 17:12:30: Organizing annotation results
cat: 'reCOGnizer_results/blast/CDD_*_aligned.blast': No such file or directory
[1/8] Handling CDD annotation
cat: 'reCOGnizer_results/blast/Pfam_*_aligned.blast': No such file or directory
[2/8] Handling Pfam annotation
cat: 'reCOGnizer_results/blast/NCBIfam_*_aligned.blast': No such file or directory
[3/8] Handling NCBIfam annotation
cat: 'reCOGnizer_results/blast/Protein_Clusters_*_aligned.blast': No such file or directory
[4/8] Handling Protein_Clusters annotation
cat: 'reCOGnizer_results/blast/Smart_*_aligned.blast': No such file or directory
[5/8] Handling Smart annotation
cat: 'reCOGnizer_results/blast/TIGRFAM_*_aligned.blast': No such file or directory
[6/8] Handling TIGRFAM annotation
cat: 'reCOGnizer_results/blast/COG_*_aligned.blast': No such file or directory
[7/8] Handling COG annotation
cog2ko not found! Going to build it
join: -:13109: is not sorted: 1001530.BACE01000023_gene1    COG5184
join: ./string2ko.tsv:61: is not sorted: 264203.ZMO1579 K06889
join: input is not in sorted order
cat: 'reCOGnizer_results/blast/KOG_*_aligned.blast': No such file or directory
[8/8] Handling KOG annotation
2023-09-07 17:15:06: reCOGnizer analysis finished in 00h17m04s

Do you know how I can solve this issue?

Best, Davide

iquasere commented 1 year ago

Greetings! The warnings you are getting report that the annotation results can't be found, which makes sense, since you aren't annotating anything, just downloading the resources. You can safely ignore them, as long as they don't show up when you annotate sequences specified through the --file parameter.

However, when I implemented the option to download the resources prior to annotation, reCOGnizer was supposed to stop after downloading. It shouldn't search for annotation results, since those are not supposed to be produced.

I have reproduced this on my end, with the latest version of reCOGnizer (1.9.2). Again, the warnings are not problematic since the resources were downloaded correctly, but it's still a bug that will be fixed today.

davidecrs commented 9 months ago

Hi @iquasere ,

when downloading the databases, the files recognizer.log and taxonomy.rdf are saved in the working directory, even when I execute the following command:

(cd temp/recognizer_db & recognizer --resources-directory temp/recognizer_db --download-resources --output temp/recognizer_db/null_results )

After execution, I have:

$ ls -1 

recognizer.log
taxonomy.rdf
temp

Is it possible to redirect recognizer.log and taxonomy.rdf to the same --resources-directory path?

recognizer version: 1.9.2

Best regards, Davide
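A note on the command above: inside the parentheses, `cd` and `recognizer` are joined with `&`, not `&&`. In bash, `&` runs `cd` asynchronously in its own subshell, so the directory change never applies to `recognizer`, which then runs from the original working directory. That may be at least part of why the files landed there. A minimal sketch showing the difference, using a throwaway directory from `mktemp -d` (an assumption for the demo only):

```shell
# Compare `&` and `&&` inside a subshell; demo_dir is a throwaway directory for this demo.
demo_dir="$(mktemp -d)"

# With `&`, `cd` runs in the background, so `pwd` still reports the starting directory.
with_ampersand="$( cd "$demo_dir" & pwd )"

# With `&&`, `cd` runs first and succeeds, so `pwd` reports demo_dir.
with_and="$( cd "$demo_dir" && pwd )"

echo "with &:  $with_ampersand"
echo "with &&: $with_and"
```

If `&&` is used instead, any relative paths passed to `recognizer` are resolved from the new directory, so `temp/recognizer_db` would then need to become `.`.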

davidecrs commented 9 months ago

Hi @iquasere ,

sorry for the double message. I also wanted to add that when running the following command

recognizer --resources-directory recognizer_extdb --download-resources --output recognizer_extdb/null_results

the exit code is 1.

iquasere commented 8 months ago

I was very busy these last couple of weeks, but am now tackling this issue. My goal is to make downloading reCOGnizer's resources much simpler. For now, the plan is to use the recognizer_dwnl.timestamp file as evidence that the databases have already been downloaded: if the file is present, skip downloading; if not, download everything.

This approach puts more responsibility on the user, since if some file is altered, everything will have to be downloaded again (unless a backup was made). It will, however, make the whole database download process less prone to errors, such as the one you identified, and much simpler: it will remove many CLI parameters, and to redownload the databases, the user only needs to remove that file.
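The timestamp check described above could look roughly like this. It is a hypothetical sketch, not reCOGnizer's actual implementation: `ensure_resources` and `download_all_resources` are made-up names, and the download step is a placeholder. Only the recognizer_dwnl.timestamp file name comes from the comment.

```shell
# Hypothetical sketch of a timestamp-based "download once" check (not reCOGnizer's code).

# Placeholder for the real work of fetching and preparing every database.
download_all_resources() {
  echo "downloading all resources into $1" >&2
}

# Prints "skip" if the timestamp file exists; otherwise downloads and prints "downloaded".
ensure_resources() {
  local resources_dir="$1"
  local timestamp="$resources_dir/recognizer_dwnl.timestamp"
  mkdir -p "$resources_dir"
  if [ -f "$timestamp" ]; then
    echo "skip"
  else
    # Write the timestamp only after the download step has succeeded.
    download_all_resources "$resources_dir" || return 1
    date > "$timestamp"
    echo "downloaded"
  fi
}
```

Calling `ensure_resources resources_directory` prints `downloaded` the first time and `skip` afterwards. Deleting the timestamp file forces a full redownload, which matches the trade-off described above.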

iquasere commented 8 months ago

With release 1.10, all these issues were addressed, along with many others. Database installation is now much simpler and can almost be ignored. Just make sure the -rd parameter is set to the desired value, or don't use it at all if you have no problem with the databases being downloaded to the default folder.

davidecrs commented 8 months ago

Many thanks. I will try this new version.

davidecrs commented 8 months ago

Hi @iquasere ,

I tried this new version using

recognizer -rd recog_db/

It worked fine. However, the exit status code is 1 (I don't know if you're interested, but in a pipeline this may interrupt the workflow).

Also, the files recognizer.log and taxonomy.rdf are no longer created (correctly, I assume).

In any case, thanks again for these improvements!

Best regards

iquasere commented 8 months ago

Strange... it's passing the tests on GitHub, and an exit code other than 0 should trigger a failed test. I am currently updating reCOGnizer in a Snakemake pipeline that includes it; let me see if it fails with this latest version.
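Since a nonzero exit status stops most workflows (Snakemake rules, scripts run with `set -e`), it can help to make the failing step and its status visible. A minimal sketch of a hypothetical wrapper, `run_step`, that runs any command, reports a nonzero status on stderr, and passes that status on unchanged:

```shell
# Hypothetical helper: run one pipeline step and pass its exit status through.
run_step() {
  "$@"
  local status=$?
  if [ "$status" -ne 0 ]; then
    # Report which command failed and with what status, without hiding the failure.
    echo "step '$*' exited with status $status" >&2
  fi
  return "$status"
}
```

For example, `run_step recognizer -rd recog_db/` would print the status if the run exits with 1, while still returning 1 so the surrounding pipeline can decide what to do.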

SrichandanPadhi commented 5 months ago

I am actually running into the same issue: I did not find the file recognizer_dwnl.timestamp in the folder. And if I try to download the resources using "recognizer --download-resources --resources-directory resources_directory", the following error is displayed:

recognizer: error: argument -dr/--download-resources: expected one argument

iquasere commented 5 months ago

@SrichandanPadhi the --download-resources parameter is deprecated and shouldn't be used anymore. In the next version I am removing it entirely, and I should have already removed it from the documentation.

See if simply removing the --download-resources parameter fixes it, please.

SrichandanPadhi commented 5 months ago

Sorry for the delay in replying. As instructed, I tried removing --download-resources and it worked: the databases were downloaded. However, it ended with an issue while extracting; please see the following. A screenshot is attached for your reference.

tar -xzf cdd.tar.gz --wildcards "*.smp"
2024-03-22 20:58:16: Building taxonomy.tsv
No input file specified. Exiting.

[Screenshot: recognizer issues 1]

SrichandanPadhi commented 5 months ago

Hey Joao, thank you very much for the suggestion.

Despite the issue, I tried running the annotation with reCOGnizer. The issues that remained after --download-resources were solved, and the missing files were rebuilt without any problems.

Now I am going to use KEGGCharter with the reCOGnizer output. If I run into any issues, I will let you know. Thanks again.

iquasere commented 5 months ago

reCOGnizer no longer requires a separate command to download resources. You can just run it as

recognizer -i input.fasta -o output -rd resources_directory

but I do see that this is not obvious in the documentation. Once I have the time, I'll fix that.

Glad you are using it now, together with KEGGCharter. Do let me know if it helps your work, or if you encounter any more problems!
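Building on the invocation above, one way to annotate several samples (as in the original question) is to loop over the FASTA files and point every run at the same resources directory, so the databases only need to be fetched once. This is a sketch: the function name `annotate_samples`, the `results/` layout, the `.fasta` extension, and the assumption that reCOGnizer creates the output folder itself are all illustrative; only the `-i`, `-o`, and `-rd` options come from the command above.

```shell
# Hypothetical helper: annotate each FASTA file with one shared resources directory.
# Usage: annotate_samples RESOURCES_DIR sample1.fasta sample2.fasta ...
annotate_samples() {
  local resources_dir="$1"
  shift
  local fasta sample
  for fasta in "$@"; do
    # Derive the sample name from the file name, e.g. samples/a.fasta -> a.
    sample="$(basename "$fasta" .fasta)"
    # Stop at the first failing sample so the error is not buried.
    recognizer -i "$fasta" -o "results/$sample" -rd "$resources_dir" || return 1
  done
}
```

The runs are sequential on purpose: the first one can finish preparing the resources before the next starts. Whether concurrent runs against an empty resources directory are safe is not covered in this thread.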

SrichandanPadhi commented 5 months ago

Yes, sure. Thanks for your support.


davidecrs commented 5 months ago

Hi,

Is it still possible to download the database before running the annotation?

Best regards