simroux / VirSorter

Source code of the VirSorter tool, also available as an App on CyVerse/iVirus (https://de.iplantcollaborative.org/de/)
GNU General Public License v2.0

Custom database - How to? #43

Open MeggyC opened 5 years ago

MeggyC commented 5 years ago

Hey there,

I want to build a custom database, or at least update the current RefSeq-based database to a 2019 version. How do I do this? I can't seem to find a manual. Inspecting db1 shows me that it is not a simple BLAST database.

Cheers, Megan.

mshamash commented 5 years ago

Hello,

I've got the same question as well. In addition, I'd like to use a ".fna" file as well as the Virome database. On CyVerse I can do this using "Additional viral sequence to be used as reference (optional)", but what is the command line argument for this if I run on my own server?

The closest I could find was the "--cp" (custom phage sequence) argument, but there is no real documentation on how to use it, or on whether I just point it to a FASTA file (my .fna file).

Best,

Mike

MeggyC commented 5 years ago

Hey there, I can actually help with that one:

You would use the --cp flag with your FASTA file of additional sequences, in conjunction with --db 1, and I then kept the resulting database using the --keep-db flag. The problem is that I am now unable to feed that database back in. At the moment I'm trying various options to make it work, but if one of the developers could help further, that would be great.

This was my command

virsorter -f /srv/projects/coral/Seaquence_Accelerate_master_directory/analysis/20190520_metaspades_assemblies/metaspades_ROB3349A03-148_S3/contigs.fasta --db 1 --keep-db --cp "/srv/home/s4549287/20190429_RefSeq_Vir/refseqvir.fna" --wdir /srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_CR_148 --ncpu 10 --data-dir /srv/db/virsorter_data/

mshamash commented 5 years ago

Hi MeggyC,

Thanks for the command. I will give it a try today. In that case, if I set VirSorter to "decontamination mode" on the command line with my --cp custom database, will it remove all sequences that are not in RefSeq/Virome or my custom database, and return a new FASTA file with my contig sequences? I am a bit confused as to how exactly the decontamination mode works and what the output is.

Cheers.

Mike

simroux commented 5 years ago

Hi Mike & Megan, Megan is right about the use of "--cp" on the command line: it is the same thing as adding an "Additional viral sequence to be used as reference (optional)" on CyVerse (thanks for answering this!). Now for the other questions:

Let me know if this helps !

mshamash commented 5 years ago

Hi Simon,

Thank you for the detailed answer. So from my understanding, since the contigs I have are mostly viral, I should be using the "decontamination mode" to reveal more viral sequences which would otherwise be ignored as background.

Following up on the use of "--cp", is there any way to only compare contigs against my custom database FASTA file defined in "--cp", or must it always be Refseq/virome AND the --cp database?

Also, when using the "--keep-db" flag, wouldn't this just keep the files in the "r_0" db folder which I could then copy and paste into the "Phage_gene_catalog_plus_viromes" and overwrite to have my custom phages included each time?

All the best,

Michael

simroux commented 5 years ago

Hi Michael,

You're correct about the decontamination mode. For the cp flag, there is currently no way to only compare contigs to your custom database, the "--cp" is always on top of either RefSeq or Virome.

And thanks for reminding me of the "--keep-db" flag, I had completely forgotten we had put this here. So yes, scratch my comment about interrupting VirSorter: you can use this flag, then look in the r_0 db folder, and you should find a database that includes your own custom phages.

Best, Simon

mshamash commented 5 years ago

Hi Simon,

And in that case, I could overwrite the files in the "Phage_gene_catalog_plus_viromes" (since I use viromes not refseq) with the contents from r_0 and it will always include my custom phages?

Best,

Michael

simroux commented 5 years ago

Correct, that is the expected behavior (I would suggest doing a backup copy of the "Phage_gene_catalog_plus_viromes" just in case though).

mshamash commented 5 years ago

Thank you! I will report back if any other issues come back on this topic, otherwise it seems easy enough to do.

Cheers.

EDIT: it worked perfectly! I copied the files from "r_0" (excluding the folder named "initial_db") to the directory "Phage_gene_catalog_plus_viromes", overwriting everything that conflicted (after backing up the original viromes database files), and now every time I run VirSorter with --db 2 (viromes), it includes my custom phages automatically, with no need for the --cp or --keep-db flags.
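For anyone who wants to script the copy described above, here is a minimal sketch. It is not part of VirSorter; the function name `merge_custom_db` and all three directory arguments are hypothetical and need to be adapted to your own layout (in particular, check whether your kept database sits directly in "r_0" or in "r_0/db").

```shell
#!/bin/sh
# Sketch only: merge a database kept with --keep-db into the viromes database.
# merge_custom_db is a hypothetical helper, not a VirSorter command.
# Usage: merge_custom_db KEPT_DB_DIR TARGET_DB_DIR BACKUP_DIR
merge_custom_db() {
    kept_db=$1     # assumed: the kept database folder from a --cp/--keep-db run
    target_db=$2   # assumed: .../Phage_gene_catalog_plus_viromes in your data dir
    backup_dir=$3  # assumed: a path that does not exist yet, used for the backup

    if [ ! -d "$kept_db" ]; then
        echo "missing directory: $kept_db" >&2
        return 1
    fi
    if [ ! -d "$target_db" ]; then
        echo "missing directory: $target_db" >&2
        return 1
    fi
    # Refuse to clobber an earlier backup of the original database.
    if [ -e "$backup_dir" ]; then
        echo "backup already exists: $backup_dir" >&2
        return 1
    fi

    # Back up the original database before overwriting anything.
    cp -R "$target_db" "$backup_dir" || return 1

    # Copy everything except the "initial_db" folder, overwriting conflicts.
    for entry in "$kept_db"/*; do
        if [ ! -e "$entry" ]; then
            continue
        fi
        if [ "$(basename "$entry")" = "initial_db" ]; then
            continue
        fi
        cp -R "$entry" "$target_db"/ || return 1
    done
}
```

A call would look like `merge_custom_db /path/to/wdir/r_0 /path/to/data/Phage_gene_catalog_plus_viromes /path/to/backup_viromes`, with all three paths being placeholders. The backup is taken first so the original database can be restored if the merged one misbehaves.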

MeggyC commented 5 years ago

Hey Simon,

My problem now is that I have not been able to successfully feed the database kept with the --keep-db flag back into VirSorter. The program runs, but produces no output.

Here is my somewhat longwinded report on the issue:

Command:

Trial 1) to see whether virsorter accepts my new db - FLAGS: --db 1 --data-dir /srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_CR_148/r_0/db/

virsorter -f /srv/home/s4549287/tmp/trial1/contigs.fasta --db 1 --wdir /srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_mini1 --ncpu 10 --data-dir /srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_CR_148/r_0/db/ &> /srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_mini1/mini.log

Error: File existence/permissions problem in trying to open HMM file /srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_CR_148/r_0/db/PFAM_27/Pfam-A.hmm. HMM file /srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_CR_148/r_0/db/PFAM_27/Pfam-A.hmm no

Error: File existence/permissions problem in trying to open HMM file /srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_CR_148/r_0/db/PFAM_27/Pfam-B.hmm. HMM file /srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_CR_148/r_0/db/PFAM_27/Pfam-B.hmm no

BLAST Database error: No alias or index file found for protein database [/srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_mini1/r_0/db/Pool_new_unclustered] in search path [/srv/home/s4549287::]

Can't open '/srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_mini1/Contigs_prots_vs_Phage_Gene_Catalog.tab' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_2_merge_contigs_annotation.pl line 103

Can't open '/srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_mini1/VIRSorter_affi-contigs.csv' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_3_highlight_phage_signal.pl line 59

Can't open '/srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_mini1/VIRSorter_phage-signal.csv' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_4_summarize_phage_signal.pl line 83

Error: File existence/permissions problem in trying to open HMM file /srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_CR_148/r_0/db/PFAM_27/Pfam-A.hmm. HMM file /srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_CR_148/r_0/db/PFAM_27/Pfam-A.hmm no

Error: File existence/permissions problem in trying to open HMM file /srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_CR_148/r_0/db/PFAM_27/Pfam-B.hmm. HMM file /srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_CR_148/r_0/db/PFAM_27/Pfam-B.hmm no

Can't open '/srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_mini1/Contigs_prots_vs_Phage_Gene_unclustered.tab' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_2_merge_contigs_annotation.pl line 79

Can't open '/srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_mini1/VIRSorter_affi-contigs.csv' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_3_highlight_phage_signal.pl line 59

Can't open '/srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_mini1/VIRSorter_phage-signal.csv' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_4_summarize_phage_signal.pl line 83

Trial 2) FLAGS: no --db flag, --data-dir /srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_CR_148/r_0/db/

virsorter -f /srv/home/s4549287/tmp/trial1/contigs.fasta --wdir /srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_mini2 --ncpu 10 --data-dir /srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_CR_148/r_0/db/

Error: File existence/permissions problem in trying to open HMM file /srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_CR_148/r_0/db/PFAM_27/Pfam-A.hmm. HMM file /srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_CR_148/r_0/db/PFAM_27/Pfam-A.hmm no

Error: File existence/permissions problem in trying to open HMM file /srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_CR_148/r_0/db/PFAM_27/Pfam-B.hmm. HMM file /srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_CR_148/r_0/db/PFAM_27/Pfam-B.hmm no

BLAST Database error: No alias or index file found for protein database [/srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_mini2/r_0/db/Pool_new_unclustered] in search path [/srv/home/s4549287::]

Can't open '/srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_mini2/Contigs_prots_vs_Phage_Gene_Catalog.tab' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_2_merge_contigs_annotation.pl line 103

Can't open '/srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_mini2/VIRSorter_affi-contigs.csv' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_3_highlight_phage_signal.pl line 59

Can't open '/srv/projects/coral/Accelerate_project/analyses/20190619_VirSorter_assemblies/test_mini2/VIRSorter_phage-signal.csv' for reading: 'No such file or directory' at /usr/local/bin/Scripts/Step_4_summarize_phage_signal.pl line 83

These are the errors for trial 2, where only --data-dir (pointing to r_0/db) was given.

Of course, for all my assemblies I can create a new (but identical) database every time from the same .fna file. However, this seems a little computationally expensive to me. Thanks for your help!
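Going only by the error messages above, the directory passed to --data-dir seems to be expected to contain PFAM_27/Pfam-A.hmm and PFAM_27/Pfam-B.hmm, which r_0/db does not have. A small pre-flight check along these lines could catch that before a full run. This is a sketch, not a VirSorter feature: `check_data_dir` is a hypothetical helper, and the two files it checks are just the ones named in the errors, not a complete list of what VirSorter needs.

```shell
#!/bin/sh
# Sketch only: check a candidate --data-dir for the files named in the errors above.
# check_data_dir is a hypothetical helper; the file list is not exhaustive.
# Usage: check_data_dir DATA_DIR   (returns 0 if all listed files are readable)
check_data_dir() {
    data_dir=$1
    missing=0
    for rel in PFAM_27/Pfam-A.hmm PFAM_27/Pfam-B.hmm; do
        if [ ! -r "$data_dir/$rel" ]; then
            echo "not readable: $data_dir/$rel" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

It could be chained in front of a run, for example `check_data_dir /srv/db/virsorter_data && virsorter ...`, so that VirSorter only starts when those files are present.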

simroux commented 5 years ago

Hi Megan, yes, unfortunately you won't be able to feed the "r_0" directory as "--data-dir", because it doesn't contain all the databases. The way this should (hopefully) work is as follows: