simroux / VirSorter

Source code of the VirSorter tool, also available as an App on CyVerse/iVirus (https://de.iplantcollaborative.org/de/)
GNU General Public License v2.0

Diamond version error? #28

Closed istmobiome closed 6 years ago

istmobiome commented 6 years ago

Hi!

Quick question. I followed this recipe (http://merenlab.org/2018/02/08/importing-virsorter-annotations/) to install and run VirSorter on some metagenome samples. The recipe looks very similar to the instructions on this git page. I am running VirSorter on an HPC, so I installed it using conda, including the dependencies. Everything looked fine after install.

In short there is a diamond issue and I am wondering if someone could help me troubleshoot the issue. I am happy to provide files where needed.

I then ran this: wrapper_phage_contigs_sorter_iPlant.pl -f EPM.fa --ncpu 20 --db 2 --wdir VIRSORTER_EPM --data-dir ~/miniconda3/envs/virsorter/bin/virsorter-data/ --diamond

And this threw an error about not being able to locate diamond (which is installed in the anvio env). So I installed diamond in virsorter using conda install. When the job finished the err file had this error:

Error: Database was built with a different version of Diamond and is incompatible.

Question: Is this referring to --db 2?

followed by ~36k lines of this error:

sh: line 8: ~/miniconda3/envs/virsorter/VirSorter/Scripts/Sliding_windows_3: No such file or directory

In addition the files in the Predicted_viral_sequences directory were empty as was VIRSorter_global-phage-signal.csv

Perhaps installing diamond in the env first and then building virsorter and dbs would fix the issue?

simroux commented 6 years ago

Hi! It looks like the diamond version you got from conda is not compatible with the one we used to build the database. Did you try to install all the required software using the command lines suggested in the section "Using a conda virtual environment (tested on Ubuntu and CentOS)" of the readme (https://github.com/simroux/VirSorter/blob/master/README.md)? If not, that may be a good starting point. Otherwise, we should check with @brymerr921 which version of diamond was used to create these databases, and add this to the conda install line to ensure compatibility.

istmobiome commented 6 years ago

Howdy,

I did try that but I will give it another shot. FYI the version of diamond that gets installed with this conda recipe is 0.9.21-1.

Thanks for the advice.

Jarrod


brymerr921 commented 6 years ago

The diamond version I used was 0.9.14. You have several options at this point.

  1. With your new version of Diamond, make new databases from each *.faa file in Phage_gene_catalog and Phage_gene_catalog_plus_viromes. There should be two FAA files in each directory.

    diamond makedb --in Pool_unclustered.faa --db Pool_unclustered
    diamond makedb --in Pool_new_unclustered.faa --db Pool_new_unclustered

    This should overwrite the existing databases and make new ones compatible with your version.

  2. Looking at the github page for Diamond, it appears the database structure changed in v0.9.19. You can follow the instructions there to recreate the Diamond database, though this will likely have the same effect as option 1.

  3. Use Diamond version 0.9.14 (which is what I used). conda install -c bioconda diamond=0.9.14
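Option 1 can be scripted so that every *.faa file in a directory is rebuilt in one pass. This is only a minimal sketch: the function name `rebuild_diamond_dbs` and the `DIAMOND` override variable are hypothetical helpers introduced here, not part of VirSorter or Diamond; only the `diamond makedb --in ... --db ...` call comes from the commands above.

```shell
# Rebuild a Diamond database for every .faa file in one directory.
# DIAMOND is a hypothetical override for the executable name (defaults to "diamond").
rebuild_diamond_dbs() {
    dir="$1"
    for faa in "$dir"/*.faa; do
        # Skip the unexpanded pattern when the directory holds no .faa files.
        [ -e "$faa" ] || continue
        # Pool_unclustered.faa -> database prefix Pool_unclustered
        "${DIAMOND:-diamond}" makedb --in "$faa" --db "${faa%.faa}" || return 1
    done
}
```

It would be called once per directory, for example `rebuild_diamond_dbs /path/to/virsorter-data/Phage_gene_catalog` and again for `Phage_gene_catalog_plus_viromes` (the data directory path is a placeholder).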

Please let me know what you try and what does/doesn't work and I'll update the documentation in both places.

istmobiome commented 6 years ago

OK, so I opted for option #3 and it ran without any errors. For setup I ran: conda create --name virsorter -c bioconda mcl=14.137 muscle blast perl-bioperl perl-file-which hmmer=3.1b2 perl-parallel-forkmanager perl-list-moreutils diamond=0.9.14

I followed the rest of the recipe as is. I put the dbs in ~/miniconda/envs/virsorter/bin directory.

I tested VirSorter on a single marine metagenome using this command: wrapper_phage_contigs_sorter_iPlant.pl -f contigs.fa --ncpu 4 --db 2 --wdir EPM --data-dir ~/miniconda3/envs/virsorter/bin/virsorter-data/ --diamond

The err file in the log directory was empty, so that's great. However, there is nothing save headers in the VIRSorter_global-phage-signal.csv file, and all the files in the Predicted_viral_sequences directory are empty. I was pretty surprised by this. Is there a test dataset I could run to check my install? I did not see one in the virsorter package. Thanks a bunch!

simroux commented 6 years ago

Hi Jarrod,

There are a few test datasets available on CyVerse (https://de.cyverse.org/de/), that you can download at http://datacommons.cyverse.org/browse/iplant/home/shared/imicrobe/VirSorter. The input fasta files are in the folder "Benchmark_datasets" and the expected results are in "Benchmark_results". Note that these are expected results using the "old" database on CyVerse, so you will likely not have exactly the same result, but if you try e.g. "Vir_0.fna", you should get some viral sequences predicted (if not, then something is wrong).

For your previous test, please also note that VirSorter usually doesn't "like" to run again in the same working directory, so if that's what you did there, I would suggest running the same dataset but with a different working directory.
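One way to avoid reusing a working directory is to pick a name that does not exist yet before each run. The helper below is a hypothetical sketch (the name `unused_wdir` and the `_runN` suffix are assumptions, not VirSorter conventions); it only prints a path and leaves creating the directory to VirSorter.

```shell
# Print the first path among BASE, BASE_run1, BASE_run2, ... that does not exist yet.
unused_wdir() {
    base="$1"
    candidate="$base"
    n=1
    while [ -e "$candidate" ]; do
        candidate="${base}_run${n}"
        n=$((n + 1))
    done
    printf '%s\n' "$candidate"
}
```

The result could then be passed to the wrapper, e.g. `--wdir "$(unused_wdir EPM)"`.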

Best, Simon

istmobiome commented 6 years ago

Thank you Simon. I will give it a shot and let you know.

Best Jarrod


brymerr921 commented 6 years ago

Jarrod, are there any clues as to what might be wrong in the logs/out file? Also, if MCL is running correctly then the output of that usually ends up in the err file.

jarrodscott commented 6 years ago

Hi @brymerr921

No clues yet. I did a fresh install including getting the dbs. I ran the Mic_1.fna metagenomic dataset from CyVerse and got the same result. In the virsorter env mcl is there and mcl --version gives mcl 14-137.

I am happy to pass along the log/out file if you want to have a look. Just tell me the best place to post/send. Thanks again--I will keep looking--probably something simple.

simroux commented 6 years ago

Hi Jarrod,

You should be able to make a zip package from the output directory (preferably from the clean install and Mic_1 run) and attach it here. I'd be happy to take a look and see if we can figure this out !

Best, Simon

istmobiome commented 6 years ago

Hi Simon, Thanks a ton for having a look.

Here is a Dropbox link to the folder: https://www.dropbox.com/s/13hd9w371bvxl44/VIRSORTER_TEST_MIC_1.zip?dl=0

this is the command I used: wrapper_phage_contigs_sorter_iPlant.pl -f Mic_1.fna --ncpu 2 --db 1 --wdir VIRSORTER_TEST --data-dir /pool/genomics/stri_istmobiome/dbs/virsorter/virsorter-data --diamond

Best, Jarrod


jarrodscott commented 6 years ago

sorry, the --ncpu was 20 not 2


simroux commented 6 years ago

This is strange.. it looks like everything went right, but the call to the "Sliding_windows_3" script seems to fail. Just to be sure, you are not using the docker container but pulling from the github repo, right ? I will try to look into this, but it may take a couple of days unfortunately.

istmobiome commented 6 years ago

No worries. Whenever you have time. Yes, I pulled from git.

One thing I forgot to mention is that I am running this on a compute cluster. Perhaps it has something to do with the different physical locations of the query data, dbs, and program. I have tried moving things around a little, but to no avail.


brymerr921 commented 6 years ago

Hi Jarrod, I ran your FASTA file on my local installation of VirSorter and it ran just fine. However, our out files look different. Can you try going to VirSorter/Scripts, running make in that directory, and then re-running VirSorter? Based on your out file, it looks like the program Sliding_windows_3 is not feeding the appropriate information into the out file.
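Before re-running, it can help to confirm that the compiled program is actually present and executable in the Scripts directory. The following is a rough sketch under that assumption; `ensure_sliding_windows` is a hypothetical helper name, and it assumes that running `make` in VirSorter/Scripts produces a file named Sliding_windows_3 there.

```shell
# Check for an executable Sliding_windows_3 in the given Scripts directory;
# if it is missing, try to build it with make and check again.
ensure_sliding_windows() {
    scripts_dir="$1"
    bin="$scripts_dir/Sliding_windows_3"
    if [ -x "$bin" ]; then
        echo "ok: $bin"
        return 0
    fi
    echo "not found or not executable: $bin; running make" >&2
    (cd "$scripts_dir" && make) || return 1
    # Succeed only if the build actually produced the executable.
    [ -x "$bin" ]
}
```

For example: `ensure_sliding_windows /path/to/VirSorter/Scripts` (the path is a placeholder).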

In my "out" file, I see lines like this:

Gene VIRSorter_contig-100_952_length_2513_read_count_91 / gene_1 -> category 0 -> putative hallmark

interspersed with lines like this:

0       5       1       2.47873542367956        0

The line with only numbers is the output of the C script Sliding_windows_3 which is fed back into VirSorter via stdout. If lots of lines with numbers aren't appearing in your out file, I suspect you have an issue with Sliding_windows_3 not running correctly.
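A quick way to check for those number-only lines is to count them. This is a sketch, not part of VirSorter: `count_numeric_lines` is a hypothetical helper, the location of the out file is whatever your run produced, and the pattern only recognises plain integers and decimals (not scientific notation).

```shell
# Count lines made up only of whitespace-separated numbers, such as
# "0       5       1       2.47873542367956        0".
# Note: grep exits non-zero when the count is 0, but the count is still printed.
count_numeric_lines() {
    grep -cE '^[[:space:]]*-?[0-9]+(\.[0-9]+)?([[:space:]]+-?[0-9]+(\.[0-9]+)?)*[[:space:]]*$' "$1"
}
```

A count of 0 on the out file would be consistent with Sliding_windows_3 producing no output.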

I tested this with the master version of the repo as well as version e98d2f8f473b3028793b8a037c91648d1453f7a0 as is mentioned in the tutorial on the Anvi'o page, and both worked fine. I'm interested to see what happens!

istmobiome commented 6 years ago

Hi brymerr921,

I tried this but unfortunately still no results. I recompiled everything and then ran it. No go. Then I ran make in the Scripts directory, which builds Sliding_windows_3. Same result. I will keep at it and let you know what I come up with. In the meantime, have a great weekend and thanks for the help!

Jarrod


jarrodscott commented 6 years ago

One last question. And this is delving deep into my ignorance. Could it have something to do with the compiler I'm using? I compiled with gcc 4.9.2...


jarrodscott commented 6 years ago

hi @simroux and @brymerr921

would either of you mind sending me the log and err files from a successful virsorter run? I suspect my problems are particular to the server I am using.

Thanks! Jarrod

jarrodscott commented 6 years ago

oops! sorry. found them on CyVerse. Never mind :)

brymerr921 commented 6 years ago

Hi Jarrod,

I've compressed the entire output directory for you so you can see what the out and err files should look like. Here it is. I ran this using the master branch (6631300).

One thing you can try is to run Step_3 on its own. To do this, use the following command:

/path/to/Virsorter/Scripts/Step_3_highlight_phage_signal.pl Metric_files/VIRSorter_affi-contigs.tab VIRSorter_phage_signal_test.csv 16

VIRSorter_affi-contigs.tab is the input file, and VIRSorter_phage_signal_test.csv is the output file, and 16 is the number of CPUs I used. Looking in the output directory (linked above), you can re-run Step_3 and then compare the output file with Metric_files/VIRSorter_phage_signal.tab. If running Step_3 worked, these files should be the same (albeit the lines may be in a different order).
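Because the lines may come out in a different order, a plain diff is not a reliable comparison; sorting both files first is. The helper below is a sketch with a hypothetical name (`same_lines_any_order`); it returns success only when both files contain exactly the same lines, regardless of order.

```shell
# Return 0 if the two files contain the same lines, ignoring line order.
same_lines_any_order() {
    a_sorted=$(mktemp) || return 2
    b_sorted=$(mktemp) || return 2
    # Sort with a fixed locale so both files are ordered identically.
    LC_ALL=C sort "$1" > "$a_sorted"
    LC_ALL=C sort "$2" > "$b_sorted"
    status=0
    cmp -s "$a_sorted" "$b_sorted" || status=$?
    rm -f "$a_sorted" "$b_sorted"
    return "$status"
}
```

For example: `same_lines_any_order VIRSorter_phage_signal_test.csv Metric_files/VIRSorter_phage_signal.tab && echo "Step_3 output matches"`.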

jarrodscott commented 6 years ago

(Please ignore my last message: I was going down the wrong track so deleted my last comment)

Hi Bryan

Thanks for the files and advice. FYI--I was able to run Step_2 and get the same outfile (VIRSorter_affi-contigs.csv) as was in the package you sent. But Step_3 still generates an empty file. I am taking a closer look now :)

jarrodscott commented 6 years ago

Hi Bryan & Simon

could one of you please send me the output of conda list -n virsorter? I want to compare to my conda install and see if maybe something is different...Thanks!

jarrodscott commented 6 years ago

This issue has been solved (at least for me) with the latest VirSorter updates and by pinning the diamond version. This can be closed.