Closed jbird9 closed 7 years ago
Hi Jordan,
This is very interesting. I have been using that flag quite often recently and I didn't really run into any issues with it. I am curious whether this could be something specific to your data. I would be happy to take a look if you were to share it with me, or find a way to help me replicate the error :)
Best,
Although it will take some time for me to get back to it as I am traveling quite extensively :(
Hi Meren,
I hope your travels are going well. I have attached a link to the SAG FASTAs I have been working with. I had two external genome files: one has just the bacteria, while the other also includes two tiny archaeal genomes. I understand traveling can suck the time away, so I am not expecting a quick solution.
Thanks for your help,
Jordan Bird
(Meren's note: I edited the content to remove the attachment)
Hi Jordan,
I downloaded them to have them with me to investigate this whenever I can find some time.
(Meanwhile I removed the link from your message so the data link is not archived).
Thank you!
Hi Jordan,
I ran the pangenomic analysis using the files you sent with the parameter --min-occurrence 2, and it seems everything worked nicely and produced the following output:
meren ~/Downloads/SAG_FASTAs $ anvi-pan-genome -e external_genomes.csv -o pan --num-threads 6 --min-occurrence 2
WARNING
===============================================
If you publish results from this workflow, please do not forget to cite DIAMOND
(doi:10.1038/nmeth.3176), unless you use it with --use-ncbi-blast flag, and MCL
(http://micans.org/mcl/ and doi:10.1007/978-1-61779-361-5_15)
External genomes .............................: 45 have been initialized.
Internal genomes .............................: 0 have been initialized.
Exclude partial gene calls ...................: False
* JS1_60B_A09 is initialized with 1,499 genes (0 were excluded)
* JS1_59E_13H_C14 is initialized with 542 genes (0 were excluded)
* NT-B2-AD-617-P19 is initialized with 1,168 genes (0 were excluded)
* JS1_60B_N06 is initialized with 2,875 genes (0 were excluded)
* OPB41_60B_13H_C09 is initialized with 1,067 genes (0 were excluded)
* JS1_60B_M10 is initialized with 1,880 genes (0 were excluded)
* OPB41_60B_13H_B07 is initialized with 963 genes (0 were excluded)
* JS1_59E_13H_F07 is initialized with 640 genes (0 were excluded)
* MG2 is initialized with 383 genes (0 were excluded)
* OPB41_59E_21H_M23 is initialized with 555 genes (0 were excluded)
* JS1_59E_13H_O21 is initialized with 922 genes (0 were excluded)
* JS1_60B_I07 is initialized with 862 genes (0 were excluded)
* OP8_59E_13H_E21 is initialized with 524 genes (0 were excluded)
* Chl_60B_28H_A21 is initialized with 910 genes (0 were excluded)
* OP8_59E_13H_M21 is initialized with 599 genes (0 were excluded)
* D-anilini-AD-619-D02 is initialized with 696 genes (0 were excluded)
* Chl_60B_28H_C14 is initialized with 1,067 genes (0 were excluded)
* OPB41_59E_21H_O21 is initialized with 934 genes (0 were excluded)
* JS1_60B_E13 is initialized with 1,398 genes (0 were excluded)
* JS1_59E_13H_L23 is initialized with 588 genes (0 were excluded)
* OPB41-AD-617-I09 is initialized with 672 genes (0 were excluded)
* OP8-AD-619-P22 is initialized with 488 genes (0 were excluded)
* OP8-AD-617-C16 is initialized with 1,045 genes (0 were excluded)
* JS1_59E_13H_K04 is initialized with 1,520 genes (0 were excluded)
* Unk_60B_28H_C08 is initialized with 563 genes (0 were excluded)
* OP8_59E_13H_F13 is initialized with 1,005 genes (0 were excluded)
* Chl_60B_13H_A19 is initialized with 636 genes (0 were excluded)
* Chloroflexi-AD-619-B06 is initialized with 37 genes (0 were excluded)
* JS1_59E_13H_E20 is initialized with 861 genes (0 were excluded)
* OPB41-AD-617-M19 is initialized with 1,015 genes (0 were excluded)
* Chloroflexi-AD-619-N02 is initialized with 429 genes (0 were excluded)
* NT-B2-AD-619-E05 is initialized with 1,517 genes (0 were excluded)
* JS1_60B_M21 is initialized with 1,479 genes (0 were excluded)
* Chloroflexi-AD-619-G11 is initialized with 109 genes (0 were excluded)
* OPB41_59E_21H_M06 is initialized with 132 genes (0 were excluded)
* OPB41_60B_13H_O22 is initialized with 665 genes (0 were excluded)
* NT-B2-AD-619-P03 is initialized with 522 genes (0 were excluded)
* first_spades_MCG is initialized with 676 genes (0 were excluded)
* JS1_59E_13H_E15 is initialized with 309 genes (0 were excluded)
* OPB41_60B_13H_A10 is initialized with 1,054 genes (0 were excluded)
* JS1_59E_13H_L14 is initialized with 238 genes (0 were excluded)
* OPB41_59E_21H_B05 is initialized with 611 genes (0 were excluded)
* JS1_60B_D03 is initialized with 1,785 genes (0 were excluded)
* D-anilini-AD-619-E09 is initialized with 1,040 genes (0 were excluded)
* OP8_59E_13H_M19 is initialized with 676 genes (0 were excluded)
Num protein sequences ........................: 39,156
Num excluded gene calls ......................: 0
Num unique protein sequences .................: 32,816
Combined protein sequences FASTA .............: /Users/meren/Downloads/SAG_FASTAs/pan/combined-proteins.fa
Unique protein sequences FASTA ...............: /Users/meren/Downloads/SAG_FASTAs/pan/combined-proteins.fa.unique
WARNING
===============================================
Notice: A diamond database is found in the output directory, and will be used!
WARNING
===============================================
Notice: A DIAMOND search result is found in the output directory: skipping
BLASTP!
WARNING
===============================================
Notice: A DIAMOND tabular output is found in the output directory. Anvi'o will
not generate another one!
Min percent identity .........................: 0.0
Maxbit .......................................: 0.5
Filtered search results ......................: 308,113 edges stored
MCL input ....................................: /Users/meren/Downloads/SAG_FASTAs/pan/mcl-input.txt
MCL inflation ................................: 2.0
MCL output ...................................: /Users/meren/Downloads/SAG_FASTAs/pan/mcl-clusters.txt
Number of protein clusters ...................: 14,941
protein clusters info ........................: /Users/meren/Downloads/SAG_FASTAs/pan/protein-clusters.txt
PCs min occurrence ...........................: 2 (the filter removed 9032 PCs)
Anvi'o view data for protein clusters ........: /Users/meren/Downloads/SAG_FASTAs/pan/anvio-view-data.txt
Anvi'o additional view data ..................: /Users/meren/Downloads/SAG_FASTAs/pan/anvio-additional-view-data.txt
Anvi'o samples information ...................: /Users/meren/Downloads/SAG_FASTAs/pan/anvio-samples-information.txt
WARNING
===============================================
filesnpaths::gen_output_directory: the client asked the existing directory
"/Users/meren/Downloads/SAG_FASTAs/pan/pan" to be removed.. Just so you know :/
(You have 5 seconds to press CTRL + C).
Tree .........................................: /Users/meren/Downloads/SAG_FASTAs/pan/pan/tree.txt
Anvi'o samples order .........................: /Users/meren/Downloads/SAG_FASTAs/pan/pan/anvio-samples-order.txt
Ad hoc anvi'o run files ......................: /Users/meren/Downloads/SAG_FASTAs/pan/pan
log file .....................................: /Users/meren/Downloads/SAG_FASTAs/pan/log.txt
meren ~/Downloads/SAG_FASTAs $ cd pan/pan
meren ~/Downloads/SAG_FASTAs/pan/pan $ anvi-interactive -p profile.db -s samples.db -t tree.txt -d view_data.txt -A additional_view_data.txt --manual
I have a feeling that if you remove the previous output directory completely and re-run the same command, everything will run this time. Can you please try?
Thank you,
I have now attempted this on two separate computers and I am getting the same error with linkage Z. Perhaps this is a distribution-specific bug.
Jordan Bird Ph D Student at The University of Tennessee jbird9@utk.edu or jordantobybird@gmail.com 870-718-9053
Hi Jordan,
What is your version of scipy? Here is mine:
meren ~ $ python -c 'import scipy; print scipy.__version__'
0.17.1
Thanks,
Mine is 0.18.0
Crap. I see 0.18.0 is only 2 weeks old. I wonder if this is something about their new release.
If you can downgrade your version to 0.17.1, it will probably fix the problem. In the meantime, I will look into this as soon as possible.
Sorry about this.
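For anyone else landing here, the version check above can be wrapped in a small script. This is only a sketch: the threshold reflects the two data points in this thread (0.17.1 worked, 0.18.0 failed), and the helper names `parse_version` and `is_at_or_after_first_failing` are made up for illustration. It says nothing about whether later scipy releases are affected.

```python
import re

# Assumption taken from this thread only: 0.17.1 worked, 0.18.0 failed.
FIRST_FAILING_VERSION = (0, 18, 0)


def parse_version(version):
    """Turn a string such as '0.18.0' or '0.18.0rc1' into a tuple like (0, 18, 0)."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as '0.18' so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_at_or_after_first_failing(version):
    """True if the version is 0.18.0 or newer (the release reported failing here)."""
    return parse_version(version) >= FIRST_FAILING_VERSION


if __name__ == "__main__":
    try:
        import scipy
    except ImportError:
        print("scipy is not installed")
    else:
        if is_at_or_after_first_failing(scipy.__version__):
            print("scipy {} is at or after 0.18.0".format(scipy.__version__))
        else:
            print("scipy {} is older than 0.18.0".format(scipy.__version__))
```

If scipy was installed with pip, pinning the older release would look like `pip install scipy==0.17.1`; other package managers will need their own equivalent.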
Hi Meren,
Just to let you know, I installed scipy 0.17.1 and reran the analysis and it ran through without a hitch. Clearly, there was a change in scipy 0.18.0 that is causing the bug.
Thanks,
Jordan
Thank you for looking into this. Upgrades that break things that worked perfectly indeed have a special place in hell.
Hi Meren,
I was attempting to use anvi-pan-genome for a set of external genomes. With the default --min-occurrence 1 and with --min-occurrence 3, the program seems to run perfectly. However, when I tried to remove just the singletons using --min-occurrence 2, I got the following error:
The dataset includes a number of fragmented SAGs and anvi-pan-genome -v yields:
Thanks,
Jordan
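For readers unfamiliar with the flag, here is a rough sketch of what a min-occurrence filter does conceptually. It is not anvi'o's actual code: the function name, the data layout, and the gene IDs are hypothetical, and it assumes that "occurrence" counts the number of distinct genomes a protein cluster appears in.

```python
def filter_by_min_occurrence(clusters, min_occurrence):
    """Keep only clusters found in at least `min_occurrence` distinct genomes.

    `clusters` maps a cluster name to a list of (genome_name, gene_id) pairs.
    This layout is an illustrative assumption, not anvi'o's internal format.
    """
    kept = {}
    for name, members in clusters.items():
        genomes = {genome for genome, _gene in members}
        if len(genomes) >= min_occurrence:
            kept[name] = members
    return kept


# Hypothetical clusters; genome names are borrowed from the run above,
# gene IDs are invented.
clusters = {
    "PC_0001": [("JS1_60B_A09", 12), ("JS1_60B_N06", 7), ("MG2", 3)],
    "PC_0002": [("JS1_60B_A09", 40), ("MG2", 19)],
    "PC_0003": [("MG2", 5)],                       # singleton
    "PC_0004": [("MG2", 8), ("MG2", 9)],           # two genes, one genome
}

without_singletons = filter_by_min_occurrence(clusters, 2)
print(sorted(without_singletons))
```

With a threshold of 1 nothing is removed, and with 2 only clusters confined to a single genome are dropped. In the run posted above, the filter removed 9,032 of the 14,941 protein clusters, which leaves 5,909.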