Closed vnsriniv closed 6 years ago
Hi Varun,
If you want to see the data in the interface, you could consider working only with clusters containing at least 2 (or even 3) sequences. There is an option for the pangenomic analysis called '--min-occurrence'.
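To make the idea of a minimum-occurrence threshold concrete, here is a minimal Python sketch. It is not anvi'o code: the record layout, the function name `filter_clusters_by_occurrence`, and the rule of counting distinct genomes per cluster are all assumptions made for illustration, so check the anvi-pan-genome help output for how the real option counts occurrences.

```python
from collections import defaultdict


def filter_clusters_by_occurrence(records, min_occurrence=2):
    """Keep clusters found in at least `min_occurrence` distinct genomes.

    `records` is a hypothetical layout: (genome_name, gene_id, cluster_id) tuples.
    Counting distinct genomes (rather than sequences) is an assumption.
    """
    genomes_per_cluster = defaultdict(set)
    for genome_name, _gene_id, cluster_id in records:
        genomes_per_cluster[cluster_id].add(genome_name)

    # Drop every cluster that does not reach the threshold.
    return {
        cluster_id: sorted(genomes)
        for cluster_id, genomes in genomes_per_cluster.items()
        if len(genomes) >= min_occurrence
    }


# Made-up example data, not taken from the thread.
records = [
    ("genome_A", "g1", "PC_1"),
    ("genome_B", "g7", "PC_1"),
    ("genome_A", "g2", "PC_2"),  # singleton cluster
    ("genome_A", "g3", "PC_3"),
    ("genome_A", "g4", "PC_3"),  # two genes, but both from the same genome
]

kept = filter_clusters_by_occurrence(records, min_occurrence=2)
print(kept)
```

With a threshold of 2, only the cluster shared by two genomes survives, which is why raising the threshold can shrink a very large set of clusters considerably.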
Another option is to work with the summary output, without using the interface. For this you can use the program "anvi-summarize".
Maybe this can help.
Best,
Tom
On Mon, Nov 20, 2017 at 4:46 PM, vnsriniv notifications@github.com wrote:
Hi Anvio team
I am performing some comparative genomics analysis (same workflow as #587 https://github.com/merenlab/anvio/issues/587) and having an issue with anvi-display-pan. I used the following code to perform the pangenomic analysis on my genomes (Yeah.. I am using the pangenomic workflow to do comparative genomics on genomes that are not part of the same species or genus.. I don't know if this is something extremely inappropriate. Forgive me if it is and let me know if there is an alternative....:-\ )
anvi-pan-genome -g MY-GENOMES.h5 -n "CompGenomics_PAOs" -T 5 --overwrite-output-destinations
I got a lot of protein clusters (~59000), so anvi'o did not want to perform hierarchical clustering on them (understandably).
Then I used the following line of code to display the analysis:
anvi-display-pan -p CompGenomics_PAOs-PAN.db -s SAMPLES.db -g CompGenomics_PAOs-PAN-GENOMES.h5
and I got this error:
Traceback (most recent call last):
  File "/home/vnsriniv/virtual-envs/anvio/bin/anvi-display-pan", line 4, in <module>
    __import__('pkg_resources').run_script('anvio===3-master', 'anvi-display-pan')
  File "/home/vnsriniv/virtual-envs/anvio/lib/python3.5/site-packages/pkg_resources/__init__.py", line 748, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/vnsriniv/virtual-envs/anvio/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1517, in run_script
    exec(code, namespace, namespace)
  File "/home/vnsriniv/virtual-envs/anvio/lib/python3.5/site-packages/anvio-3_master-py3.5-linux-x86_64.egg/EGG-INFO/scripts/anvi-display-pan", line 75, in <module>
    d = interactive.Interactive(args)
  File "/home/vnsriniv/virtual-envs/anvio/lib/python3.5/site-packages/anvio-3_master-py3.5-linux-x86_64.egg/anvio/interactive.py", line 155, in __init__
    self.load_pan_mode()
  File "/home/vnsriniv/virtual-envs/anvio/lib/python3.5/site-packages/anvio-3_master-py3.5-linux-x86_64.egg/anvio/interactive.py", line 744, in load_pan_mode
    PanSuperclass.__init__(self, self.args)
  File "/home/vnsriniv/virtual-envs/anvio/lib/python3.5/site-packages/anvio-3_master-py3.5-linux-x86_64.egg/anvio/dbops.py", line 784, in __init__
    if self.p_meta['PCs_ordered']:
KeyError: 'PCs_ordered'
I am guessing this is related to the lack of hierarchical clustering.
1. If it is related, how can I minimize the number of protein clusters or get around this issue? I thought about the split length parameter in anvi-gen-contigs-database but realized that the number of protein clusters might be more related to the number of CDSs than to the split length.
2. If it is not related, what might be the issue?
Thanks Varun
Hi Varun,
Which anvi'o version are you using? Even when there is no hierarchical clustering, you shouldn't get that error.
Best,
@tdelmont Thanks will try that out!
@meren I am using the master version from github.
oh. this is bad. thank you for reporting, Varun. I will look into this.
@meren The error seems to have disappeared when I followed Tom's suggestion and used a --min-occurrence of 2.
If you haven't deleted the old pan (without --min-occurrence 2), can you please test the updated master to see whether that previous error is now replaced with a more informative output message? :)
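As a rough illustration of what "a more informative output message" can look like, here is a small Python sketch of a defensive lookup. It is not the change that went into anvi'o: the names `p_meta` and 'PCs_ordered' come from the traceback in this thread, while the function `clusters_are_ordered`, the exception class, and the message wording are hypothetical.

```python
class MissingClusteringInfoError(Exception):
    """Hypothetical error type raised when expected metadata is absent."""


def clusters_are_ordered(p_meta):
    """Return True if the pan metadata says protein clusters were ordered.

    A bare p_meta['PCs_ordered'] raises KeyError when the key is missing;
    checking first lets us explain the likely cause instead.
    """
    if 'PCs_ordered' not in p_meta:
        raise MissingClusteringInfoError(
            "The pan database metadata has no 'PCs_ordered' entry. "
            "This can happen when clustering was skipped, for example "
            "because there were too many protein clusters."
        )
    return bool(p_meta['PCs_ordered'])


# Made-up metadata dictionaries for demonstration.
print(clusters_are_ordered({'PCs_ordered': 1}))

try:
    clusters_are_ordered({})
except MissingClusteringInfoError as e:
    print("Error:", e)
```

The point of the pattern is only that the user sees a likely cause and a next step rather than a raw KeyError.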
Works great!! The error is very informative! Thanks :)
thank you very much for your help!