merenlab / anvio

An analysis and visualization platform for 'omics data
http://merenlab.org/software/anvio
GNU General Public License v3.0

Error in anvi-display-pan #648

Closed. vnsriniv closed this issue 6 years ago

vnsriniv commented 6 years ago

Hi anvi'o team,

I am performing a comparative genomics analysis (same workflow as #587) and am having an issue with anvi-display-pan. I used the following command to run the pangenomic analysis on my genomes. (Yes, I am using the pangenomic workflow to do comparative genomics on genomes that are not part of the same species or genus. I don't know whether this is extremely inappropriate. Forgive me if it is, and let me know if there is an alternative. :-\ )

anvi-pan-genome -g MY-GENOMES.h5 -n "CompGenomics_PAOs" -T 5 --overwrite-output-destinations

I got a lot of protein clusters (~59,000), so anvi'o did not want to perform hierarchical clustering on them (understandably).

Then I used the following command to display the analysis:

anvi-display-pan -p CompGenomics_PAOs-PAN.db -s SAMPLES.db -g CompGenomics_PAOs-PAN-GENOMES.h5

and I got this error:

Traceback (most recent call last):
  File "/home/vnsriniv/virtual-envs/anvio/bin/anvi-display-pan", line 4, in <module>
    __import__('pkg_resources').run_script('anvio===3-master', 'anvi-display-pan')
  File "/home/vnsriniv/virtual-envs/anvio/lib/python3.5/site-packages/pkg_resources/__init__.py", line 748, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/vnsriniv/virtual-envs/anvio/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1517, in run_script
    exec(code, namespace, namespace)
  File "/home/vnsriniv/virtual-envs/anvio/lib/python3.5/site-packages/anvio-3_master-py3.5-linux-x86_64.egg/EGG-INFO/scripts/anvi-display-pan", line 75, in <module>
    d = interactive.Interactive(args)
  File "/home/vnsriniv/virtual-envs/anvio/lib/python3.5/site-packages/anvio-3_master-py3.5-linux-x86_64.egg/anvio/interactive.py", line 155, in __init__
    self.load_pan_mode()
  File "/home/vnsriniv/virtual-envs/anvio/lib/python3.5/site-packages/anvio-3_master-py3.5-linux-x86_64.egg/anvio/interactive.py", line 744, in load_pan_mode
    PanSuperclass.__init__(self, self.args)
  File "/home/vnsriniv/virtual-envs/anvio/lib/python3.5/site-packages/anvio-3_master-py3.5-linux-x86_64.egg/anvio/dbops.py", line 784, in __init__
    if self.p_meta['PCs_ordered']:
KeyError: 'PCs_ordered'

I am guessing this is related to the lack of hierarchical clustering.

  1. If it is related, how can I reduce the number of protein clusters or otherwise get around this issue? I thought about the split length parameter in anvi-gen-contigs-database, but realized that the number of protein clusters is probably related more to the number of CDSs than to the split length.

  2. If it is not related, what might be the issue?

Thanks Varun

tdelmont commented 6 years ago

Hi Varun,

If you want to see the data in the interface, you could consider working only with clusters containing at least 2 (or even 3) sequences. There is an option for this when performing the pangenomic analysis, called '--min-occurrence'.
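To make the effect of such a threshold concrete, here is a minimal Python sketch on toy data. It is not anvi'o code: the function name `filter_clusters`, the gene and cluster names, and the choice to count sequences per cluster (following the wording above) are all assumptions made for illustration, and the exact counting rule anvi'o applies may differ.

```python
from collections import Counter


def filter_clusters(gene_to_cluster, min_occurrence=2):
    """Keep only genes whose cluster holds at least `min_occurrence` sequences.

    Hypothetical helper for illustration; not part of anvi'o.
    """
    # Count how many sequences were assigned to each cluster.
    sizes = Counter(gene_to_cluster.values())
    return {
        gene: cluster
        for gene, cluster in gene_to_cluster.items()
        if sizes[cluster] >= min_occurrence
    }


# Toy assignment of six genes to three clusters; PC_2 is a singleton.
gene_to_cluster = {
    "g1": "PC_1", "g2": "PC_1", "g3": "PC_1",
    "g4": "PC_2",
    "g5": "PC_3", "g6": "PC_3",
}

kept = filter_clusters(gene_to_cluster, min_occurrence=2)
print(sorted(set(kept.values())))  # ['PC_1', 'PC_3']
```

With many distantly related genomes, singleton clusters can make up a large share of the total, so dropping them may shrink the dataset considerably.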

Another option is to work with the summary output, without using the interface. For this you can use the program "anvi-summarize".

Maybe this can help.

Best,

Tom

On Mon, Nov 20, 2017 at 4:46 PM, vnsriniv wrote: (original post quoted in full)

meren commented 6 years ago

Hi Varun,

Which anvi'o version are you using? Even when there is no hierarchical clustering, you shouldn't get that error.

Best,

vnsriniv commented 6 years ago

@tdelmont Thanks, I will try that out!

@meren I am using the master version from GitHub.

meren commented 6 years ago

Oh, this is bad. Thank you for reporting it, Varun. I will look into this.

vnsriniv commented 6 years ago

@meren The error seems to have disappeared when I followed Tom's suggestion and used a --min-occurrence of 2.

meren commented 6 years ago

If you haven't deleted the old pan database (the one generated without --min-occurrence 2), can you please test the updated master to see whether the previous error is now replaced with a more informative output message? :)
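For readers wondering how a bare KeyError like the one in the traceback can be turned into a more informative message, here is a minimal sketch of the general pattern. It is not the actual change made in anvi'o: `PanMetaError`, `clusters_are_ordered`, and the message text are hypothetical, and only the key name 'PCs_ordered' comes from the traceback.

```python
class PanMetaError(Exception):
    """Hypothetical error type used only in this sketch."""


def clusters_are_ordered(p_meta):
    """Return whether the pan metadata reports ordered protein clusters.

    Raises PanMetaError with an explanation instead of a bare KeyError
    when the 'PCs_ordered' entry is missing.
    """
    if "PCs_ordered" not in p_meta:
        raise PanMetaError(
            "The pan database metadata has no 'PCs_ordered' entry. This can "
            "happen when hierarchical clustering was skipped, for example "
            "because there were too many protein clusters."
        )
    return bool(p_meta["PCs_ordered"])


# Metadata without the key: the lookup now fails with an explanation.
try:
    clusters_are_ordered({"project_name": "CompGenomics_PAOs"})
except PanMetaError as error:
    print(f"Config Error: {error}")

# Metadata with the key behaves as before.
print(clusters_are_ordered({"PCs_ordered": 1}))  # True
```

The point is to check for the key explicitly and explain the likely cause, rather than letting the dictionary lookup fail deep inside the call stack.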

vnsriniv commented 6 years ago

Works great!! The error is very informative! Thanks :)

meren commented 6 years ago

Thank you very much for your help!