merenlab / anvio

An analysis and visualization platform for 'omics data

GNU General Public License v3.0

Error at mini test : taxonomy #342

Closed liuxianghui closed 8 years ago

liuxianghui commented 8 years ago

Dear Meren: I have problems with the anvi'o test. Since my MacBook has only 8 GB of memory, I tried to install anvi'o on both the MacBook and a Linux cluster (CentOS / Red Hat). I am hoping to run the clustering part on the cluster and visualise the results on the Mac. The installation seems OK. On the Mac there was no problem up to the point where the Chrome window came up (on Linux the output reaches "clustering of 'tnf-cov' has been requested"). However, the test failed on my Linux cluster at the following step. What could be the reason? Moreover, when I run the 'svr_assign_to_dna_using_figfams' command, it produces no output on either the Mac or the Linux cluster. Please kindly help. Xianghui

(cgat-python)[xianghui@merlion tests]$ ./run_mini_test.sh

Creating the output directory ...

Anvo'o version ...

Anvi'o version ...............................: 1.2.2
Contigs DB version ...........................: 3
Profile DB version ...........................: 6
Samples information DB version ...............: 2
Auxiliary HDF5 DB version ....................: 1

Initializing raw BAM files ...

Sorted BAM File ..............................: /data/xianghui/anvio/tests/sandbox/test-output/204-6M
Indexed BAM File .............................: /data/xianghui/anvio/tests/sandbox/test-output/204-6M.bam

Sorted BAM File ..............................: /data/xianghui/anvio/tests/sandbox/test-output/204-7M
Indexed BAM File .............................: /data/xianghui/anvio/tests/sandbox/test-output/204-7M.bam

Sorted BAM File ..............................: /data/xianghui/anvio/tests/sandbox/test-output/204-9M
Indexed BAM File .............................: /data/xianghui/anvio/tests/sandbox/test-output/204-9M.bam

Generating an EMPTY contigs database ...

Contigs database .............................: A new database, test-output/CONTIGS.db, has been created.
Number of contigs ............................: 6
Number of splits .............................: 60
Total number of nucleotides ..................: 57,030
Split length .................................: 1,000

Populating taxonomy for splits table in the database using 'myrast_cmdline' parser ...

Traceback (most recent call last):
  File "/data/xianghui/anvio/bin/anvi-import-taxonomy-from-gene-annotations", line 88, in <module>
    regarding parsers.' % (len(parser_modules['taxonomy']), parser_modules['taxonomy'].keys()))
KeyError: 'taxonomy'

meren commented 8 years ago

Hi Xianghui,

Your installation seems to be broken. Your anvi'o version says you are running 1.2.2, but your bin directory contains anvi-import-taxonomy-from-gene-annotations, which shouldn't exist in version 1.2.2.

I suggest you remove everything, and re-install anvi'o by following directions in the installation manual word by word.

svr_assign_to_dna_using_figfams is a myRAST program; its installation and usage are also explained in the tutorial. If the example in the tutorial is not working, there is likely a problem with the RAST server, and it could be temporary.

Best wishes,

A. Murat Eren (meren) http://merenlab.org :: gpg https://keybase.io/meren


liuxianghui commented 8 years ago

I removed all of anvi'o and reinstalled it using pip. anvi-profile works, but not the mini test (or mini_test). Then I tried to git clone anvio.git. However, when I ran run_mini_test.sh, it said the anvi-import-taxonomy-from-gene-annotations command was not found. Please kindly suggest.


liuxianghui commented 8 years ago

I realised that the problem is that my anvi'o is 1.2.2. I tried to uninstall it with pip and reinstall, but got this:

(cgat-python)[xianghui@merlion aaa]$ pip install anvio

Collecting anvio

/data/xianghui/cgat-python/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:315: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.

SNIMissingWarning

/data/xianghui/cgat-python/lib/python2.7/site-packages/pip/_vendor/requests/packages/urllib3/util/ssl_.py:120: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.

InsecurePlatformWarning

Using cached anvio-1.2.2.tar.bz2

^COperation cancelled by user


meren commented 8 years ago

Hi,

When I said "following directions in the installation manual word by word", I really meant it :) Nowhere in the installation manual will you find a suggestion to run git clone anvio in order to run the mini test.

Here:

http://merenlab.org/2015/05/01/installation/#running-the-mini-test


liuxianghui commented 8 years ago

Thank you. I have managed to solve the problem already. I am playing with anvi'o now and I like it very much. I also like your posts; they are very interesting and insightful. I have some new questions and hope to hear your suggestions.

Clustering: How do you normalize tnf and cov and put them together for clustering? Did you follow the same approach as CONCOCT, especially for contigs with coverage equal to 0?
Split and marker gene: Sometimes a split can break one ORF into two pieces, which will certainly overestimate the level of contamination. How can we avoid that?
Refine: When you refine a particular bin, e.g., group 6 from CONCOCT, which has a high contamination rate, is the clustering still based on tnf + cov, or only on tnf since cov is biased towards one sample?
Concoct: Is CONCOCT clustering already integrated into anvi'o? Or do I need to run CONCOCT separately on the Linux cluster, import the data into anvi'o, and view it on my Mac?
Mapping to a reference genome: For the contigs in a bin, is it possible to map them to a reference genome and order the contigs in that bin? This would give me much more confidence in the contigs. In theory it might be easy to just run blastn. However, I have concerns about that. I previously tried to map a draft genome with hundreds of contigs to a reference genome, and I saw that each contig had several pieces mapped to the reference genome, rather than the expected result of each contig matching one part of the reference genome. I also have concerns about matching at the nucleotide level. Would matching the ORFs to the reference genome be better, since proteins may be more conserved?
Misassembly: Can anvi'o help to detect misassembled contigs?

Sorry for so many questions. These are things I have been thinking about in my work. I am a bioinformatician at SCELSE, a research institute in Singapore, doing waste water research. Looking forward to hearing from you. Regards, Xianghui


meren commented 8 years ago

Hi,

Thank you very much for looking into anvi'o.

On Wed, Mar 16, 2016 at 9:21 PM, liuxianghui notifications@github.com wrote:

Clustering How do you normalize your tnf and cov and put them together for clustering. Did you follow the same way as what Concoct does? Especially for those with coverage equal to 0.

The default clustering recipe is here:

https://github.com/meren/anvio/blob/master/anvio/data/clusterconfigs/merged/tnf-cov

There is little documentation, but I am sure you can figure it out even from this file.

Vectors with 0 values are padded with a very small number:

[image: Inline image 1]

Split and marker gene Sometimes split can break 1 orf into two pieces which will certainly overestimate the level of contamination. How could we avoid it?

Anvi'o never does that. The code goes to considerable lengths to maintain connectivity. Genes that span two splits are tracked by unique ids and treated as one whenever necessary. This should give some clues:

https://github.com/meren/anvio/blob/v1-branch/anvio/completeness.py

There is more in summarizer.py:

[image: Inline image 2]

In v2-branch things are even easier, as contigs are always split at non-coding regions.

Refine When you do refine based on a special bin , e.g., group 6, in concoct which has a high level contamination rate. Is it still clustering by tnf + cov or just based on only tnf since cov is biased to one sample.

Clustering configurations for single samples are here:

https://github.com/meren/anvio/tree/v1-branch/anvio/data/clusterconfigs/single

Concoct Is the clustering of concoct is already integrated into your Anvio? Do I need to run a separate run of concoct on Linux clusters and import the data into anvio and view it in my Mac.

Yes, there is an extension for CONCOCT that is compiled within anvi'o.

Mapping to a reference genome I wish for the contigs in a bin, is it possible to run a mapping to a reference genome and order the contigs in that bin.

We rarely work with genomes that have many references. Therefore we never explored this. When it was necessary, we used contiguator http://contiguator.sourceforge.net/.

This will give me much much more confidence in the contigs. Theoretically it might be easy to just do a blastn. However, I have concerns on that. I tried before to map one draft genome with hundreds of contigs to a reference genome. I saw that each contig got several pieces mapped to the reference genome other than the expected result that each contig matches to a part in reference genome.

Yes, BLAST for this type of analysis creates a mess that is nearly impossible to resolve :(

Besides I have concerns in this practice of matching nucleotide. Will matching the orfs to reference genome be better as protein may be more conserved?

If the purpose is to make sure a bin is properly identified, I think these approaches will not be very helpful. But it might be worth trying.

Misassembly Can anvio help to detect the misassembly of contig?

Anvi'o makes it easier to identify potentially chimeric contigs when the coverage differs dramatically. The STD of coverage view is usually very useful for this. The variability view can also help identify anomalies. Usually, inspection reveals poor assemblies almost immediately. A Python script that accesses the PROFILE.db can quickly generate a reasonable report of potentially poorly assembled contigs to make things faster.
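As a rough illustration of the kind of report mentioned above, here is a minimal Python sketch. It does not read PROFILE.db, because the database layout is not described in this thread; instead it assumes the per-position coverage of each contig has already been extracted into a plain dictionary. The function name `flag_uneven_contigs`, the `max_cv` threshold, and the example values are all hypothetical.

```python
import statistics

def flag_uneven_contigs(coverage_by_contig, max_cv=1.0):
    """Return (contig, cv) pairs whose coefficient of variation exceeds max_cv.

    coverage_by_contig maps a contig name to a list of coverage values.
    cv = population standard deviation / mean; a high value means the
    coverage is very uneven along the contig, which may hint at a chimera.
    """
    flagged = []
    for name, coverage in coverage_by_contig.items():
        mean = statistics.mean(coverage)
        if mean == 0:
            continue  # no coverage at all; nothing to compare
        cv = statistics.pstdev(coverage) / mean
        if cv > max_cv:
            flagged.append((name, cv))
    # Most uneven contigs first
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Hypothetical coverage values, for illustration only
example = {
    "contig_even": [30, 31, 29, 30, 32, 28],
    "contig_jump": [5, 6, 5, 90, 95, 92],  # abrupt shift in coverage
}
report = flag_uneven_contigs(example, max_cv=0.5)
for name, cv in report:
    print(name, round(cv, 2))
```

A flagged contig is only a candidate for manual inspection; a sharp coverage change can also have biological causes.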

Thanks for the questions. I hope these help.

Best,

liuxianghui commented 7 years ago

Dear Meren: Just to double check with you, is the hierarchical clustering in anvi'o done on the coverage with or without any normalization? Which distance does it use? Xianghui

PS: Does anvi'o support importing sorted BAM files? I have only the sorted BAM files now; I did not keep the original BAM files. Please kindly suggest.

On 23 Mar 2016, at 10:18 AM, A. Murat Eren notifications@github.com wrote:

Closed #342.

meren commented 7 years ago

Hi,

This is from the metagenomic workflow web source:

When you run anvi-merge,

(...)

It will attempt to create multiple clusterings of your splits using the default clustering configurations. Please take a quick look at the default clustering configurations for merged profiles --they are pretty easy to understand. By default, anvi'o will use euclidean distance and ward linkage algorithm to organize contigs, however, you can change those default values with --distance and --linkage parameters (available options for distance metrics and linkage algorithms are listed in this release note). Hierarchical clustering results are necessary for comprehensive visualization, and human guided binning, therefore, by default, anvi'o attempts to cluster your contigs using default configurations. You can skip this step by using --skip-hierarchical-clustering flag. But even if you don't skip it, anvi'o will skip it for you if you have more than 20,000 splits, since the computational complexity of this process will get less and less feasible with increasing number of splits. That's OK, though. There are many ways to recover from this. On the other hand, if you want to teach everyone who is the boss, you can force anvi'o try to cluster your splits regardless of how many of them are there by using --enforce-hierarchical-clustering flag. You have the power.

Does it help?

liuxianghui commented 7 years ago

Thanks. That is what I thought. You did not do a log transformation; instead you calculate the Euclidean distance from the coverages. I am thinking of using the Pearson correlation coefficient. However, can it avoid the problem of a bacterium having exceptionally higher coverage in one sample than in the others? I also asked about the sorted bam files problem. Do you have a good solution to that? Regards, Xianghui
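To make the difference between these options concrete, here is a small pure-Python sketch. It is not anvi'o's implementation; the coverage values, the function names, and the pseudocount are assumptions chosen for illustration. It shows only that a Pearson-based distance ignores overall depth, while Euclidean distance on raw coverage does not, and that a log transformation shrinks the gap.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_distance(a, b):
    """1 - Pearson r: 0 for identical shape, up to 2 for opposite shape."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("Pearson correlation is undefined for a constant vector")
    return 1.0 - cov / math.sqrt(var_a * var_b)

def log_transform(vector, pseudocount=1.0):
    """Natural log with a pseudocount so that zero coverage stays finite."""
    return [math.log(x + pseudocount) for x in vector]

# Hypothetical mean coverages of three splits across three samples
split_a = [10.0, 20.0, 400.0]
split_b = [20.0, 40.0, 800.0]   # same shape as split_a, twice the depth
split_c = [400.0, 20.0, 10.0]   # abundant in a different sample

print("raw euclidean a-b:", euclidean(split_a, split_b))
print("log euclidean a-b:", euclidean(log_transform(split_a), log_transform(split_b)))
print("pearson dist  a-b:", pearson_distance(split_a, split_b))
print("pearson dist  a-c:", pearson_distance(split_a, split_c))
```

Note that a single very large value can still dominate a Pearson correlation, so this sketch does not by itself settle which metric handles one exceptionally high-coverage sample best.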


meren commented 7 years ago

I also asked the sorted bam files problem.

Maybe you should take a look at the documentation before asking questions that are already answered?

anvi-init-bam

Anvi'o requires BAM files to be sorted and indexed. In most cases the BAM file you get back from your mapping software will not be sorted and indexed. (...) If your BAM files are already sorted and indexed (i.e., for each .bam file there is also a .bam.bai file in the same directory), you can skip this step. Otherwise, you need to initialize your BAM files:

$ anvi-init-bam SAMPLE-01-RAW.bam -o SAMPLE-01.bam
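The rule of thumb quoted above (every .bam file has a .bam.bai next to it) can be checked with a few lines of Python. This is a minimal sketch; the function name `bams_missing_index` is hypothetical, and it checks only that index files exist, not that the BAM files are actually sorted.

```python
from pathlib import Path

def bams_missing_index(directory):
    """Return the names of .bam files in `directory` with no sibling .bam.bai file."""
    directory = Path(directory)
    missing = []
    for bam in sorted(directory.glob("*.bam")):
        index = bam.with_name(bam.name + ".bai")  # e.g. SAMPLE-01.bam.bai
        if not index.exists():
            missing.append(bam.name)
    return missing

if __name__ == "__main__":
    # Check the current directory; any file listed here would still need anvi-init-bam
    for name in bams_missing_index("."):
        print("no index found for", name)
```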