Extract nucleotide sequences from pangenomics analysis

merenlab / anvio

An analysis and visualization platform for 'omics data

http://merenlab.org/software/anvio

GNU General Public License v3.0

Extract nucleotide sequences from pangenomics analysis #539

Closed spaver closed 7 years ago

spaver commented 7 years ago

It would be useful if there were a direct way to extract all nucleotide sequences from a specific protein cluster resulting from a pangenomics analysis.

meren commented 7 years ago

This is now done (640c57187e2f252d6f80c6374595d7758eded33d, 59ad343a8c063f4b06635a65255b415634bcf211, 745151010bb72dbce6bf660f76d5ced8f824499b, 02bb20619b55f0f6e061cd60c56ba6257b1fcfdf, etc).

This used to report AA sequence alignments:

anvi-export-pc-alignments -p TEST/TEST-PAN.db \
                          -g TEST-GENOMES.h5 \
                          -C test_collection \
                          -b PCB_1_CORE \
                          -o aligned_gene_sequences_in_PCB_1_CORE_AA.fa

Now it can report DNA sequence alignments with the additional flag:

anvi-export-pc-alignments -p TEST/TEST-PAN.db \
                          -g TEST-GENOMES.h5 \
                          -C test_collection \
                          -b PCB_1_CORE \
                          -o aligned_gene_sequences_in_PCB_1_CORE_DNA.fa \
                          --report-DNA-sequences

One caveat: we had to update the version of the genomes storage, which means you will need to re-run the anvi-gen-genomes-storage and anvi-pan-genome steps.

Best,

wangshi831 commented 6 years ago

It is really helpful to be able to extract the nucleotide sequences. Thanks, Meren, for doing this. However, when I tried to use your command lines to report the DNA sequence alignments, I found that the anvi-export-pc-alignments command was not found. Also, can you please specify what the test_collection file is? Thanks again.

meren commented 6 years ago

Which anvi'o version are you using @wangshi831?

wangshi831 commented 6 years ago

@meren Thanks for the swift reply. I am using version 4.

meren commented 6 years ago

ah, yes, that program became this program:

http://merenlab.org/software/anvio/vignette/#anvi-get-sequences-for-gene-clusters

please take a look at the help menu and let me know if it doesn't make sense.

wangshi831 commented 6 years ago

Hi Meren, I tried to extract DNA sequences from the singleton gene clusters of the AL_A1_IMG bacterial genome. The command I used is below:

anvi-get-dna-sequences-for-gene-calls -c AL_A1_IMG.db \
                                      --gene-caller-ids ALA1singletongenecallerID \
                                      -o AL-A1SingletonDNASequences.fa \
                                      --report-extended-deflines

But I got a config error:

Config Error: The gene calls you provided do not look like gene callers anvi'o is used to working with :/ Here is one of them: 'ALA1singletongenecallerID' (<class 'str'>).

I got the gene caller IDs from the output file of the pangenome summary, and I saved multiple gene caller IDs from this genome in a comma-separated values file. But it did not work. Please advise.

Thanks,

ozcan commented 6 years ago

Hi @wangshi831, unfortunately, the version you are using differs from the new version. Providing the list of gene caller IDs in a file is only accepted in the new version. For the old version, you need to provide something like --gene-caller-ids 1,2,3,4 --delimiter ",".

You can convert your file to a comma-separated list using the command below:

awk '{printf (NR>1?",":"") $1}' ALA1singletongenecallerID

Then you can run:

anvi-get-dna-sequences-for-gene-calls -c AL_A1_IMG.db \
                                      --gene-caller-ids {{list of gene callers id awk generated}} \
                                      --delimiter "," \
                                      -o AL-A1SingletonDNASequences.fa \
                                      --report-extended-deflines
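The two steps can also be chained so the list never has to be pasted by hand. The following is only a rough sketch, not a tested anvi'o recipe: the temporary input file and the IDs 101, 102, 103 are made up for illustration, the file is assumed to hold one gene caller ID per line, and the IDs are assumed to be plain integers (which is what the error message above hints at). The anvi'o flags are the ones already quoted in this thread.

```shell
# Hypothetical input: one gene caller ID per line (these IDs are invented).
ids_file=$(mktemp)
printf '%s\n' 101 102 103 > "$ids_file"

# Join the first column of every line with commas, same idea as the awk
# one-liner above, but with an explicit format string.
ids=$(awk '{printf "%s%s", (NR>1 ? "," : ""), $1}' "$ids_file")
echo "$ids"

# Sanity check (assumption: IDs are integers): the list should look like 1,2,3.
if printf '%s' "$ids" | grep -Eq '^[0-9]+(,[0-9]+)*$'; then
    valid=yes
else
    valid=no
    echo "The list does not look like comma-separated integers: $ids" >&2
fi

# Run anvi'o only if the list looks fine and the program is installed.
if [ "$valid" = yes ] && command -v anvi-get-dna-sequences-for-gene-calls >/dev/null 2>&1; then
    anvi-get-dna-sequences-for-gene-calls -c AL_A1_IMG.db \
        --gene-caller-ids "$ids" \
        --delimiter "," \
        -o AL-A1SingletonDNASequences.fa \
        --report-extended-deflines
fi

rm -f "$ids_file"
```

The check is there because passing a file name instead of the IDs themselves is what triggered the error reported earlier; a list that fails the check is never handed to anvi'o.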

Please let us know if this also fails. Best,

wangshi831 commented 6 years ago

Great. Thanks Meren. It works! BTW, is there a way to update the version I am using to the newest version? I have v4 on my PC and v2.4 on my laptop.

ozcan commented 6 years ago

Hi @wangshi831,

We have not released version 5 yet, but we are going to release it soon. You can still install the development version (there are instructions on our website), but if you are not running into any blocking bugs on v4, I would recommend waiting until we release the new version.

Best, Ozcan

meren commented 6 years ago

Great. Thanks Meren.

You're welcome!

Obama.