Closed spaver closed 7 years ago
This is now done (640c57187e2f252d6f80c6374595d7758eded33d, 59ad343a8c063f4b06635a65255b415634bcf211, 745151010bb72dbce6bf660f76d5ced8f824499b, 02bb20619b55f0f6e061cd60c56ba6257b1fcfdf, etc).
This used to report AA sequence alignments:
anvi-export-pc-alignments -p TEST/TEST-PAN.db \
-g TEST-GENOMES.h5 \
-C test_collection \
-b PCB_1_CORE \
-o aligned_gene_sequences_in_PCB_1_CORE_AA.fa
Now it can report DNA sequence alignments with the additional flag:
anvi-export-pc-alignments -p TEST/TEST-PAN.db \
-g TEST-GENOMES.h5 \
-C test_collection \
-b PCB_1_CORE \
-o aligned_gene_sequences_in_PCB_1_CORE_DNA.fa \
--report-DNA-sequences
Note, however, that we had to update the version of the genomes storage, which means one will need to re-run the anvi-gen-genomes-storage and anvi-pan-genome steps...
Best,
It is really helpful to be able to extract the nucleotide sequences. Thanks Meren for doing this. However, while I was trying to use your command lines to report the DNA sequence alignments, I realized that the anvi-export-pc-alignments command was not found. Also, can you please specify what the test_collection file is? Thanks again.
Which anvi'o version are you using @wangshi831?
@meren Thanks for the swift reply. I am using version 4.
ah, yes, that program became this program:
http://merenlab.org/software/anvio/vignette/#anvi-get-sequences-for-gene-clusters
please take a look at the help menu and let me know if it doesn't make sense.
Hi Meren,
I tried to extract the DNA sequences of singleton gene clusters from the AL_A1_IMG bacterial genome.
The command I used is below:
anvi-get-dna-sequences-for-gene-calls -c AL_A1_IMG.db \
--gene-caller-ids ALA1singletongenecallerID \
-o AL-A1SingletonDNASequences.fa \
--report-extended-deflines
But I got a config error:
Config Error: The gene calls you provided do not look like gene callers anvi'o is used to working with :/ Here is one of them: 'ALA1singletongenecallerID' (<class 'str'>).
I got the gene caller IDs from the output file of the pangenome summary, and I saved the gene caller IDs for this genome in a comma-separated values file, but it did not work. Please advise.
Thanks,
Hi @wangshi831,
Unfortunately, the version you are using differs from the new version. Passing a file that lists gene caller IDs is only supported in the new version. For the old version, you need to provide something like --gene-caller-ids 1,2,3,4 --delimiter ","
You can convert your file to a comma-separated list using the command below:
awk '{printf "%s%s", (NR>1 ? "," : ""), $1}' ALA1singletongenecallerID
Then you can run:
anvi-get-dna-sequences-for-gene-calls -c AL_A1_IMG.db --gene-caller-ids {{list of gene callers id awk generated}} --delimiter "," -o AL-A1SingletonDNASequences.fa --report-extended-deflines
Please let us know if this also fails. Best,
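For anyone who lands here with the same Config Error: the two steps above can be combined into one small shell sketch. It assumes the ID file holds one gene caller ID per line and that gene caller IDs are plain integers, which is what the error message seems to suggest. The function name join_gene_caller_ids and the temporary file are made up for illustration; they are not part of anvi'o.

```shell
#!/bin/sh
# Hypothetical helper (not an anvi'o program): read one gene caller ID per
# line and print them as a single comma-separated list. Blank lines are
# skipped; anything that is not a non-negative integer aborts with status 1.
join_gene_caller_ids() {
    awk '
        NF == 0 { next }                          # ignore blank lines
        $1 !~ /^[0-9]+$/ {                        # reject non-integer IDs
            print "not an integer gene caller ID: " $1 > "/dev/stderr"
            bad = 1
            exit
        }
        { ids = ids sep $1; sep = "," }           # append with a comma separator
        END { if (bad) exit 1; print ids }
    ' "$1"
}

# Small demonstration with a throwaway file of made-up IDs.
demo_file=$(mktemp)
printf '12\n45\n301\n' > "$demo_file"
ids=$(join_gene_caller_ids "$demo_file")
echo "$ids"    # prints 12,45,301
rm -f "$demo_file"

# With a real ID file, the list would then be passed to the command from
# this thread (left commented out here because it needs anvi'o v4):
# anvi-get-dna-sequences-for-gene-calls -c AL_A1_IMG.db \
#     --gene-caller-ids "$ids" --delimiter "," \
#     -o AL-A1SingletonDNASequences.fa --report-extended-deflines
```

The integer check is only there to fail early with a readable message if a file name or a header line sneaks into the list, instead of getting the error from anvi'o later.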
Great. Thanks Meren. It works! BTW, is there a way to update the version I am using to the newest version? I have v4 on my PC and v2.4 on my laptop.
Hi @wangshi831,
We have not released version 5 yet, but we are going to release it soon. You can still install the development version (there are instructions on our website), but if you are not running into any blocking bugs on v4, I would recommend waiting until we release the new version.
Best, Ozcan
Great. Thanks Meren.
You're welcome!
It would be useful if there were a direct way to extract all nucleotide sequences from a specific protein cluster resulting from a pangenomics analysis.