Open rillaxy opened 2 years ago
Halo @rillaxy , thanks for reading our paper! You can extract the 4dtv in the following way:
## This code is from: https://www.biostars.org/p/157527/
bases = ['t', 'c', 'a', 'g']
codons = [a+b+c for a in bases for b in bases for c in bases]
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codon_table = dict(zip(codons, amino_acids))
def translate(seq):
seq = seq.lower().replace('\n', '').replace(' ', '')
peptide = ''
for i in range(0, len(seq), 3):
codon = seq[i: i+3]
amino_acid = codon_table.get(codon, '*')
if amino_acid != '*':
peptide += amino_acid
else:
break
return peptide
codon_dict={}
for aa in set(amino_acids):
temp=[]
for c in codons:
if translate(c)==aa:
temp.append(c)
if len(temp)==4:
codon_dict[aa]=temp
print(codon_dict)
Here, note that it misses the Ser, should add by yourself. (I notice this mistake recently, but as the paper has already submitted, I have not revised it.)
from Bio import SeqIO
position = list()
for seq in SeqIO.parse("path_to_your_seq_file.fasta", "fasta"):
for i in range(0, len(seq), 3):
codon = seq.seq[i: i+3]
if any([codon[0:2] == 'GG', codon[1:2] == 'GC', codon[1:2] == 'AC', codon[1:2] == 'GT', codon[1:2] == 'CC']):
position.append(i + 2)
print(position)
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
your_seq_list = list()
extracted_seq = list()
for seq in SeqIO.parse("path_to_your_seq_file.fasta", "fasta"):
if seq.id == 'XXXX':
for i in range(0, len(sorted(set(position)))):
your_seq_list.append(seq[i])
extracted_seq.append(SeqRecord(Seq("".join(XXXX)), id = "XXXX", description = ''))
SeqIO.write(extracted_seq, "path_to_your_extracted_seq_file.fasta", "fasta")
Thanks @lindan1128 ,the codon_dict also misses L and R
Hello, In the paper about A. laboriosa, you estimated divergence time among species based on the concatenated alignments of 4-fold degenerate sites from single-copy genes using MCMCtree in PAML v4.9. But I can't find this step in your command lines. Because I also want to estimated divergence time based on some conserved sequence, so how do you extract 4-fold degenerate sites from the concatenated alignments of single-copy genes ? I'm looking forward to your reply, thanks