Closed Benjamin-Lee closed 9 months ago
Hey @Benjamin-Lee, this is the type of use-case I'd like to cover with this. Thanks!
I'm thinking that the current From<Kmer<Dna, 3>> for Amino
isn't the way to go. For safety and generality it would be cool if
So with:
trait TranslationTable<A: Codec> {
fn to_amino(self, codon: Kmer<A, 3>) -> Amino;
fn to_codon(self, amino: Amino) -> Option<Kmer<A, 3>>;
}
There can be a Standard
struct that implements these methods the same way that translation is implemented now with zero cost (I'm not a fan of the all-caps name, clippy is):
const STANDARD: Standard = Standard;
But then users can define their own Custom
translation table:
impl Custom {
pub fn from_vec(table: Vec<Amino>) -> Self {
Custom { table }
}
}
impl<A: Codec> TranslationTable<A> for Custom {
fn to_amino(self, codon: Kmer<A, 3>) -> Amino {
self.table[Into::<usize>::into(codon)]
}
I've pushed these changes but they aren't ready yet. The old test cases have been updated:
let seq: Seq<Dna> =
dna!("AATTTGTGGGTTCGTCTGCGGCTCCGCCCTTAGTACTATGAGGACGATCAGCACCATAAGAACAAA");
let aminos: Seq<Amino> = seq
.kmers()
.map(|kmer| STANDARD.to_amino(kmer))
.collect::<Seq<Amino>>();
and the usage looks very similar to the original. That type annotation on collect
is probably unnecessary now.
What I'm mulling over now is how to, or whether to support fallible translations. This is tempting because I want to support IUPAC to amino acid sequences. A lot of the triples of iupac ambiguity codes unambiguously represent amino acids (and vice-versa!) so we'd be able to get rust's type system to cover some error prone uses.
The goal is that this library could be something someone uses to implement a tool like FACIL or Codetta in rust.
I'm somewhat satisfied with this for now. There are details at the bottom of the README now, and lots of test cases in the translation module:
pub trait TranslationTable<A: Codec, B: Codec> {
fn to_amino(&self, codon: &SeqSlice<A>) -> B;
fn to_codon(&self, amino: B) -> Result<Seq<A>, TranslationError>;
}
/// A partial translation table where not all triples of characters map to amino acids
pub trait PartialTranslationTable<A: Codec, B: Codec> {
fn try_to_amino(&self, codon: &SeqSlice<A>) -> Result<B, TranslationError>;
fn try_to_codon(&self, amino: B) -> Result<Seq<A>, TranslationError>;
}
The standard genetic code is provided in the translation::standard
module:
use crate::prelude::*;
use crate::translation::standard::STANDARD;
use crate::translation::TranslationTable;
let seq: Seq<Dna> =
dna!("AATTTGTGGGTTCGTCTGCGGCTCCGCCCTTAGTACTATGAGGACGATCAGCACCATAAGAACAAA");
let aminos: Seq<Amino> = seq
.windows(3)
.map(|codon| STANDARD.to_amino(&codon))
.collect::<Seq<Amino>>();
assert_eq!(
aminos,
amino!("NIFLCVWGGVFSRVSLCARGALSPRAPPLL*SVYTLYM*ERGDTRDISQSAHTPHI*KRENTQK")
);
Instantiate a custom translation table from a type that implements Into<HashMap<Seq<A>, B>>
:
let codon_mapping: [(Seq<Dna>, Amino); 6] = [
(dna!("AAA"), Amino::A),
(dna!("ATG"), Amino::A),
(dna!("CCC"), Amino::C),
(dna!("GGG"), Amino::E),
(dna!("TTT"), Amino::D),
(dna!("TTA"), Amino::F),
];
let table = CodonTable::from_map(codon_mapping);
let seq: Seq<Dna> = dna!("AAACCCGGGTTTTTATTAATG");
let mut amino_seq: Seq<Amino> = Seq::new();
for codon in seq.chunks(3) {
amino_seq.push(table.try_to_amino(codon).unwrap());
}
assert_eq!(amino_seq, amino!("ACEDFFA"));
Implementing the TranslationTable
trait directly:
struct Mitochondria;
impl TranslationTable<Dna, Amino> for Mitochondria {
fn to_amino(&self, codon: &SeqSlice<Dna>) -> Amino {
if *codon == dna!("AGA") {
Amino::X
} else if *codon == dna!("AGG") {
Amino::X
} else if *codon == dna!("ATA") {
Amino::M
} else if *codon == dna!("TGA") {
Amino::W
} else {
Amino::unsafe_from_bits(Into::<u8>::into(codon))
}
}
fn to_codon(&self, _amino: Amino) -> Result<Seq<Dna>, TranslationError> {
unimplemented!()
}
}
I've been looking all over for a Rust library that supports multiple genetic codes. This library seems to be the best translation library, so I hope this is on your radar. Thanks!