Closed gbiagini closed 4 years ago
Looks like using the ANHIG/IMGTHLA _nuc.msf files might be the best way to do this. Since they contain the coding sequences only, I may be able to split directly into codons, impute, and translate the sequence. I'll test it out and see if it works. If not, I may need to look into using the _gen.msf files.
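A rough sketch of the split-into-codons and translate steps, assuming the coding sequence has already been pulled out of the alignment as a plain string that starts in frame. The names `split_codons` and `translate` are hypothetical, and the standard genetic code is built by hand here rather than taken from any library. Any codon that is incomplete or contains a gap or ambiguity character is translated to `X`, which is an assumption about how such positions should be reported.

```python
from itertools import product

# Standard genetic code, listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(cds):
    """Split a coding sequence into consecutive 3-base chunks.

    A trailing chunk shorter than 3 bases is kept so the caller can see it.
    """
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(codons, unknown="X"):
    """Translate codons to one-letter amino acids ('*' marks a stop codon).

    Codons not in the table (gaps, ambiguity codes, partial codons)
    become `unknown`.
    """
    return "".join(CODON_TABLE.get(codon, unknown) for codon in codons)
```

For example, `translate(split_codons("ATGGCCTGA"))` gives `"MA*"`. Working codon by codon keeps the imputation step aligned with the reading frame, so a filled-in codon always maps to a single amino acid.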
Opened issue #37 to address the _nuc.msf files (and record results) directly
The imputation algorithm currently infers missing sequence by finding the nearest neighbor, measured by Hamming distance, between amino acid sequences. This needs to be redone using nucleotide sequences, rather than amino acid sequences, as the basis for comparison.
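A minimal sketch of what nearest-neighbor imputation over nucleotide sequences could look like. This is not the existing implementation. It assumes all sequences are aligned to the same length and that unsequenced positions are marked with `*`; the marker and the function names (`hamming`, `nearest_neighbor`, `impute`) are hypothetical.

```python
MISSING = "*"  # assumed marker for unsequenced positions


def hamming(a, b, missing=MISSING):
    """Count mismatches, skipping positions where either sequence is missing."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if missing not in (x, y) and x != y)


def nearest_neighbor(target, references):
    """Return the name of the reference with the smallest Hamming distance.

    `references` maps a name to an aligned nucleotide sequence.
    Ties go to whichever reference comes first in the mapping.
    """
    return min(references, key=lambda name: hamming(target, references[name]))


def impute(target, references):
    """Fill missing positions in `target` from its nearest neighbor.

    Positions that are also missing in the neighbor stay missing.
    """
    donor = references[nearest_neighbor(target, references)]
    return "".join(d if t == MISSING else t for t, d in zip(target, donor))
```

With `references = {"A": "ATGGCC", "B": "ATGTTT"}`, the call `impute("ATGG**", references)` picks `A` (distance 0 versus 1 for `B`) and returns `"ATGGCC"`. Because the distance only counts positions known in both sequences, a target with very little known sequence can match many references equally well, so some tie-breaking or normalization rule would probably be needed in practice.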