lgragert / nn-sero-pytorch

PyTorch version of neural network HLA serology prediction

Rework imputation algorithm to function based on nucleotide sequence instead of AA sequence #35

Closed by gbiagini 4 years ago

gbiagini commented 4 years ago

The imputation algorithm currently infers missing sequence by finding the nearest neighbor, measured by Hamming distance, among amino acid sequences. This needs to be reworked so that the comparison is made between nucleotide sequences instead.
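As a rough illustration of the idea, here is a minimal sketch of nearest-neighbor imputation by Hamming distance on aligned nucleotide sequences. It is not the repository's code: the function names `hamming` and `impute_nearest`, the use of `*` to mark unknown positions, and the choice to skip unknown positions when counting mismatches are all assumptions made for the example.

```python
def hamming(a, b, missing="*"):
    """Count mismatches between two aligned sequences.

    Positions marked as missing in either sequence are skipped
    (an assumption for this sketch, not the project's rule).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for x, y in zip(a, b)
        if x != y and x != missing and y != missing
    )


def impute_nearest(target, references, missing="*"):
    """Fill missing positions in `target` from its nearest complete reference."""
    # Only fully known references can supply every missing position.
    complete = [r for r in references if missing not in r]
    if not complete:
        raise ValueError("no complete reference sequence available")
    # min() keeps the first reference when several are equally close.
    nearest = min(complete, key=lambda r: hamming(target, r, missing))
    # Keep known bases from the target; borrow the rest from the neighbor.
    return "".join(n if t == missing else t for t, n in zip(target, nearest))


# Hypothetical toy data, not real HLA alleles.
refs = ["ATGAAAGCC", "ATGCCCGTC", "TTGGGGGCA"]
imputed = impute_nearest("ATG***GCC", refs)
```

Working at the nucleotide level means that silent substitutions, which disappear in the amino acid sequence, also count toward the distance between two alleles.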

gbiagini commented 4 years ago

Using the ANHIG/IMGTHLA _nuc.msf files looks like the best way to do this. Since they contain only the coding sequences, I may be able to split each sequence directly into codons, impute, and then translate the result. I'll test it and see whether it works. If not, I may need to look into using the _gen.msf files.
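The codon-splitting and translation steps could look something like the sketch below. It assumes an in-frame, gap-free coding sequence whose length is a multiple of three, uses the standard genetic code, and emits `X` for any codon containing a character outside `TCAG`. The helper names `split_codons` and `translate` are hypothetical, and reading the .msf alignment itself is not shown.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(cds):
    """Split an in-frame coding sequence into a list of 3-base codons."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds, unknown="X"):
    """Translate a coding sequence; unrecognized codons become `unknown`."""
    return "".join(
        CODON_TABLE.get(codon.upper(), unknown) for codon in split_codons(cds)
    )


# Hypothetical toy sequence, not a real HLA allele.
codons = split_codons("ATGGCCTAA")
protein = translate("ATGGCCTAA")
```

Imputation would then run on the codon list before `translate` is called, so that each imputed unit maps to exactly one amino acid position.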

gbiagini commented 4 years ago

Opened issue #37 to address the _nuc.msf files (and record results) directly.