lgragert / nn-sero-pytorch

PyTorch version of neural network HLA serology prediction
Determine polymorphism limit of R-SNNS #27

Closed lgragert closed 3 years ago

lgragert commented 4 years ago

R-SNNS crashes on full dataset after IMGT/HLA DB refresh.

Most likely due to too many input nodes.

Start with the previous dataset and add polymorphisms/nodes in order of major AA frequency until failure. There will be some positions that are not as polymorphic; those get added later. Order by mutation frequency, since at some positions only one allele differs from the reference sequence. Shannon entropy would work just as well.
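A minimal sketch of the ordering step, assuming the alleles are available as equal-length aligned amino-acid strings. The function names `position_stats` and `rank_positions` are hypothetical and not part of this repo. It computes the major AA frequency and the Shannon entropy per position, drops invariant positions, and returns the rest with the most polymorphic first:

```python
import math
from collections import Counter


def position_stats(sequences):
    """Return (position, major_freq, entropy) for each aligned column.

    Assumes all sequences are aligned and have the same length.
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")

    stats = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in sequences)
        total = sum(counts.values())
        # Frequency of the most common residue at this position.
        major_freq = max(counts.values()) / total
        # Shannon entropy in bits; 0 for an invariant position.
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        stats.append((pos, major_freq, entropy))
    return stats


def rank_positions(sequences):
    """Polymorphic positions, highest entropy first (ties by position)."""
    polymorphic = [s for s in position_stats(sequences) if s[1] < 1.0]
    return [pos for pos, _, _ in sorted(polymorphic, key=lambda s: (-s[2], s[0]))]


# Toy alignment, not real HLA data.
toy = ["AAAA", "AABA", "ACBA", "ADBA"]
ranked = rank_positions(toy)
```

Input nodes could then be added to the network in the order given by `ranked`, retraining after each batch of positions until R-SNNS fails. Sorting on `major_freq` instead of entropy would follow the original suggestion more literally.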

Combine/remove alleles that differ only at AAs that aren't specified in the input.
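One way to do the combining, sketched under the assumption that alleles are held in a dict of name to aligned sequence. `collapse_alleles` is a hypothetical helper, not code from this repo. Alleles that are identical at every selected input position end up in the same group:

```python
from collections import defaultdict


def collapse_alleles(alleles, positions):
    """Group allele names by their residues at the selected positions.

    alleles:   dict mapping allele name -> aligned amino-acid string
    positions: 0-based positions that are used as network inputs
    """
    groups = defaultdict(list)
    for name, seq in alleles.items():
        # Only the residues at the input positions define the pattern.
        key = tuple(seq[p] for p in positions)
        groups[key].append(name)
    return dict(groups)


# Toy example with made-up names and sequences.
toy_alleles = {"a1": "AAAA", "a2": "AABA", "a3": "ACBA", "a4": "ACBB"}
groups = collapse_alleles(toy_alleles, positions=[1, 2])
```

Here `a3` and `a4` differ only at position 3, which is not an input, so they fall into one group and would yield a single training pattern. How to resolve a group whose members have different serologic assignments is left open.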

gbiagini commented 4 years ago

This was an issue with the newly generated pattern files. There was a typo in the generation of the test set .pat files that caused extra characters and, as a result, an uneven number of input nodes. Fixed in an earlier commit. Leaving the issue open as a note to check the boundaries of the RSNNS implementation.
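A quick sanity check along these lines could catch uneven rows before training. This is only a sketch under a simplifying assumption: one pattern per line, whitespace-separated values, and `#` comment lines. It does not parse the full SNNS .pat format, header included. `find_uneven_rows` is a hypothetical helper:

```python
def find_uneven_rows(lines, expected_inputs):
    """Return (line_number, value_count) for rows with the wrong width.

    Blank lines and lines starting with '#' are skipped.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        n_values = len(stripped.split())
        if n_values != expected_inputs:
            bad.append((lineno, n_values))
    return bad


# Toy input; the third line has one value too many.
toy_lines = ["# pattern 1", "0 1 0 1", "0 1 0 1 1", "", "1 0 0 1"]
uneven = find_uneven_rows(toy_lines, expected_inputs=4)
```

An empty result means every data row has the expected number of input values.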