missuse / ragp

Filter plant hydroxyproline rich glycoproteins
MIT License
Supporting X, B and Z in predict_hyp input #10

Open missuse opened 2 years ago

missuse commented 2 years ago

predict_hyp could support the ambiguous amino acid symbols X (any), B (D or N) and Z (E or Q) if they occur at most once, or perhaps twice, in a subsequence.

This could proceed by generating all possible subsequences, predicting the Hyp probability for each of them, and returning the min, max or floor(median) of the predictions, selected by an additional argument to the function. This is why a limit on the number of X is needed: 20 * 20 = 400 variants per subsequence is manageable, but 20 * 20 * 20 = 8000 is too many.
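To make the idea concrete, here is a rough sketch of the expansion and aggregation steps. It is written in Python only for illustration and is not ragp code. The names `expand_ambiguous`, `aggregate_prediction`, `max_ambiguous` and `how` are hypothetical, `predict` stands in for whatever model scores a single unambiguous subsequence, and reading floor(median) as the lower median is an assumption.

```python
from itertools import product
from statistics import median_low

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Ambiguity codes named in this issue and the residues each may stand for.
AMBIGUOUS = {"X": AMINO_ACIDS, "B": "DN", "Z": "EQ"}


def expand_ambiguous(subseq, max_ambiguous=2):
    """Return every unambiguous variant of `subseq` (hypothetical helper).

    Raises ValueError when more than `max_ambiguous` ambiguous symbols
    are present, since the number of variants grows multiplicatively.
    """
    n_ambiguous = sum(aa in AMBIGUOUS for aa in subseq)
    if n_ambiguous > max_ambiguous:
        raise ValueError(
            f"{n_ambiguous} ambiguous symbols in {subseq!r}, "
            f"limit is {max_ambiguous}"
        )
    # For each position: the allowed residues, or the residue itself.
    choices = [AMBIGUOUS.get(aa, aa) for aa in subseq]
    return ["".join(variant) for variant in product(*choices)]


def aggregate_prediction(subseq, predict, how="max", max_ambiguous=2):
    """Score all variants with `predict` and reduce to a single value.

    `predict` is a placeholder for a function mapping one unambiguous
    subsequence to a probability. `how` is one of "min", "max", "median";
    "median" uses the lower median (assumed meaning of floor(median)).
    """
    reducers = {"min": min, "max": max, "median": median_low}
    if how not in reducers:
        raise ValueError(f"unknown aggregation: {how!r}")
    probs = [predict(v) for v in expand_ambiguous(subseq, max_ambiguous)]
    return reducers[how](probs)
```

With this layout a subsequence containing one X expands to 20 variants and one containing two X expands to 400, while a third X is rejected before any prediction is attempted.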