nanxstats / protr

🧬 Toolkit for generating various numerical features of protein sequences
https://nanx.me/protr/

New Protein Descriptor: Symmetric extractDC #44

Open discoleo opened 6 months ago

discoleo commented 6 months ago

The current extractDC is not symmetric: it treats "XY" and "YX" as distinct dipeptides, which generates 400 keys (20 × 20). This has some drawbacks:

It may be wise to implement a symmetric descriptor, where "XY" == "YX":

The symmetric variant would have 210 keys instead of 400, e.g. "AA", "AC", "AD", ..., "XY", with the letter "X" ordered before the letter "Y". The proportions could be normalized by dividing by (2*n-2), where n is the number of amino acids in the protein.
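
To make the proposal concrete, here is a minimal, language-neutral sketch in Python (not R, and not part of protr). The function name `symmetric_dc` is hypothetical. It assumes the 20 standard amino acids, and it assumes one reading of the (2*n-2) normalization: each adjacent pair is counted once per reading direction, so the n-1 pairs contribute 2*n-2 counts in total and the proportions sum to 1.

```python
from itertools import combinations_with_replacement

# The 20 standard amino acids in alphabetical order (assumption: no
# ambiguous or non-standard residue codes are handled).
AA = "ACDEFGHIKLMNPQRSTVWY"


def symmetric_dc(seq):
    """Hypothetical symmetric dipeptide composition; not protr's API.

    "XY" and "YX" are merged into a single key whose letters are in
    alphabetical order, which gives 210 keys instead of 400.
    """
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two residues")
    unknown = set(seq) - set(AA)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")

    # All unordered pairs, including identical letters: 20 * 21 / 2 = 210.
    keys = ["".join(p) for p in combinations_with_replacement(AA, 2)]
    counts = dict.fromkeys(keys, 0)

    for a, b in zip(seq, seq[1:]):
        key = a + b if a <= b else b + a  # canonical (sorted) key
        # Assumed interpretation: count the pair once per reading
        # direction, so the total over all keys is 2 * (n - 1).
        counts[key] += 2

    return {k: c / (2 * n - 2) for k, c in counts.items()}
```

For the toy sequence "ACCA", the pairs are AC, CC and CA, so the key "AC" gets 4/6 and "CC" gets 2/6, with all other keys at 0. Counting each pair once and dividing by n-1 would give the same proportions.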

It would be interesting to compare this descriptor against the current extractDC on real-life protein data sets.

nanxstats commented 6 months ago

@discoleo Sure! I feel this could be a useful addition, maybe as a separate function. Do you mind sending a pull request?