jyaacoub / MutDTA

Improving the precision oncology pipeline by providing binding affinity perturbation predictions on a priori identified cancer driver genes.

Check train split proteins for davis based on sequence #74

Closed jyaacoub closed 4 months ago

jyaacoub commented 8 months ago

Davis might still have overlapping proteins between the train and test sets if drop_duplicates is applied to the protein ID instead of the protein sequences themselves.

See #73 for unique protein details
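
A minimal sketch of the kind of check this would involve, assuming the splits are available as pandas DataFrames. The column names `prot_id` and `prot_seq`, the helper `overlapping_sequences`, and the toy data are hypothetical and are not taken from the MutDTA code. It shows how two splits can look disjoint by protein ID while still sharing a sequence.

```python
import pandas as pd


def overlapping_sequences(train, test, seq_col="prot_seq"):
    """Return the set of protein sequences present in both splits.

    `seq_col` is a hypothetical column name; adjust it to the real data.
    Sequences are stripped and upper-cased so formatting differences
    do not hide a match.
    """
    train_seqs = set(train[seq_col].str.strip().str.upper())
    test_seqs = set(test[seq_col].str.strip().str.upper())
    return train_seqs & test_seqs


# Toy data (made up): different IDs in each split, one shared sequence.
train = pd.DataFrame({"prot_id": ["A", "B"], "prot_seq": ["MKT", "GAV"]})
test = pd.DataFrame({"prot_id": ["C"], "prot_seq": ["MKT"]})

# Comparing by ID reports no overlap between the splits...
id_overlap = set(train["prot_id"]) & set(test["prot_id"])

# ...while comparing by sequence exposes the leak.
seq_overlap = overlapping_sequences(train, test)

print("overlap by ID:", id_overlap)
print("overlap by sequence:", seq_overlap)
```

If the sequence overlap is non-empty, deduplicating on the sequence column (for example `drop_duplicates(subset="prot_seq")`) before splitting would be one way to avoid it, at the cost of regenerating the splits.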

jyaacoub commented 4 months ago

This should be mentioned in the paper as a potential limitation, but there is no bandwidth to fix it now since it would require full retraining.