Closed by jyaacoub 4 months ago
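The Davis train and test sets might still overlap if drop_duplicates is applied to the protein ID instead of the protein sequence itself: two entries with different IDs but identical sequences would both survive and could land on opposite sides of the split.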
See #73 for unique protein details
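A minimal sketch of the concern, using toy data rather than the actual Davis table. The column names `prot_id` and `prot_seq`, the sequences, and the hand-picked split are all hypothetical and only illustrate how deduplicating on ID can leave the same sequence in both sets:

```python
import pandas as pd

# Toy data (hypothetical column names and sequences, not the real Davis set).
# P1 and P2 have different IDs but share the same sequence.
df = pd.DataFrame({
    "prot_id": ["P1", "P1", "P2", "P3"],
    "prot_seq": ["MKTAYIAK", "MKTAYIAK", "MKTAYIAK", "GSHMLEDP"],
})

# Deduplicating on the ID keeps P1, P2 and P3, so the repeated sequence survives.
by_id = df.drop_duplicates(subset="prot_id")

# Deduplicating on the sequence keeps one row per distinct sequence.
by_seq = df.drop_duplicates(subset="prot_seq")

# A split made over unique IDs (hand-picked here for illustration).
train_ids, test_ids = {"P1", "P3"}, {"P2"}
train = df[df["prot_id"].isin(train_ids)]
test = df[df["prot_id"].isin(test_ids)]

# Sequences that appear on both sides of the split.
shared = set(train["prot_seq"]) & set(test["prot_seq"])

print(len(by_id), len(by_seq), shared)
```

Checking `shared` on the real splits would show whether any overlap actually exists before deciding how much weight to give this in the paper.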
This should be mentioned in the paper as a potential limitation, but there is no bandwidth to fix it now, since that would require full retraining.