The DUD/DUD-E data sets have a topology bias because their decoys are selected to be topologically dissimilar to the actives.
I tried sampling decoys that are dissimilar to the actives of the same target but similar to the actives of other targets.
This did not reduce the bias: because the decoys were limited to a narrow chemical space, they were still much more similar to each other than to the actives.
So now I am trying to sample decoys directly from the whole ZINC database to reduce the similarity between decoys.
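
The sampling idea can be illustrated with a minimal sketch. It is not the actual pipeline: it assumes each molecule is already represented as a set of on-bit fingerprint indices (in practice these would come from a cheminformatics toolkit), and the function names and both similarity thresholds are hypothetical. Candidates are visited in random order and kept only if they are dissimilar to every active and to every decoy already chosen. Any property matching against the actives is left out.

```python
import random


def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def sample_decoys(pool, actives, n_decoys,
                  max_sim_active=0.4, max_sim_decoy=0.4, seed=0):
    """Greedily pick up to n_decoys fingerprints from pool.

    A candidate is rejected if it is too similar to any active
    (max_sim_active) or to any decoy already selected (max_sim_decoy).
    Both thresholds are illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)
    order = list(pool)
    rng.shuffle(order)  # random visiting order, so the whole pool can contribute

    chosen = []
    for fp in order:
        if len(chosen) == n_decoys:
            break
        if any(tanimoto(fp, a) > max_sim_active for a in actives):
            continue  # too close to a known active
        if any(tanimoto(fp, d) > max_sim_decoy for d in chosen):
            continue  # too close to an already selected decoy
        chosen.append(fp)
    return chosen


if __name__ == "__main__":
    # Toy fingerprints standing in for molecules from a large library.
    actives = [frozenset({1, 2, 3, 4})]
    pool = [
        frozenset({1, 2, 3, 5}),     # similar to the active
        frozenset({10, 11, 12}),
        frozenset({10, 11, 13}),     # similar to the previous candidate
        frozenset({20, 21, 22}),
    ]
    for decoy in sample_decoys(pool, actives, n_decoys=3):
        print(sorted(decoy))
```

The greedy filter may return fewer than n_decoys when the pool is small, which is one reason to draw from a large and chemically broad library.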