To not lose my comments in the folded diffs above:
I would change the API so that the sampling function accepts a list of signatures, which are assumed to already come from the training set. This way, the train/test split would happen in only one place (i.e., not inside the function), and this function would focus on what it should do: drawing pairs.
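To make the suggestion concrete, here is a rough sketch of what I have in mind. The names `sample_pairs`, `train_signatures`, `n_pairs` and `seed` are placeholders of mine, not the existing API, and enumerating all pairs is only reasonable for small inputs:

```python
import random
from itertools import combinations


def sample_pairs(train_signatures, n_pairs, seed=None):
    """Draw `n_pairs` distinct unordered pairs of signatures.

    Hypothetical signature: the caller is assumed to pass only signatures
    that already belong to the training set, so no split happens here.
    """
    rng = random.Random(seed)
    # Enumerate every unordered pair once (quadratic; fine for a sketch).
    all_pairs = list(combinations(train_signatures, 2))
    if n_pairs > len(all_pairs):
        raise ValueError("requested more pairs than are available")
    # Sample without replacement so no pair is drawn twice.
    return rng.sample(all_pairs, n_pairs)


# The train/test split is done once, by the caller.
signatures = ["sig%d" % i for i in range(10)]
train_signatures, test_signatures = signatures[:8], signatures[8:]
pairs = sample_pairs(train_signatures, n_pairs=5, seed=0)
```

With this shape, the function never sees test signatures, so it cannot leak them into the sampled pairs.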