Closed by mj-thompson 10 months ago
Hi, thanks for raising this. I should be able to put those sets together over the next few days. Depending on how that goes, I'll also try to put together a separate guide on building generic training set splits based on CATH, time permitting.
I've added links for downloading the datasets to the README.
Hi there.
I'm working on a project related to what you all have developed here, and while I greatly appreciate the availability of the 1.08M pseudo-labelled data, I was wondering whether it would be possible to obtain the other datasets used in this manuscript, namely the labelled training, validation, and test sets referenced in the text. If they're unavailable, any preprocessing code that would let me replicate the exact steps taken in the manuscript would also be great.
Thank you for your help. I include the quote below for reference.