Closed: nickbhat closed this issue 5 years ago
I'd appreciate details on how you subset Pfam (e.g. threw out small families, long sequences, etc.) when initially training the LMs. I couldn't find many details in the paper or the repo.
No such preprocessing was done. You can download the exact dataset used by following the link in the README.