Hi,
In the methods section of your nativeness paper, you state that 'These sequences were further clustered at the 97% identity level to avoid sampling highly related sequences between the training and testing sets'.
Could you give me some guidance on how this was done and which tools you used?
Thanks,