vkola-lab / peds2019

Quantifying the nativeness of antibody sequences using long short-term memory networks
MIT License

Sequence clustering #6

wjs20 closed this issue 3 years ago

wjs20 commented 3 years ago

Hi

In the methods section of your nativeness paper, you state that 'These sequences were further clustered at the 97% identity level to avoid sampling highly related sequences between the training and testing sets'.

Could you give me some guidance on how this was done and which tools you used?

Thanks
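For readers unfamiliar with identity-level clustering, the quoted step can be illustrated with a minimal sketch of greedy incremental clustering, the strategy CD-HIT is built on. Sequences are visited longest first, and each one joins the first existing cluster whose representative it matches at 97% identity or higher; otherwise it founds a new cluster. Assumptions: identity is computed from a plain Needleman-Wunsch alignment with illustrative scores (+1 match, -1 mismatch, -1 gap) and divided by the length of the shorter sequence (CD-HIT's default global-identity definition). CD-HIT itself adds a short-word filter and other speed-ups, so this is not its implementation, and the function names are hypothetical.

```python
from typing import Dict, List


def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Identical aligned residues divided by the length of the shorter sequence.

    Plain Needleman-Wunsch with linear gap costs; the scores are illustrative
    assumptions, not CD-HIT's internal parameters.
    """
    if not a or not b:
        return 0.0
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace one optimal alignment back and count identical aligned positions.
    i, j, identical = n, m, 0
    while i > 0 and j > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            identical += int(a[i - 1] == b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return identical / min(n, m)


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.97) -> List[List[str]]:
    """Greedy incremental clustering in the style of CD-HIT (simplified sketch)."""
    # Visit sequences longest first; ties keep their input order (sorted is stable).
    order = sorted(seqs, key=lambda name: len(seqs[name]), reverse=True)
    clusters: List[List[str]] = []  # element 0 of each cluster is its representative
    for name in order:
        for cluster in clusters:
            if global_identity(seqs[cluster[0]], seqs[name]) >= threshold:
                cluster.append(name)
                break
        else:  # no representative was close enough: start a new cluster
            clusters.append([name])
    return clusters
```

For example, `greedy_cluster({"a": "A" * 40, "b": "A" * 39 + "C", "c": "W" * 40})` returns `[['a', 'b'], ['c']]`: `a` and `b` share 39 of 40 residues (97.5%), so they can never be split between the training and testing sets, while `c` forms its own cluster.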

tanggis commented 3 years ago

Hi, sequence clustering was performed using CD-HIT (Fu et al., 2012).
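As a follow-up, here is a minimal sketch, assuming a protein FASTA input and Python 3, of how CD-HIT's output could be used. `cdhit_command` only builds the argument list (the file names and the `-n 5` and `-d 0` choices are assumptions, not settings reported in the paper); `parse_clstr` reads the `.clstr` file CD-HIT writes alongside its output; and `split_by_cluster` is one hypothetical way to keep near-identical sequences out of both sets, by assigning whole clusters to either train or test. Another common option is to keep only the cluster representatives, which CD-HIT writes to the `-o` FASTA file. All helper names are illustrative.

```python
import random
import re
from typing import Dict, List, Tuple

# Member lines look like "0\t120aa, >seq1... *" (representative) or
# "1\t118aa, >seq2... at 98.31%"; the ID sits between ">" and "...".
MEMBER = re.compile(r">(.+?)\.\.\.")


def cdhit_command(fasta_in: str, prefix_out: str, identity: float = 0.97) -> List[str]:
    """Assemble a protein CD-HIT call; pass the list to subprocess.run(..., check=True).

    -c: identity threshold; -n 5: word size suggested for thresholds 0.7-1.0;
    -d 0: keep full sequence names (up to the first space) in the .clstr file.
    """
    return ["cd-hit", "-i", fasta_in, "-o", prefix_out,
            "-c", str(identity), "-n", "5", "-d", "0"]


def parse_clstr(text: str) -> Dict[int, List[str]]:
    """Map cluster number -> member IDs from the contents of a .clstr file."""
    clusters: Dict[int, List[str]] = {}
    current = -1
    for line in text.splitlines():
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            clusters[current] = []
        else:
            match = MEMBER.search(line)
            if match:
                clusters[current].append(match.group(1))
    return clusters


def split_by_cluster(clusters: Dict[int, List[str]], test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[str], List[str]]:
    """Assign whole clusters to train or test so near-duplicates never straddle the split."""
    cluster_ids = sorted(clusters)
    random.Random(seed).shuffle(cluster_ids)
    n_test = round(len(cluster_ids) * test_fraction)
    test_ids = set(cluster_ids[:n_test])
    train: List[str] = []
    test: List[str] = []
    for cid, members in clusters.items():
        (test if cid in test_ids else train).extend(members)
    return train, test
```

For example, `subprocess.run(cdhit_command("antibodies.fasta", "antibodies_c97"), check=True)` would write `antibodies_c97` (representatives) and `antibodies_c97.clstr` (hypothetical file names), whose contents can then go through `parse_clstr` and `split_by_cluster`. Splitting at the cluster level rather than the sequence level is what guarantees that no pair of sequences at 97% identity or higher ends up on opposite sides of the split.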