rs239 / ablm

Protein language model customized for antibodies
MIT License

easier imports, AbMAPAttn.embed, .yml file, demo_2.ipynb and profiling insights #1

Closed taylormjs closed 1 year ago

taylormjs commented 1 year ago
  1. Improved the __init__.py files so that ProteinEmbedding, reload_models_to_device, and AbMAPAttn are easier to import (see examples/demo_2.ipynb). Also changed relative imports to absolute ones, so the package should work when installed from TestPyPI.
  2. Added an AbMAPAttn.embed method, a more user-friendly way to get both a variable-length and a fixed-length embedding for a batch of sequences.
  3. Added a first iteration of demo_2.ipynb, which shows a slightly nicer API (abmap.AbMAPAttn instead of abmap.model.AbMAPAttn) and the AbMAPAttn.embed method.
  4. Started profiling contrastive augmentation and AbMAPAttn.embed (see demo_2.ipynb). The main bottleneck is the large number of calls (k=50) to the foundation PLM. Consider lowering the default k, or mutating more CDR residues in each variant. In the extreme case, we could use k=1 if we mutated all the CDR residues.
  5. Added an environment.yml file with anarci, hmmer, etc. Building the environment takes a long time because of conflicting package versions.
  6. Added CPU support. Basically just changed all instances of tensor.cuda(device) to tensor.to(device).
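
To make the trade-off in item 4 concrete, here is a rough sketch of how the number of foundation PLM calls scales with k. It is not the repository's code: CountingPLM, mutate_cdrs, augment, the example sequence, and the CDR positions are all hypothetical stand-ins, and it assumes one PLM call per mutated variant.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class CountingPLM:
    """Hypothetical stand-in for the foundation PLM; it only counts calls."""

    def __init__(self):
        self.calls = 0

    def embed(self, sequence):
        self.calls += 1
        # Placeholder "embedding": one number per residue.
        return [float(ord(residue)) for residue in sequence]


def mutate_cdrs(sequence, cdr_positions, n_mutations, rng):
    """Return a copy of sequence with n_mutations CDR positions substituted."""
    residues = list(sequence)
    for pos in rng.sample(cdr_positions, n_mutations):
        # Pick a residue different from the current one, so each chosen
        # position really changes.
        choices = [aa for aa in AMINO_ACIDS if aa != residues[pos]]
        residues[pos] = rng.choice(choices)
    return "".join(residues)


def augment(sequence, cdr_positions, plm, k, n_mutations, seed=0):
    """Embed k mutated variants (assumption: one PLM call per variant)."""
    rng = random.Random(seed)
    variants = [
        mutate_cdrs(sequence, cdr_positions, n_mutations, rng) for _ in range(k)
    ]
    return variants, [plm.embed(variant) for variant in variants]


sequence = "EVQLVESGGGLVQPGGSLRLSCAAS"  # made-up example sequence
cdr_positions = [5, 6, 7, 12, 13, 14]   # made-up CDR indices

# Many variants with one mutation each: k PLM calls.
plm_many = CountingPLM()
augment(sequence, cdr_positions, plm_many, k=50, n_mutations=1)
calls_many = plm_many.calls

# The extreme case from item 4: one variant with every CDR residue mutated.
plm_single = CountingPLM()
single_variants, _ = augment(
    sequence, cdr_positions, plm_single, k=1, n_mutations=len(cdr_positions)
)
calls_single = plm_single.calls

print(calls_many, calls_single)
```

Under these assumptions the PLM cost is linear in k, so lowering k cuts runtime proportionally. Whether fewer, more heavily mutated variants give embeddings of comparable quality is a separate question that this sketch does not address.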