sacdallago / bio_embeddings

Get protein embeddings from protein sequences
http://docs.bioembeddings.com
MIT License
463 stars, 65 forks

Documentation improvements #125

Open sacdallago opened 3 years ago

sacdallago commented 3 years ago

Following various cases of misinterpreted or missing documentation, one of the next things for me to work on is new docs. This includes:

  1. Improved readme
  2. Dedicated install .md explaining what [all] means and "which extras are right for you"
  3. Use of common language, e.g. "homology" or "sequence similarity lookup" for unsupervised annotation transfer.
  4. Maybe a FAQ (although there aren't many FAQs as of now)

It may be time for a "single" documentation place. docs.bioembeddings.com is good; it just needs more pages, and more verbose ones, for non-code-related docs.

sacdallago commented 3 years ago

Models (https://docs.bioembeddings.com/v0.1.6/api/bio_embeddings.embed.html) should document which options can be passed in kwargs, e.g. model_directory to pass model weights (potentially with a link to data.bioembeddings?), or weights_file and options_file for SeqVec.

CC @konstin

konstin commented 3 years ago

Through doc comments or by replacing **kwargs with the actual args? (see e.g. https://github.com/encode/httpx/blob/2e4b308d7ab138de106f6672326a02e07d350904/httpx/_client.py#L573-L605 for an extreme case of the latter)
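To make the two options concrete, here is a minimal sketch. The class names `KwargsEmbedder` and `ExplicitEmbedder` are hypothetical and are not part of bio_embeddings; only the option names (model_directory, weights_file, options_file) come from the comment above. It compares what a signature-based tool such as `help()` or an IDE would see in each case.

```python
import inspect
from typing import Optional


class KwargsEmbedder:
    """Hypothetical embedder that keeps **kwargs and documents them.

    Keyword Args:
        model_directory: Path to a directory with model weights.
        weights_file: Path to a weights file.
        options_file: Path to an options file.
    """

    def __init__(self, **kwargs):
        # Options are accepted blindly; a misspelled key goes unnoticed.
        self.options = kwargs


class ExplicitEmbedder:
    """Hypothetical embedder that spells out each option in the signature."""

    def __init__(
        self,
        model_directory: Optional[str] = None,
        weights_file: Optional[str] = None,
        options_file: Optional[str] = None,
    ):
        self.options = {
            "model_directory": model_directory,
            "weights_file": weights_file,
            "options_file": options_file,
        }


# Parameter names as introspection tools see them.
kwargs_params = list(inspect.signature(KwargsEmbedder.__init__).parameters)
explicit_params = list(inspect.signature(ExplicitEmbedder.__init__).parameters)

print(kwargs_params)
print(explicit_params)
```

With doc comments, the options are visible only to someone who reads the docstring. With explicit arguments, they show up in the signature, and a misspelled option raises a `TypeError` instead of being silently ignored.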

sacdallago commented 3 years ago

Your opinion?

In any case, I don't think it's fruitful to define these at the level of the interface: 90% of users won't know or see that e.g. ProtTransBert... is a child of BertBase..., so it won't buy us much.
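A rough sketch of that concern, using hypothetical classes `BaseEmbedder` and `ChildEmbedder` (these are stand-ins, not the actual bio_embeddings hierarchy): when the options are documented only on the base class, a user inspecting the concrete class sees neither the option names nor their descriptions.

```python
import inspect


class BaseEmbedder:
    """Hypothetical base class; options are documented only here.

    Keyword Args:
        model_directory: Path to a directory with model weights.
    """

    def __init__(self, **kwargs):
        self.options = kwargs


class ChildEmbedder(BaseEmbedder):
    """Hypothetical concrete embedder that users actually instantiate."""

    def __init__(self, **kwargs):
        # Forwards everything to the base class unchanged.
        super().__init__(**kwargs)


# What a user sees when inspecting the class they actually use.
child_doc = inspect.getdoc(ChildEmbedder)
child_params = list(inspect.signature(ChildEmbedder).parameters)

print(child_doc)
print(child_params)
```

Under these assumptions, the option would have to be documented (or declared) on each concrete class for it to appear where users look.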