Removing the dependency on bio_embeddings entirely. bio_embeddings is not really maintained anymore (last commit two years ago), and being dependent on a specific external module for embedding calculation shrinks the overall capabilities of biotrainer. Now, for example, adding LoRA layers becomes much easier. While bio_embeddings does have its advantages, such as a well-defined pipeline and a lot of utilities, it also provides a lot of functionality that is not used by biotrainer. Therefore, a new embedders module was introduced to biotrainer that mimics some aspects of bio_embeddings and takes inspiration from it. However, it is built in a more generic way and enables, in principle, all Hugging Face transformer embedders to be used by biotrainer.
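For instance, a protein language model hosted on the Hugging Face hub can now be referenced directly by its repository name. A minimal, hedged config sketch; only embedder_name is the option discussed here, while the surrounding keys, file names and the chosen checkpoint are illustrative placeholders:

```yaml
# Illustrative config excerpt, not taken verbatim from the release notes.
protocol: residue_to_class
sequence_file: sequences.fasta               # placeholder input files
labels_file: labels.fasta
model_choice: CNN
embedder_name: Rostlab/prot_t5_xl_uniref50   # any transformer encoder from the Hugging Face hub
```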
The Ankh custom embedder was removed because it can now be used directly in biotrainer:
embedder_name: ElnaggarLab/ankh-large
Adding new use_half_precision option for transformer embedders
Adding missing device option
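A short, hedged sketch of how the two new options could sit next to an embedder declaration; the concrete values (half precision enabled, a CUDA device) are assumptions for illustration rather than documented defaults:

```yaml
# Illustrative excerpt combining the new embedder options from this release.
embedder_name: ElnaggarLab/ankh-large
use_half_precision: True   # new option: compute embeddings in half precision
device: cuda               # new option: e.g. cuda or cpu
```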
Bug fixes
Fixed a minor problem with model saving in Solver.py: if a newly trained model did not improve before early_stop was triggered, it was not saved as a checkpoint.