sacdallago / biotrainer

Biological prediction models made simple.
https://biocentral.cloud/app
Academic Free License v3.0
34 stars 8 forks

Optimize Memory Handling in Embedding Computations and Refactor EmbeddingService #103

Closed: heispv closed this 3 months ago

heispv commented 3 months ago

Description:

This PR enhances the EmbeddingService by optimizing memory usage during embedding computations, refactoring core logic, and adding unit tests for robustness.

Key Changes:

  1. Dynamic Memory Management:

    • Added automatic estimation of RAM usage; embeddings are saved to disk when the memory limit is reached.

  2. Refactor EmbeddingService:

    • Extracted core logic into a dedicated method.
    • Handled ultra-long reads by saving their embeddings immediately and preventing additional sequences from being loaded.
    • Removed SAVE_AFTER_N_EMBEDDINGS.

  3. Unit Testing:

    • Added tests for computing embeddings on long, short, and mixed sequences.
    • Validated dynamic memory management and embedding computation accuracy.
    • Ensured correct result paths and file existence in the tests.
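
For readers who want a concrete picture of the memory-management idea described above, here is a minimal sketch. It is not the code from this PR: `EmbeddingBuffer`, `fake_embed`, `embed_all`, the `max_bytes` budget, and the pickle chunk files are all hypothetical names and choices made for illustration, and it uses only the standard library so it runs anywhere. The actual EmbeddingService may estimate memory and store results quite differently.

```python
import pickle
from array import array
from pathlib import Path


class EmbeddingBuffer:
    """Hypothetical helper: keeps embeddings in RAM, spills to disk at a byte budget."""

    def __init__(self, out_dir, max_bytes):
        self.out_dir = Path(out_dir)
        self.max_bytes = max_bytes
        self.buffer = {}          # seq_id -> embedding currently held in RAM
        self.buffered_bytes = 0   # estimated size of everything in self.buffer
        self.chunk_paths = []     # files written so far, in order

    def add(self, seq_id, embedding):
        # Estimate the footprint as element count times bytes per element.
        size = len(embedding) * embedding.itemsize
        # Make room first if this embedding would push the buffer over budget.
        if self.buffer and self.buffered_bytes + size > self.max_bytes:
            self.flush()
        self.buffer[seq_id] = embedding
        self.buffered_bytes += size
        # An embedding that fills or exceeds the budget on its own (the
        # "ultra-long" case) is written immediately, so nothing is stacked on it.
        if self.buffered_bytes >= self.max_bytes:
            self.flush()

    def flush(self):
        # Write the buffered embeddings to a new chunk file and release them.
        if not self.buffer:
            return
        path = self.out_dir / f"chunk_{len(self.chunk_paths)}.pkl"
        with open(path, "wb") as fh:
            pickle.dump(self.buffer, fh)
        self.chunk_paths.append(path)
        self.buffer = {}
        self.buffered_bytes = 0


def fake_embed(sequence, dim=8):
    # Stand-in for a real embedder: one float32 vector of length `dim` per residue.
    return array("f", [0.0] * (len(sequence) * dim))


def embed_all(sequences, out_dir, max_bytes, dim=8):
    # Embed every sequence, spilling to disk whenever the budget is hit.
    buf = EmbeddingBuffer(out_dir, max_bytes)
    for seq_id, seq in sequences.items():
        buf.add(seq_id, fake_embed(seq, dim))
    buf.flush()  # write whatever is left at the end
    return buf.chunk_paths
```

Calling `embed_all({"a": "MKT", "b": "M" * 500}, some_dir, max_bytes=1024)` would return the list of chunk files written under `some_dir`. Flushing before an oversized embedding is added, and again right after it, is one simple way to keep peak memory near the budget; a fixed "save after N embeddings" counter cannot do this because it ignores sequence length.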

Issue Reference:

Closes #98

heispv commented 3 months ago

Thank you for the suggestions on improving the changes I made. I just updated the code based on your feedback! :)

heispv commented 3 months ago

Sebastian, I just updated the code as you suggested 😉