Open bnelsj opened 6 months ago
Sounds like a good idea.
On Thu, May 2, 2024 at 5:37 PM Brad Nelson wrote:
Parametric UMAP stores multiple copies of the full input data, but these are unnecessary for transforming new data points. By deleting self._raw_data and self._knn_search_index._raw_data from my Parametric UMAP model object, I was able to reduce the size of the saved model from 90 GB to 300 MB (the input data is a distance matrix with 80K locations). This might not work for models that require additional training, but perhaps should be an option when model size is an issue?
-- Tim Sainburg https://timsainburg.com/ Postdoctoral Fellow Harvard Medical School
I'm having the same issue: I want the trained model to be as small as possible, because the inference machine has less memory than the training machine. I'll link a PR where I added a parameter to the save method that removes the raw data.
Parametric UMAP stores multiple copies of the full input data, but these are unnecessary for transforming new data points. By deleting self._raw_data and self._knn_search_index._raw_data from my Parametric UMAP model object, I was able to reduce the size of the saved model from 90 GB to 300 MB (the input data is a distance matrix with 80K locations). This might not work for models that require additional training, but perhaps should be an option when model size is an issue?
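The workaround described in this issue can be sketched roughly as follows. The two attribute names come from the issue text; the helper name strip_raw_data and the stand-in fake_model object are hypothetical, used only so the sketch runs without umap installed. Whether other attributes also hold copies of the data is not checked here.

```python
import pickle
from types import SimpleNamespace


def strip_raw_data(model):
    """Delete the stored input data from a fitted model, in place.

    Hypothetical helper: it only touches the two attributes named in the
    issue and leaves everything else on the model alone.
    """
    index = getattr(model, "_knn_search_index", None)
    if index is not None and hasattr(index, "_raw_data"):
        del index._raw_data
    if hasattr(model, "_raw_data"):
        del model._raw_data
    return model


# Stand-in for a fitted model (an assumption, not a real ParametricUMAP).
fake_model = SimpleNamespace(
    _raw_data=bytes(1_000_000),
    _knn_search_index=SimpleNamespace(_raw_data=bytes(1_000_000)),
    encoder_weights=[0.1, 0.2, 0.3],
)

size_before = len(pickle.dumps(fake_model))
strip_raw_data(fake_model)
size_after = len(pickle.dumps(fake_model))
print(size_before, size_after)
```

With a real model, the same call would go just before the model's save method. It mutates the object in place, so, as noted above, it is probably unsuitable for a model that still needs additional training.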