scikit-learn-intelex

olibclarke commented 1 year ago

Hi,

During installation of cryodrgn on a new system, conda gave the following message:

    Installed package of scikit-learn can be accelerated using scikit-learn-intelex.
    More details are available here: https://intel.github.io/scikit-learn-intelex

    For example:

        $ conda install scikit-learn-intelex
        $ python -m sklearnex my_application.py

Can cryodrgn take advantage of this patched version of scikit-learn? They claim the speedups are considerable (though I'm not sure whether this part of the pipeline is a bottleneck for cryodrgn).

Cheers,
Oli

zoobab commented 1 year ago

Any idea how to disable those messages?

They pollute my logs.

Guillawme commented 1 year ago

> Can cryodrgn take advantage of this patched version of scikit-learn?

Training the model is by far the most computationally demanding step, and it doesn't use scikit-learn (it uses PyTorch). You would only get a speedup for some steps of `cryodrgn analyze`. But in that command, a large fraction of the run time is spent on the UMAP calculation (not part of scikit-learn), probably followed by generating maps, which runs `cryodrgn eval_vol` internally; that again uses PyTorch and is pretty fast. If you request a very large number of maps, this step may take longer than UMAP. So, as I understand it, the patched scikit-learn would only optimize steps that account for a tiny fraction of the total run time of a cryoDRGN job: k-means clustering and PCA during `cryodrgn analyze`, and GMM clustering in the Jupyter notebook.
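The argument above is essentially Amdahl's law: if only a small fraction of the total run time is accelerated, the whole job barely gets faster. A minimal sketch of that arithmetic follows. The 2% fraction and the local speedup factors are hypothetical placeholders, not measurements from cryoDRGN, and `overall_speedup` is an illustrative helper name.

```python
def overall_speedup(accelerated_fraction: float, local_speedup: float) -> float:
    """Amdahl's law: whole-job speedup when only part of the run time is accelerated."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    untouched = 1.0 - accelerated_fraction
    return 1.0 / (untouched + accelerated_fraction / local_speedup)


# Hypothetical assumption: the scikit-learn steps (k-means, PCA, GMM)
# account for 2% of the total run time. This is NOT a measured value.
sklearn_fraction = 0.02

for local in (2.0, 10.0, 100.0):
    total = overall_speedup(sklearn_fraction, local)
    print(f"{local:>5.0f}x on the scikit-learn steps -> {total:.3f}x overall")
```

Under that assumed 2% share, the overall speedup can never exceed 1 / (1 - 0.02), roughly 1.02x, however fast the accelerated steps become. Profiling a real run would be needed to replace the placeholder fraction with an actual number.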

If I had a lot of free time I would probably try using this patched scikit-learn in a fresh conda environment, just out of curiosity, but given the above considerations I would not expect much from it. But I don't have that free time, so going beyond the thought experiment is left as an exercise to the readers. 😇
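For anyone who does want to try the experiment, here is a minimal sketch of enabling the patch from Python instead of the `python -m sklearnex my_application.py` form shown in the conda message. It assumes scikit-learn-intelex is installed in the active environment and falls back to stock scikit-learn otherwise. `try_patch_sklearn` is a hypothetical helper name, and whether cryoDRGN's own code paths actually pick up the patched estimators is untested here.

```python
def try_patch_sklearn() -> bool:
    """Enable the scikit-learn-intelex patch if the package is installed.

    Returns True when the patch was applied, False when sklearnex is missing.
    """
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        # scikit-learn-intelex is not installed; stock scikit-learn stays in use.
        return False
    patch_sklearn()
    return True


patched = try_patch_sklearn()
print("scikit-learn-intelex patch active:", patched)

# Import scikit-learn estimators only AFTER patching, so that the
# accelerated implementations are the ones that get picked up, e.g.:
# from sklearn.cluster import KMeans
# from sklearn.decomposition import PCA
```

Timing the k-means and PCA steps with and without the patch in otherwise identical environments would show whether the difference is noticeable in practice.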

ml-struct-bio / cryodrgn

scikit-learn-intelex #242