Closed TimJLU closed 3 years ago
This PR is a result of our discussion in the forum: https://forum.qiime2.org/t/t-sne-or-umap-as-q2-plugins/14477
I wonder why Travis isn't trigged by this PR - are we missing something here? Or do we have to be more patient?
I wonder why Travis isn't trigged by this PR - are we missing something here?
I have no clue why it didn't trigger... Maybe GitHub had a hiccup? EIther way, I don't expect the tests to tell us to much, since the function added here hasn't been wired into the plugin, and there are no unit tests yet.
This is pretty exciting, let us know what you need from us!
The recent check has failed, but I don´t get the Error Message. Is it because of the tsne Function and the way I used **kwargs?
We should think about which parameters of the algorithms should be exposed to the user via the plugin. I'd think exposing all parameters is overkill and will confuse the default user. Unfortunately, I am not very familiar with both techniques and cannot answer this question due to personal experience :-/
Reading about t-SNE, I think our plugin should expose perplexity
and n_components
. If the Qiime2 framework allows access to the --verbose
parameter, we might also want to pass that to the verbose
flag of the sklearn algorithm. And since we will always pass a precomputed distance matrix, n_jobs
should be set to 1. Not sure if we want to expose it as --p-n-jobs
to indicate that multiple CPUs/cores won't speed up the process?
UMAP documentation has the great chapter https://umap-learn.readthedocs.io/en/latest/parameters.html# which tells me we should expose n_neighbors
, min_dist
and of course n_components
many thanks @thermokarst. @TimJLU is on a writing sprint to finish his masters thesis. It contains some benchmark and examples for applying PCoA, tSNE, umap to different datasets. Hope we can distil his results into a nice little summary to show that each method suits different purposes and therefore it is great to have them all in qiime2
I finally were able to implement all your requested changes @thermokarst.
Right now the number_of_dimensions parameters are required (but since they specify a default value of 2, you don't necessary have to provide that as input). Looking at the pcoa method in this plugin, it looks like there we have made the number_of_dimensions parameter completely optional. I am curious why you aren't doing that here - care to explain?
During my thesis I got the best visual results for t-SNE and UMAP by reducing the dimensions to 2. The idea was to use recommended default settings so that inexperienced Users could get the best results without changing any of the variables.
Thank you very much @ebolyen and @thermokarst. I hope I changed everything accordingly. The only question still up is how to deal with the numba package within in the umap-learn package? Did I miss something?
Hey, i saw that the numba package got an update on August 20. Maybe this would solve the problem on it's own?
@TimJLU, thanks so much for this pull request and apologies for how long it took us to get this in!
I just pulled this branch down and tested both new actions with the Moving Pictures tutorial data and a new tutorial data set that we're working with (Liao et al, 2021). The umap-learn team's conda install command worked for me, so it looks like that issue is sorted out, and the plots I generated all look reasonable. We're going to run the tests, and as long as everything seems to be working we'll get this merged for the 2021.11 release (scheduled for next week).
I added t-SNE Function to the _ordination.py File.
It is just a first draft to start a Discussion and still in progress.
Next step would be: Better way to pass Arguments. Create a default setting if you don't pass any Arguments.