Cliff-Lin opened this issue 3 years ago
I've tried FFT mode. I can see the running iterations, but it runs 1000x or 10000x slower than the default setting for all sets. That seems weird, right?
Thanks for your bug report! I can answer a few of your questions but will need the help of a couple of my colleagues to diagnose the root cause of the problem.
First, can you give us more information to help us reproduce the issue on our end? You mentioned that some feature sets work and others get stuck. Do you have examples of both of these feature sets that we could try?
Regarding your other questions:
> Is there any option to obtain the number of iterations it has run?
As far as I know, this is currently not possible. If you are terminating the TSNE.fit() call with Ctrl+C, this kills the currently executing statement that is performing the iterations. New functionality would need to be added to store or write out intermediate iterations that could be viewed after a KeyboardInterrupt is raised.
In the meantime, I would suggest running the algorithm several times with increasing iteration counts (n_iter=100, 1000, 2000, ..., 10000, etc.); that way you should be able to get the output at intermediate states. A rough sketch of this workaround is included at the end of this comment.

@cjnolet and @divyegala Do you have any other suggestions or an idea of what could be causing the TSNE algorithm to take longer for certain feature sets?
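To make the workaround concrete, here is a minimal sketch (hypothetical: it assumes `x` is the ~500K x 128 feature matrix from the report, and that each run fits within your time budget):

```python
# Workaround sketch: re-run TSNE with increasing iteration budgets so an
# embedding is available at each intermediate state, even if a longer
# run later has to be killed. `x` is assumed to be the feature matrix.
import time

from cuml.manifold import TSNE

for n_iter in [100, 1000, 2000, 5000, 10000]:
    t0 = time.time()
    embedding = TSNE(n_iter=n_iter).fit_transform(x)
    print(f"n_iter={n_iter}: finished in {time.time() - t0:.1f}s")
    # persist `embedding` here (e.g. numpy.save) before the next run
```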
> I've tried FFT mode. I can see the running iterations, but it runs 1000x or 10000x slower than the default setting for all sets. That seems weird, right?
I could not reproduce the slowdown that you experienced with the FFT method. Running the short dataset takes me 55 seconds, the long dataset 65 seconds, and running a synthetic dataset from make_blobs or make_classification of the same shape as your dataset takes me 60 seconds, all with the FFT method and default parameters. So the FFT method appears to behave as expected on my side.
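For reference, the synthetic run looked roughly like this (the make_blobs parameters are my stand-ins, not your real data):

```python
# Repro sketch: synthetic data of the reported shape (~500K x 128)
# pushed through cuML TSNE in FFT mode with default parameters.
from sklearn.datasets import make_blobs

from cuml.manifold import TSNE

X, _ = make_blobs(n_samples=500_000, n_features=128, centers=10,
                  random_state=42)
embedding = TSNE(method="fft").fit_transform(X)
```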
And to add some information on the Barnes-Hut method: it is hanging on my side too. After adding logging I saw that the short dataset was blocked around iteration 678 and the long dataset around iteration 526.
I'm using the current dev version (21.12) with Ubuntu 20.04, CUDA 11.5 and driver 470 on a Quadro RTX 8000.
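If it helps others reproduce this, per-iteration progress can be surfaced through the estimator's verbosity (the exact level value below is an assumption; cuML estimators accept an integer log level):

```python
# Sketch: raise the log level to see which iteration Barnes-Hut reaches
# before it hangs. verbose=5 is assumed to map to debug-level output.
from cuml.manifold import TSNE

tsne = TSNE(method="barnes_hut", n_iter=1000, verbose=5)
embedding = tsne.fit_transform(X)  # X: the dataset that hangs
```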
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.
Describe the bug
I have some feature sets whose dimension is 128. If the tSNE iteration count is more than 1000, it can finish in a few minutes for most sets. However, it gets stuck for more than 12 hours on other sets. Since I don't know how long it will take, I terminate it before it finishes. Is there any option to obtain the number of iterations it has run? What condition is causing it to get stuck? Actually, the proper iteration count is 10000 in my case (nearly 500K samples) to obtain a better result.
Steps/Code to reproduce bug
Just call TSNE(n_iter=10000).fit_transform(x).
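A self-contained version of that call (the file path and dtype are placeholders):

```python
# Minimal repro, assuming the features are a ~500K x 128 float32 array.
import numpy as np

from cuml.manifold import TSNE

x = np.load("features.npy").astype(np.float32)  # placeholder path
embedding = TSNE(n_iter=10000).fit_transform(x)
```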
Expected behavior
All sets should finish within a nearly equal time budget.
Environment details (please complete the following information):