rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[BUG] Why does tSNE get stuck depending on the distribution of the data? #3865

Open Cliff-Lin opened 3 years ago

Cliff-Lin commented 3 years ago

Describe the bug I have some feature sets whose dimension is 128. With more than 1000 tSNE iterations, most sets finish in a few minutes. However, other sets are stuck for more than 12 hours. Since I don't know how long it will take, I terminate it before it finishes. Is there any option to obtain the number of iterations it has run? What condition causes it to get stuck? For my case (nearly 500K samples), the proper number of iterations is actually 10000 to get a better result.

Steps/Code to reproduce bug Just call TSNE(n_iter=10000).fit_transform(x)
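
For completeness, a self-contained version of that call (a sketch only; the random array stands in for the actual feature sets, which are roughly 500K samples with 128 dimensions):

```python
import numpy as np
from cuml.manifold import TSNE

# Placeholder data with the same shape as the real feature sets (~500K x 128).
x = np.random.rand(500_000, 128).astype(np.float32)

embedding = TSNE(n_iter=10000).fit_transform(x)
```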

Expected behavior All sets should finish within a roughly equal time budget.

Environment details (please complete the following information):

Cliff-Lin commented 3 years ago

I've tried the fft mode. I can see the running iteration, but it runs 1000x or 10000x slower than the default setting on all sets. That seems weird, right?

mdemoret-nv commented 3 years ago

Thanks for your bug report! I can answer a few of your questions but will need the help of a couple of my colleagues to diagnose the root cause of the problem.

First, can you give us more information to help us reproduce the issue on our end? You mentioned that some feature sets work and others get stuck. Do you have examples of both of these feature sets that we could try?

Regarding your other questions:

Is there any option to obtain the number of iterations it has run?

As far as I know, this is currently not possible. If you are terminating the TSNE.fit() call with Ctrl+C, this will kill the currently executing statement that is performing the iterations. New functionality would need to be added to store or write out intermediate iterations that could be viewed after a KeyboardInterrupt is raised.

In the meantime, I would suggest you try:

  1. Enable debug logging.
    1. With debug logging enabled, additional information about the current state of the algorithm is printed to the log every 100 iterations.
  2. Incrementally increase the number of iterations.
    1. By using a fixed seed and iteratively increasing the number of iterations (i.e. n_iter=100, 1000, 2000, ... 10000, etc.) you should be able to get the output at intermediate states. A rough sketch of both suggestions is shown below.
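
Something along these lines (a minimal sketch; the file path is hypothetical, and it assumes verbose accepts cuML's log-level constants from cuml.common.logger in this release):

```python
import numpy as np
from cuml.common import logger   # assumed location of the logger in this release
from cuml.manifold import TSNE

x = np.load("features.npy")      # hypothetical path to one of the feature sets

# Debug-level logging prints progress (including the current iteration)
# roughly every 100 iterations. A fixed random_state keeps the initialization
# identical across runs, so stepping n_iter up shows intermediate states.
for n_iter in (100, 1000, 2000, 5000, 10000):
    embedding = TSNE(n_iter=n_iter, random_state=42,
                     verbose=logger.level_debug).fit_transform(x)
```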

@cjnolet and @divyegala Do you have any other suggestions or an idea of what could be causing the TSNE algorithm to take longer for certain feature sets?

Cliff-Lin commented 3 years ago

The links below are the sample sets:

short.npy can be processed in a few minutes under the default settings of tSNE, while long.npy cannot. Both files have the same number of feature vectors and the same dimensionality. I hope these files are helpful for you. I will follow your suggestions to debug it.
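
A minimal sketch of the comparison, assuming both files have been downloaded to the working directory:

```python
import time
import numpy as np
from cuml.manifold import TSNE

# Assumes short.npy and long.npy are available in the working directory.
for name in ("short.npy", "long.npy"):
    x = np.load(name)
    start = time.time()
    TSNE().fit_transform(x)                        # default (Barnes-Hut) settings
    print(f"{name}: {time.time() - start:.1f} s")  # long.npy may not return
```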

lowener commented 2 years ago

I've tried the fft mode. I can see the running iteration, but it runs 1000x or 10000x slower than the default setting on all sets. That seems weird, right?

I could not reproduce the slowdown that you experienced with the FFT method. Running the short dataset takes me 55 seconds, the long dataset 65 seconds, and a synthetic dataset from make_blobs or make_classification of the same shape as your dataset takes me 60 seconds, all with the FFT method and default parameters. So the FFT method runs at a consistent speed on my side.
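
For reference, a rough sketch of the synthetic comparison (the 500,000 x 128 shape is an assumption based on the description of the original feature sets):

```python
import time
from cuml.datasets import make_blobs
from cuml.manifold import TSNE

# Synthetic data with (assumed) the same shape as the reported feature sets.
x, _ = make_blobs(n_samples=500_000, n_features=128, random_state=0)

start = time.time()
TSNE(method="fft").fit_transform(x)
print(f"FFT method, synthetic blobs: {time.time() - start:.1f} s")
```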

And to add some information on the Barnes-Hut method: it is hanging on my side too. After adding logging information I saw that the short dataset was blocked around iteration 678 and the long dataset around iteration 526.

I'm using the current dev version (21.12) with Ubuntu 20.04, CUDA 11.5 and driver 470 on a Quadro RTX 8000.

github-actions[bot] commented 2 years ago

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.