I found where it came up before: https://github.com/KlugerLab/FIt-SNE/issues/67. However, it stayed unresolved there (I was not motivated enough to investigate and closed that issue without diagnosing the problem).
Thanks for the CC. My comments on that thread apply here--it's possible you are bouncing between different numbers of interpolation nodes. My first step in diagnosing would be to print the number of nodes, embedding size, and duration of time at each iteration.
When you run that many iterations with a large step size, the embedding becomes really large--and we never really tested it in that setting. It may be that the preset number of nodes is not appropriate for an embedding of that size.
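If it helps, a small callback can log the embedding span and the time per iteration without touching the Cython code. This is just a sketch -- I'm assuming openTSNE's callback signature (iteration, error, embedding) and the callbacks / callbacks_every_iters arguments; the actual node count would still need to be printed from the C/Cython side:

import time
import numpy as np
from openTSNE import TSNE

class SpanTimer:
    """Print the embedding span and the wall-clock time since the previous call."""
    def __init__(self):
        self.last = time.time()

    def __call__(self, iteration, error, embedding):
        now = time.time()
        span = np.ptp(embedding, axis=0)  # max - min along each embedding dimension
        print(f"iter {iteration}: span={span}, KL={error:.3f}, dt={now - self.last:.2f}s")
        self.last = now

# Evaluating the error at every iteration may add some overhead, but this is only for diagnosis.
tsne = TSNE(callbacks=SpanTimer(), callbacks_every_iters=1, n_jobs=-1)
# embedding = tsne.fit(X)  # X is your data matrix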
Hi George. The possible numbers of interpolation nodes are quite finely spaced though:
cdef list recommended_boxes = [
    25, 36, 50, 55, 60, 65, 70, 75, 80, 85, 90, 96, 100, 110, 120, 130, 140, 150, 175, 200
]
-- can it be that bouncing back and forth between two neighbouring values affects the runtime by 2x?
My first step in diagnosing would be to print the number of nodes, embedding size, and duration of time at each iteration.
Makes sense. I might give it a try!
-- can it be that bouncing back and forth between two neighbouring values affects the runtime by 2x?
You're right--seems strange--but it's the first thing I'd rule out.
The other strange aspect is that these times are averaged over 50 iterations. If it were just on the border, bouncing between sizes every few iterations, that effect should average out.
I am afraid this won't clarify much, but here is a plot of the embedding span (max - min) and the runtime at every iteration.
Waaait a second! The runtime starts behaving wildly after the embedding span crosses 200. And that's exactly the last value in the recommended_boxes list! Can this be the culprit???
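To spell out what I imagine happens (a guess -- I haven't checked the actual selection code): if the number of boxes is snapped up to the nearest entry of recommended_boxes, then once the desired value exceeds 200 there is nothing left to snap to, and the code presumably falls back to the raw value, which can be an arbitrary, FFTW-unfriendly size. A toy version:

from bisect import bisect_left

recommended_boxes = [25, 36, 50, 55, 60, 65, 70, 75, 80, 85, 90, 96,
                     100, 110, 120, 130, 140, 150, 175, 200]

def pick_n_boxes(desired):
    # Snap up to the nearest recommended size; past the end of the list,
    # fall back to the raw value (this fallback is my guess, not verified).
    i = bisect_left(recommended_boxes, desired)
    return recommended_boxes[i] if i < len(recommended_boxes) else desired

print(pick_n_boxes(180))  # 200 -- still snapped to a tested size
print(pick_n_boxes(213))  # 213 -- an arbitrary size that FFTW may handle slowly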
So I changed the list to
25, 36, 50, 60, 70, 75, 80, 90, 96, 100, 120, 140, 150, 175, 200,
250, 300, 350, 400, 450, 500, 1000, 5000, 10000
(I took out values that were multiples of 11 and 13 because I wasn't sure whether they slow FFTW down; the original list also had 85, which has a factor of 17...) Here is the result:
It does look a bit better (smoother?) than before, but the weird steps after the span crosses ~200 are still there.
What's weird is that those steps all seem to be 18 iterations long...
Apologies for spamming everybody, but it turns out I updated the recommended_boxes list in the function for the 1D FFT but not for the 2D FFT. Facepalm. Now I have updated it in both places and rerun. Voilà:
Incidentally, can it be that the interpolation params can be relaxed once the embedding becomes very large (e.g. span larger than [-100,100]) so that optimisation runs faster without -- perhaps! -- compromising the approximation too much?
Last thing -- and then I am off for today. I tried clipping the number of boxes to 100, i.e. if the recommended value was above 100, I still set it to 100. This made the KL divergence decrease much more slowly than without clipping. So I conclude that it's a bad idea.
I don't think these are good sizes for the boxes. From the FFTW docs:
FFTW is best at handling sizes of the form 2^a 3^b 5^c 7^d 11^e 13^f, where e+f is either 0 or 1, and the other exponents are arbitrary. Other sizes are computed by means of a slow, general-purpose algorithm (which nevertheless retains O(n log n) performance even for prime sizes). (It is possible to customize FFTW for different array sizes; see Installation and Customization.) Transforms whose sizes are powers of 2 are especially fast, and it is generally beneficial for the last dimension of an r2c/c2r transform to be even.
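Just to make that rule concrete, here is a tiny helper that checks whether a given size matches it (nothing openTSNE-specific, the function is made up for illustration):

def is_fftw_friendly(n):
    """True if n = 2^a 3^b 5^c 7^d 11^e 13^f with e + f <= 1 (per the FFTW docs)."""
    for p in (11, 13):
        if n % p == 0:  # allow at most a single factor of 11 or 13 in total
            n //= p
            break
    for p in (2, 3, 5, 7):
        while n % p == 0:
            n //= p
    return n == 1

print([x for x in (85, 200, 201, 211, 250) if is_fftw_friendly(x)])  # [200, 250]

Note that 85, which was in the original list, fails the test -- consistent with the factor-of-17 comment above.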
So, we may just want to generate a larger list of predefined numbers that fit this formula. E.g.
l = set()
for a in range(10):
    for b in range(10):
        for c in range(10):
            for d in range(10):
                # sizes of the form 2^a 3^b 5^c 7^d 11^e 13^f with e + f <= 1
                l.add(2**a * 3**b * 5**c * 7**d)
                l.add(2**a * 3**b * 5**c * 7**d * 13)
                l.add(2**a * 3**b * 5**c * 7**d * 11)
l = [x for x in l if 20 <= x <= 1000]  # keep only sizes in a reasonable range
print(sorted(l))
[20, 21, 22, 24, 25, 26, 27, 28, 30, 32, 33, 35, 36, 39, 40, 42, 44, 45, 48, 49, 50, 52, 54, 55, 56, 60, 63, 64, 65, 66, 70, 72, 75, 77, 78, 80, 81, 84, 88, 90, 91, 96, 98, 99, 100, 104, 105, 108, 110, 112, 117, 120, 125, 126, 128, 130, 132, 135, 140, 144, 147, 150, 154, 156, 160, 162, 165, 168, 175, 176, 180, 182, 189, 192, 195, 196, 198, 200, 208, 210, 216, 220, 224, 225, 231, 234, 240, 243, 245, 250, 252, 256, 260, 264, 270, 273, 275, 280, 288, 294, 297, 300, 308, 312, 315, 320, 324, 325, 330, 336, 343, 350, 351, 352, 360, 364, 375, 378, 384, 385, 390, 392, 396, 400, 405, 416, 420, 432, 440, 441, 448, 450, 455, 462, 468, 480, 486, 490, 495, 500, 504, 512, 520, 525, 528, 539, 540, 546, 550, 560, 567, 576, 585, 588, 594, 600, 616, 624, 625, 630, 637, 640, 648, 650, 660, 672, 675, 686, 693, 700, 702, 704, 720, 728, 729, 735, 750, 756, 768, 770, 780, 784, 792, 800, 810, 819, 825, 832, 840, 864, 875, 880, 882, 891, 896, 900, 910, 924, 936, 945, 960, 972, 975, 980, 990, 1000]
Could you try out this list of numbers instead? The docs indicate that they should work faster.
Sure, I can use this -- it's probably better. It's just that, as you can see from yesterday's experiments, values like 201, 211, 213, etc. run slower (sometimes much slower) than 250. So I thought that a fine grid is not really necessary here. But it certainly won't hurt either!
Also, let's maybe cap it at 1000, so if the number of boxes wants to be above 1000, I'll just set it to 1000...
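Something like this is what I have in mind for the selection logic (a rough sketch only, not the actual implementation; choose_n_boxes and allowed_sizes are made-up names, and the list is rebuilt with the same rule as your snippet above):

from bisect import bisect_left

# All FFTW-friendly sizes between 20 and 1000 (same rule as the generator above)
allowed_sizes = sorted(
    2**a * 3**b * 5**c * 7**d * m
    for a in range(11) for b in range(7) for c in range(5) for d in range(4)
    for m in (1, 11, 13)
    if 20 <= 2**a * 3**b * 5**c * 7**d * m <= 1000
)

def choose_n_boxes(desired, cap=1000):
    # Snap the desired box count up to the nearest allowed size, never above the cap.
    if desired >= cap:
        return cap
    return allowed_sizes[bisect_left(allowed_sizes, desired)]

print(choose_n_boxes(213))   # 216
print(choose_n_boxes(1500))  # 1000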
Here is how it looks:
Does not look like a staircase anymore.
Hmm, I think this looks better, no? There are still a few spikes, but those might be explained by other things your system is doing. Anyways, does this fix your problem?
I have played around a bit with changing the number of interpolation points, but I think the current setting is very reasonable. For example, if I fix the number of intervals to e.g. 50 and have the intervals expand with the embedding, the final embedding exhibits banding at the box boundaries. 100 intervals seemed to work fine, but when I let it run for a while so that the embedding had a larger span, I saw some banding there as well. Interestingly enough, the banding was more severe with the standard perplexity-based affinities than with uniform affinities (or it could be the other way around -- I forget).
My feeling is that we might be able to play with the number of intervals and fine-tune it, but the gains would be marginal. The current setting seems pretty good.
I agree.
Yes, the PR does fix the problem.
I have been doing some experiments on convergence and running t-SNE for many more iterations than I normally do. And I again noticed something that I used to see every now and then: the runtime jumps wildly between "epochs" of 50 iterations. This only happens when the embedding is very expanded and so FFT gets really slow. Look:
For the record, this is on full MNIST with uniform k=15 affinity, n_jobs=-1. Note that after it gets to 30 seconds per 50 iterations, it starts fluctuating between 30 and 60. This does not make sense. I suspect it may be related to how the interpolation params are chosen depending on the grid size. Can it be that those heuristics need improvement?
CCing @linqiaozhi.