konsoletyper / teavm

Compiles Java bytecode to JavaScript, WebAssembly and C
https://teavm.org
Apache License 2.0
2.55k stars 261 forks source link

Speed up equals for Double and Float in JS #826

Closed Ihromant closed 8 months ago

Ihromant commented 8 months ago

Explanation:

public static void main(String[] args) {
        Double[] array = Stream.generate(() -> Double.longBitsToDouble(ThreadLocalRandom.current().nextLong()))
                .limit(10_000_000).toArray(Double[]::new);
        int counter = 0;
        for (int i = 0; i < array.length - 1; i++) {
            counter += array[i].equals(array[i + 1]) ? 0 : 1;
        }
        long time = System.currentTimeMillis();
        for (int i = 0; i < array.length - 1; i++) {
            counter += array[i].equals(array[i + 1]) ? 0 : 1;
        }
        System.out.println("Time " + (System.currentTimeMillis() - time) + " counter " + counter);
    }

this simple benchmark shows: Chrome: Old: Time 406 counter 15001524 New: Time 94 counter 14991638 Firefox: Old: Time 2390 counter 14991538 New: Time 142 counter 14990878 So, in JS we got 4 to 15 speed improvement by this simple change.

konsoletyper commented 8 months ago
  1. What about Float? Did you do benchmark
  2. I think another approach can be taken for Double: use Float64Array to store this and that values in a buffer, then extract them using Int32Array and compare 32-bit parts. I'm not sure, but this may increase performance
Ihromant commented 8 months ago
  1. I didn't do benchmark for Float, but I assume that speedup will be lower, but still 2-7.
  2. I don't think this will increase performance. Current toLongBits puts it in ArrayBufferView and then extract it. Proposed approach is similar to current implementation. Other issue here: NaN case, so code won't be simple.
konsoletyper commented 8 months ago

but I assume that speedup will be lower, but still 2-7.

Please, don't assume!

I don't think this will increase performance. Current toLongBits puts it in ArrayBufferView and then extract it.

Don't forget also that it deals with BigInt, which is slow by itself.

Other issue here: NaN case, so code won't be simple.

What do you mean? if (isNaN(value) && isNaN(that.value)) return true?

Ihromant commented 8 months ago

Updated benchmark for floats. Chrome: Old: Doubles 444 counter 24997087 Floats 90 counter 34996938 New: Doubles 93 counter 24995714 Floats 88 counter 34995576 Firefox: Old: Doubles 2458 counter 24995261 Floats 314 counter 34995094 New: Doubles 144 counter 25000839 Floats 141 counter 35000696 Assumption was wrong. Still, in Chrome floats have approx. same speed while in Firefox change gives 2-times speedup, so improvement has benefit.

konsoletyper commented 8 months ago

I suggest that you increase number of iterations by factor of at least 50, so the results will be relevant. Also I suggest to measure several times.

konsoletyper commented 8 months ago

Another suggestion: in your array you have only few equal pairs. What if you increase this number? In this case comparison won't always follow fast path.

Ihromant commented 8 months ago

Can't increase number of test cases. It starts crashing tabs even on 100_000_000. Still, added equal pairs.

public class Client {
    private static final int TEST_COUNT = 10_000_000;
    public static void main(String[] args) {
        Double[] array = Stream.generate(() -> Double.longBitsToDouble(ThreadLocalRandom.current().nextLong()))
                .limit(TEST_COUNT).toArray(Double[]::new);
        Float[] fArray = Stream.generate(() -> Float.intBitsToFloat(ThreadLocalRandom.current().nextInt()))
                .limit(TEST_COUNT).toArray(Float[]::new);
        IntStream.generate(() -> ThreadLocalRandom.current().nextInt(array.length)).limit(array.length / 4).forEach(i -> {
            array[i] = 5.0;
            fArray[i] = 3.0f;
        });
        int counter = 0;
        for (int i = 0; i < array.length - 1; i++) {
            counter += array[i].equals(array[i + 1]) ? 0 : 1;
            counter += fArray[i].equals(fArray[i + 1]) ? 0 : 1;
        }
        long time = System.currentTimeMillis();
        for (int i = 0; i < array.length - 1; i++) {
            counter += array[i].equals(array[i + 1]) ? 0 : 1;
        }
        System.out.println("Doubles " + (System.currentTimeMillis() - time) + " counter " + counter);
        time = System.currentTimeMillis();
        for (int i = 0; i < fArray.length - 1; i++) {
            counter += fArray[i].equals(fArray[i + 1]) ? 0 : 1;
        }
        System.out.println("Floats " + (System.currentTimeMillis() - time) + " counter " + counter);
    }
}

Results: Chrome: Old: Doubles 497 counter 25493442 Floats 154 counter 35003106 Doubles 518 counter 25494665 Floats 154 counter 35005594 Doubles 505 counter 25496297 Floats 153 counter 35007874 New: Doubles 167 counter 25496649 Floats 154 counter 35007636 Doubles 167 counter 25499776 Floats 154 counter 35010534 Doubles 170 counter 25501053 Floats 156 counter 35012212 Firefox: Old: Doubles 4798 counter 25497246 Floats 618 counter 35007078 Doubles 4829 counter 25500967 Floats 578 counter 35011228 Doubles 4775 counter 25501696 Floats 582 counter 35013302 New: Doubles 393 counter 25499172 Floats 372 counter 35010210 Doubles 408 counter 25498895 Floats 404 counter 35010110 Doubles 392 counter 25499503 Floats 380 counter 35010690

Same conclusion, same speed for floats in Chrome, but improvement for other cases.

konsoletyper commented 8 months ago

You don't need to waste memory. You can wrap working loops and repeat then, say, 10 times

Ihromant commented 8 months ago

wrapped to 10x shifted loop according to your suggestions. Chrome: Old: Doubles 8699 counter 89436753 Floats 1333 counter 175029954 New: Doubles 1648 counter 89444462 Floats 1380 counter 175039089 Firefox: Old: Doubles 44808 counter 89415527 Floats 5434 counter 175012510 New: Doubles 3833 counter 89448561 Floats 3650 counter 175043595 Same conclusion.

konsoletyper commented 8 months ago

I made my own optimizations and it looks like Double.equals works even faster than your implementation. Float.equals still works slower in Firefox, but not that much slower. You can try to take my commit here and do your own tests. Surprisingly, WebAssembly shows a bit worse results for Double.equals. Perhaps, it's due to slower instanceof/cast.

Ihromant commented 8 months ago

Nicely done. Closing this as not needed.