google / guava

Google core libraries for Java
Apache License 2.0
50.14k stars 10.89k forks source link

ImmutableBiMap.copyOf quadratic running time behavior #3015

Closed MrVPlusOne closed 6 years ago

MrVPlusOne commented 6 years ago

Hi,

I found an input pattern that can trigger Ω(N^2) running time behavior of ImmutableBiMap.copyOf, the input is constructed and tested using the following code:

import org.apache.commons.lang3.tuple.ImmutablePair;
import com.google.common.collect.ImmutableBiMap;

import java.util.ArrayList;
import java.util.Random;

public class GuavaImmutableBiMap {

    public static void main(String[] args) {
        ArrayList<ImmutablePair<Integer, Integer>> pattern = generateInput(600);
        ArrayList<ImmutablePair<Integer, Integer>> randomInput = randomInput(600);

        int testNum = 4000;
        long start;
        double timeUse;

        start = System.nanoTime();
        for(int i = 0; i < testNum; i++) {
            ImmutableBiMap.copyOf(randomInput);
        }
        timeUse = (System.nanoTime() - start)/1e9;
        System.out.println("Random input time use: " + timeUse + "s.");

        start = System.nanoTime();
        for(int i = 0; i < testNum; i++) {
            ImmutableBiMap.copyOf(pattern);
        }
        timeUse = (System.nanoTime() - start)/1e9;
        System.out.println("Pattern time use: " + timeUse + "s.");
    }

    public static ArrayList<ImmutablePair<Integer, Integer>> generateInput(int size){
        int s1 = 475;
        ImmutablePair<Integer, Integer> s2 = new ImmutablePair<>(271,212);
        ArrayList<ImmutablePair<Integer, Integer>> s4 = new ArrayList<>();

        s4.add(s2);

        for (int i = 0; i < size; i ++){
            s1 = ((99*302*486) | 475) + 142 + (s1-1);
            s2 = new ImmutablePair<>(s2.right, s1 << 353);
            s4.add(s2);
        }

        return s4;
    }

    public static ArrayList<ImmutablePair<Integer, Integer>> randomInput(int size){
        Random rand = new Random(1);

        ArrayList<ImmutablePair<Integer, Integer>> list = new ArrayList<>();
        for (int i = 0; i < size; i ++){
            list.add(new ImmutablePair<>(rand.nextInt(), rand.nextInt()));
        }

        return list;
    }
}

On my machine, when running with -Xint, I got the following result:

Random input time use: 4.183300757s.
Pattern time use: 243.795791197s.

It's about 60X slowdown.

When running without -Xint and increase testNum from 4000 to 40000 (to reduce noise), the result was:

Random input time use: 0.816171511s.
Pattern time use: 38.315509883s.

I also tried to fit the performance-inputSize relationship, the result is a quadratic curve:

fit-bimap

Note that this input pattern I found only have this quadratic behavior when size < 600, but this is due to a current technical limitation of our fuzzing tool used to find this input pattern, and we suspect there exist patterns that can scale to larger sizes.

lowasser commented 6 years ago

It looks like you're creating a set of inputs with deliberately adversarial hash codes. We knowingly didn't design the immutable collections to deal with these. Do you have a case where these data structures degenerate on not obviously deliberately adversarial input?

MrVPlusOne commented 6 years ago

No, this input pattern was automatically constructed using our research tool. I just think this might imply some potential security vulnerabilities or performance issues. Our tool found this input pattern in a black-box fashion and only used running time as feedback, so we are suspecting that, in other applications that use these immutable collections, an adversarial may use this kind of black-box fuzzing techniques to construct malicious attacks. There can be ways to defend against this kind of attacks. One solution would be to use a randomized smear function that changes its seed during program start up or periodically. But any other method that eliminates a fixed malicious input pattern should also work.

Maaartinus commented 6 years ago

There can be ways to defend against this kind of attacks.

There's no solution which is secure, fast and not very complicated. Randomized smearing is either vulnerable (e.g., sun.misc.Hashing.murmur3_32 in Java 7) or rather slow. While SipHash was designed to counter the HashDos attack, it's still considerably slower than Hashing.smear. Moreover, for Strings, it's trivial to generate full collisions, so you'd need more than just smearing the existing hash. Using TreeNodes like HashMap since Java 8 does is probably fine, but pretty complicated.

swankjesse commented 6 years ago

FYI: https://github.com/square/moshi/blob/dac5f695b3ddae9ee72b4172ee6b52805656c342/moshi/src/main/java/com/squareup/moshi/LinkedHashTreeMap.java

https://arstechnica.com/information-technology/2011/12/huge-portions-of-web-vulnerable-to-hashing-denial-of-service-attack/