maxmind / MaxMind-DB-Reader-java

Java reader for the MaxMind DB format
https://maxmind.github.io/MaxMind-DB-Reader-java/
Apache License 2.0

ArrayIndexOutOfBoundsException in tryCity method from maxmind #68

Closed edelgadoh closed 4 years ago

edelgadoh commented 4 years ago

Dear maxmind developers,

We are using the latest version of the lib:

    <dependency>
        <groupId>com.maxmind.geoip2</groupId>
        <artifactId>geoip2</artifactId>
        <version>2.13.0</version>
    </dependency>

From time to time we observe errors from the MaxMind lib; you can see the stack trace below:

"stack_trace":"java.lang.ArrayIndexOutOfBoundsException: Index 34 out of bounds for length 16\n\tat com.maxmind.db.Decoder$Type.get(Decoder.java:53)\n\tat com.maxmind.db.Decoder.decode(Decoder.java:129)\n\tat com.maxmind.db.Decoder.decode(Decoder.java:88)\n\tat com.maxmind.db.Reader.resolveDataPointer(Reader.java:256)\n\tat com.maxmind.db.Reader.getRecord(Reader.java:176)\n\tat com.maxmind.geoip2.DatabaseReader.get(DatabaseReader.java:271)\n\tat com.maxmind.geoip2.DatabaseReader.tryCity(DatabaseReader.java:334)

Analysing the code, the error originates in Decoder.class, specifically in this part:

    static enum Type {
        EXTENDED, POINTER, UTF8_STRING, DOUBLE, BYTES, UINT16, UINT32, MAP,
        INT32, UINT64, UINT128, ARRAY, CONTAINER, END_MARKER, BOOLEAN, FLOAT;

        static final Decoder.Type[] values = values();

        private Type() {
        }

        static Decoder.Type get(int i) {
            return values[i];
        }

Sometimes the i parameter is 36, but the enum has only 16 values.

Thanks

oschwald commented 4 years ago

It sounds like your database is corrupt.

edelgadoh commented 4 years ago

But it works successfully for about 12K requests per minute in production and generates only about 50 errors per minute. Why do you say it sounds corrupted?

oschwald commented 4 years ago

The fact that it is intermittent supports the idea that it is corrupted. Although the error handling could be improved in this case to throw a nicer exception, what is happening is that it is hitting an unknown and invalid data type in the file while decoding a record. Are you storing the database in a JAR file? If so, please ensure you have binary filtering disabled.
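
For example, if your build does filter resources, something along these lines in the pom keeps Maven from running text filtering over the binary database (a sketch; the mmdb extension entry is the relevant part, and the surrounding plugin block assumes a standard build section):

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-resources-plugin</artifactId>
        <configuration>
            <!-- Treat .mmdb files as binary: never run text filtering on them -->
            <nonFilteredFileExtensions>
                <nonFilteredFileExtension>mmdb</nonFilteredFileExtension>
            </nonFilteredFileExtensions>
        </configuration>
    </plugin>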

edelgadoh commented 4 years ago

Got it. Right now we are storing the MaxMind database in a shared volume in Kubernetes, not inside a JAR file. I agree that if the MaxMind lib threw a checked exception, it would be easier for us to catch it correctly. For now, until a possible improvement, we are going to catch the runtime exception in our code. Thank you very much!
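
Roughly what we are doing now (a sketch; the safeCity helper name and the empty-Optional fallback are ours, not part of the library):

    import java.io.IOException;
    import java.net.InetAddress;
    import java.util.Optional;

    import com.maxmind.geoip2.DatabaseReader;
    import com.maxmind.geoip2.exception.GeoIp2Exception;
    import com.maxmind.geoip2.model.CityResponse;

    // Wrap the lookup so the unchecked decoder error cannot escape.
    static Optional<CityResponse> safeCity(DatabaseReader reader, InetAddress ip) {
        try {
            return reader.tryCity(ip);
        } catch (ArrayIndexOutOfBoundsException e) {
            // The decoder failure from this issue is unchecked, so it has to be
            // caught explicitly until the library reports it in a nicer way.
            return Optional.empty();
        } catch (IOException | GeoIp2Exception e) {
            // Checked lookup failures; treated the same way for now.
            return Optional.empty();
        }
    }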

edelgadoh commented 4 years ago

@oschwald I have a question: will a task be created to improve this exception handling?

oschwald commented 4 years ago

Yes, we plan on improving the exception to throw an InvalidDatabaseException under this condition. However, you should not be seeing this issue with a valid MMDB file. Under what conditions are you seeing this error?
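
(A sketch of what that guard could look like; the exact message and placement are placeholders, not the actual fix:)

    static Decoder.Type get(int i) throws InvalidDatabaseException {
        if (i < 0 || i >= values.length) {
            // Report a corrupt or unknown record type instead of letting the
            // enum lookup throw ArrayIndexOutOfBoundsException.
            throw new InvalidDatabaseException("Unknown record type: " + i);
        }
        return values[i];
    }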

edelgadoh commented 4 years ago

Hi Gregory, great to know that!

As I commented before, we are using a shared volume in Kubernetes, and we read the MaxMind files from a directory on that volume (screenshot omitted). We have two JAR files that are completely independent and only read from that path to create a bean. Another point is that these files are updated from time to time by another team; we suspect MaxMind's cache, but that is not conclusive yet.

As I commented before, it works fine for about 98% of requests; only about 2% of requests generate a runtime exception, which for now we try/catch.

We are going to inspect it in more detail, Thank you!

oschwald commented 4 years ago

If the files are not replaced atomically on update, you would very likely see errors such as this as well as various InvalidDatabaseException errors.

edelgadoh commented 4 years ago

@oschwald yeah, the files are replaced atomically.

We reproduced the error in a local environment. How to reproduce:

1) Make two sets of MaxMind files: the first dataset has the old files, the second dataset has the new files.
2) Configure JMeter to call the MaxMind API with many IPs during 3 minutes.
3) While the process in 2) is running, swap between the first and second datasets.
4) In our tests, we saw errors in the logs about the MaxMind database being corrupted.

In our system we were using the MaxMind file mode MEMORY_MAPPED (https://github.com/maxmind/MaxMind-DB-Reader-java#usage). After changing it to the MEMORY mode, we could no longer reproduce this error in our local environment.
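
(For context, a minimal sketch of that setting, assuming a city database at a hypothetical path; build() is called from code that may throw IOException:)

    import java.io.File;

    import com.maxmind.db.Reader.FileMode;
    import com.maxmind.geoip2.DatabaseReader;

    // FileMode.MEMORY copies the whole database into the heap when the reader
    // is built, so replacing the file later cannot change bytes already read.
    // With MEMORY_MAPPED, the reader can observe a file that is mid-copy.
    DatabaseReader reader = new DatabaseReader.Builder(
            new File("/project/maxmind-files/GeoIP2-City.mmdb")) // hypothetical path
            .fileMode(FileMode.MEMORY)
            .build();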

oschwald commented 4 years ago

Can you share the exact mechanism used for (3) and the OS and file system?

edelgadoh commented 4 years ago

The exact mechanism used in (3) is:

We used JMeter to make GET requests to one fixed endpoint that uses MaxMind (a Spring Boot service on Amazon Corretto Java 11). We configured JMeter to create 100 threads, wait an interval of 5 seconds, and repeat 3 times. The whole process takes about 15 seconds.

During these 15 seconds of execution, we alternated between the old and new MaxMind databases (from the terminal we ran cp -r /tmp/oldMaxmind/ /project/maxmind-files/ and then cp -r /tmp/newMaxmind/ /project/maxmind-files/, at least 6 times).

My notebook runs Ubuntu 19: Linux NI-45848-1 5.3.0-050300-generic #201909152230 SMP Sun Sep 15 22:32:54 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Thank you!

oschwald commented 4 years ago

cp is not atomic, which explains the issue. If you want to atomically replace the file, you will need to copy it to the same filesystem and then mv the new file over the old one.
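
Something like this, reusing the paths from your comment (the GeoIP2-City.mmdb file name is just an example):

    # Stage the new file on the destination filesystem under a temporary name...
    cp /tmp/newMaxmind/GeoIP2-City.mmdb /project/maxmind-files/GeoIP2-City.mmdb.tmp
    # ...then rename it into place; a rename within one filesystem is atomic.
    mv /project/maxmind-files/GeoIP2-City.mmdb.tmp /project/maxmind-files/GeoIP2-City.mmdb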