Some thoughts about string-related GC pressure:
From my point of view, recursive `getSQL()` methods can take a `StringBuilder` as a parameter.
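A minimal sketch of that pattern, with illustrative names rather than H2's actual class hierarchy: every node appends into one shared `StringBuilder`, so the SQL text of a whole expression tree is assembled in a single buffer.

```java
// Sketch of the StringBuilder-parameter pattern. Names are illustrative.
interface Expression {
    StringBuilder getSQL(StringBuilder builder);
}

final class Column implements Expression {
    private final String name;

    Column(String name) {
        this.name = name;
    }

    @Override
    public StringBuilder getSQL(StringBuilder builder) {
        return builder.append(name);
    }
}

final class BinaryOperation implements Expression {
    private final Expression left, right;
    private final String operator;

    BinaryOperation(Expression left, String operator, Expression right) {
        this.left = left;
        this.operator = operator;
        this.right = right;
    }

    @Override
    public StringBuilder getSQL(StringBuilder builder) {
        // Recursive calls append into the same buffer; no intermediate strings.
        builder.append('(');
        left.getSQL(builder);
        builder.append(' ').append(operator).append(' ');
        right.getSQL(builder);
        return builder.append(')');
    }
}
```

For example, `new BinaryOperation(new Column("A"), "+", new Column("B")).getSQL(new StringBuilder()).toString()` yields `(A + B)` with one buffer instead of one string per node.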
When I tried to optimize caching of (sub)strings, GC pressure was lower, but, unfortunately, performance was also slightly lower, at least on Java 9, so I decided not to touch this place.
When I tried to replace the `int[]` array for character types with a `byte[]` array in `Parser`, performance was also lower than with the current implementation. Smaller objects are not always faster.
`setSQL()` can be optimized if necessary. Does it appear in CPU profiles?
The situation with upper case conversion is complicated due to various weird configuration options. With default settings, unquoted identifiers are converted to upper case immediately. With some flags they aren't converted, but are compared with case-insensitive methods. Actually, `ParserUtil.getTokenType()` can be rewritten to work with strings in any case to avoid useless case conversion for keywords, but I have never seen high CPU usage from the `Parser`, maybe because I use long-living `PreparedStatement` instances.
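As a rough sketch of that idea (illustrative code, not the actual `ParserUtil` implementation), `String.regionMatches` can compare a token against a keyword case-insensitively without allocating an upper-cased copy:

```java
// Sketch: match a token range against a keyword without allocating an
// upper-cased copy of the identifier. Not H2's actual ParserUtil code.
static boolean isKeyword(String sql, int start, int end, String keyword) {
    int length = end - start;
    return length == keyword.length()
            && sql.regionMatches(true /* ignoreCase */, start, keyword, 0, length);
}
```

For instance, `isKeyword("select * from t", 0, 6, "SELECT")` returns `true` without creating any intermediate string.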
Note that I'm fine with small localised changes to improve perf.
However, I do not want to make the `Parser` harder to maintain just to satisfy some synthetic test. In my experience the `Parser` is almost never the bottleneck.
When it is (like the O(n^2) behaviour we have with really large queries), the problem is actually that we re-parse the same stuff over and over again because of the way `TableView` works.
> @katzyn From my point of view, recursive `getSQL()` methods can take a `StringBuilder` as a parameter.

That would certainly help.
> @katzyn When I tried to replace the `int[]` array for character types with a `byte[]` array in `Parser`, performance was also lower than with the current implementation. Smaller objects are not always faster.

How did you measure this? In any case, the JVM doesn't really know a `byte` type. So even if the array is smaller, the actual type on the stack is the same: `int`.
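A small illustration of that point; the comments name the standard JVM bytecode instructions involved:

```java
// Reading an element from either array leaves a plain int on the JVM
// operand stack; there is no separate "byte" computational type.
static int fromIntArray(int[] types, int i) {
    return types[i]; // iaload
}

static int fromByteArray(byte[] types, int i) {
    return types[i]; // baload, which sign-extends the byte to an int
}
```

The `byte[]` variant saves heap space, but every read still produces an `int`, and stores need a narrowing conversion, so it isn't automatically faster.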
> @katzyn `setSQL()` can be optimized if necessary. Does it appear in CPU profiles?

No, it's not a CPU problem but an allocation problem, although definitely not the biggest one in this benchmark.
> @grandinj However, I do not want to make the `Parser` harder to maintain just to satisfy some synthetic test.

The idea of the benchmark as it was written is that when an application has tons of dynamic SQL, caching the parser output will not work and a "hard parse" (Oracle speak) is inevitable. Of course, hard parses should be avoided in general, but given that a substantial part of the work the parser does seems non-essential, I still think it is worth looking at this in detail, maybe not as a priority.

... having said so, I definitely think #1527 is much lower-hanging fruit than this issue here.
> How did you measure this?

JMH benchmarks with a lot of different SQL commands from different sources. I also tried to run them without JMH; the results were unstable, but showed a similar difference.
Java has packed `byte[]` arrays and usually works fast enough with them, but not here. The difference was fairly small, however.

I should note that this is not an array of characters, but an array of their types. It's the largest object that the parser allocates when it is initialized with an SQL string.
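For context, a rough sketch of what such a per-character type table might look like; the constants and classification rules here are made up for illustration and are not H2's actual `Parser` internals:

```java
// One int per character of the SQL text, classifying the character kind.
// Constants and rules are illustrative only.
static final int CHAR_OTHER = 0, CHAR_NAME = 1, CHAR_DIGIT = 2, CHAR_SPACE = 3;

static int[] classifyCharacters(String sql) {
    int[] types = new int[sql.length()]; // the large per-parse allocation
    for (int i = 0; i < sql.length(); i++) {
        char c = sql.charAt(i);
        if (Character.isLetter(c) || c == '_') {
            types[i] = CHAR_NAME;
        } else if (Character.isDigit(c)) {
            types[i] = CHAR_DIGIT;
        } else if (Character.isWhitespace(c)) {
            types[i] = CHAR_SPACE;
        } else {
            types[i] = CHAR_OTHER;
        }
    }
    return types;
}
```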
> I should note that this is not an array of characters, but an array of their types. It's the largest object that the parser allocates when it is initialized with an SQL string.

I've seen it: the `Parser.characterTypes` array, right? But I don't think it really matters much here. That array is a per-parse allocation, not a per-query-element allocation like the string copying that I've mentioned in this issue.
In short:
- Your array allocation is O(N) in the benchmark, which is reasonable; there's some inevitable overhead to a single parse operation.
- But string copying is O(N * M), which is why it bubbled up in the profiler results, and the query is still relatively simple.
String literals and identifiers need their own strings anyway. I removed string allocation for operators and similar tokens (`+`, `(`, `,`, etc.) some time ago. We still have unnecessary string allocation for keywords, but the impact is most likely not very high. There are also a lot of grammar elements that are not actually keywords in H2 and have special meaning only in some positions; they are parsed as identifiers, but may be handled in a special way.
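A sketch of how single-character tokens can be classified without any allocation; the token constants are illustrative, not H2's actual values:

```java
// Single-character tokens resolved by a switch on the character itself;
// no substring is ever created. Constants are illustrative.
static final int TOKEN_OTHER = 0, TOKEN_PLUS = 1, TOKEN_OPEN_PAREN = 2, TOKEN_COMMA = 3;

static int readSingleCharToken(String sql, int index) {
    switch (sql.charAt(index)) {
        case '+': return TOKEN_PLUS;
        case '(': return TOKEN_OPEN_PAREN;
        case ',': return TOKEN_COMMA;
        default:  return TOKEN_OTHER;
    }
}
```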
@lukaseder Is it still an issue for you with the current master branch?
Some suggestions mentioned here actually reduce the overall performance of H2 and were not implemented. SQL statements are still generated from the internal structures. But there were some optimizations in SQL construction and in other places; they speed up construction of `PreparedStatement` and some internal objects.
Thanks for the various fixes, @katzyn. The `int[]` allocation that was caused by the unnecessary usage of regular expressions went down significantly; this is definitely a very good thing.

Other than that, profiling doesn't show any visually significant difference (not measuring wall clock time). I could run a JMH benchmark later this week, though?
> I could run a JMH benchmark later this week, though?

Egh. I'm afraid I cannot seem to allocate time for this right now, sorry.
I think @katzyn has largely addressed this
A follow-up on #1527. This one is far less tricky to address. I think the parser has a major flaw, namely the fact that it operates on a `String`, copying around substrings all the time, instead of working with a `char[]` or a similar, more primitive data structure. For the sake of argument, I shall continue using `char[]` as a suitable replacement for `String`, which it was before JDK 9, when Latin-1 support was added to the `String` type.

When looking at a JMC / JFR profiling output (see #1527 to reproduce), we can see that a lot of allocation happens in string-related stuff:
Let's look at individual stack traces in detail:
### `Arrays.copyOfRange(byte[], int, int)`

This particular allocation happens primarily for 2 reasons:
### Let's look at StringBuilders

It seems that a copy of the input SQL string is being generated, which is called "plan SQL". I'm sure this is quite useful for debugging purposes, or when the "plan SQL" is actually required. But is it required in the simple case that I've exposed in #1527? I'm sure we could usually work with the original SQL...
Pretty much all of the `byte[]` allocation that happens in string builders can be traced down to this "plan SQL" feature.

### Let's look at Substring

The other half of the `byte[]` allocation happens in two places that call substring. These are:
### `Parser.read()`
This method has my suggested `char[]` available, but for some reason doesn't always use it. For instance, the `currentToken` is a substring of the `sqlCommand`, rather than simply two int variables indicating the start and the end of the `currentToken` inside the `char[]`. I know there's a call to upper case inside of `Parser.getTokenType()`, but that isn't strictly necessary if you write just a slightly more sophisticated equals method to compare tokens case-insensitively.
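A minimal sketch of that suggestion, assuming keywords are supplied in upper case and using made-up names rather than H2's actual fields: the current token is tracked as two offsets into the original SQL, and the "more sophisticated equals" folds case on the fly instead of allocating an upper-cased substring.

```java
// Sketch: track the current token as offsets into the original SQL instead
// of materializing a substring. Names are illustrative, not H2's fields.
final class TokenWindow {
    final String sqlCommand;  // the original SQL text
    int tokenStart, tokenEnd; // current token = sqlCommand[tokenStart, tokenEnd)

    TokenWindow(String sqlCommand) {
        this.sqlCommand = sqlCommand;
    }

    /** Case-insensitive comparison against an upper-case keyword, no allocation. */
    boolean currentTokenEquals(String keyword) {
        int length = tokenEnd - tokenStart;
        if (length != keyword.length()) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            char c = sqlCommand.charAt(tokenStart + i);
            if (c != keyword.charAt(i) && Character.toUpperCase(c) != keyword.charAt(i)) {
                return false;
            }
        }
        return true;
    }
}
```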
### `StringUtils.cache()`

I'm not quite sure what that's good for. It seems to implement some clever kind of hand-written interning to prevent duplicate instances of the same string, but this would probably not be needed anyway if we had been working on the `char[]` in the first place, rather than copying around substrings. On JDK 11, this cache is the top CPU consumer according to JMC.
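For readers unfamiliar with the pattern, a minimal sketch of this kind of hand-rolled interning; it is illustrative only and differs from H2's actual `StringUtils` implementation:

```java
// A fixed-size, hash-indexed table that returns an existing equal instance
// when one is present, replacing the slot on collision. Illustrative only.
final class StringCache {
    private final String[] table = new String[1024]; // power of two

    String cache(String s) {
        int index = s.hashCode() & (table.length - 1);
        String cached = table[index];
        if (s.equals(cached)) {
            return cached; // reuse the existing instance
        }
        table[index] = s;
        return s;
    }
}
```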
### `StringUtils.trimSubstring()`

There's an obvious waste of memory right after calling this method: we're extracting a substring from our `originalSQL` (a new `byte[]` allocation), only to create another string builder to prepend some `start` token to it. I suspect this could be improved relatively easily. But again, operating on a globally available `char[]` would probably be much better here.
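To make the suggested improvement concrete, a hedged sketch with placeholder parameters rather than the actual call site: `StringBuilder` can append a character range of a `CharSequence` directly, which removes the intermediate substring copy.

```java
// Wasteful: materializes a substring, then copies it again into a builder.
static String withSubstring(String start, String sql, int begin, int end) {
    return new StringBuilder(start).append(sql.substring(begin, end)).toString();
}

// Better: append the character range directly; one copy instead of two.
static String withRangeAppend(String start, String sql, int begin, int end) {
    return new StringBuilder(start.length() + (end - begin))
            .append(start)
            .append(sql, begin, end) // CharSequence range append, no temp String
            .toString();
}
```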