ical4j / ical4j

A Java library for parsing and building iCalendar data models
https://www.ical4j.org
BSD 3-Clause "New" or "Revised" License
752 stars 200 forks source link

resource leaking 2.0.2 #226

Open valenpo opened 6 years ago

valenpo commented 6 years ago

Hello, we are expecting some problems with parsing. Some threads are hard working about 6 days. And I have JFR records, that show net.fortuna.ical4j.data.CalendarParserImpl.parse(Reader, ContentHandler) is consuming resources.

Stack Trace Sample Count Percentage(%) net.fortuna.ical4j.data.CalendarBuilder.build(InputStream) 1,255 100 net.fortuna.ical4j.data.CalendarBuilder.build(Reader) 1,255 100 net.fortuna.ical4j.data.CalendarBuilder.build(UnfoldingReader) 1,255 100 net.fortuna.ical4j.data.CalendarParserImpl.parse(Reader, ContentHandler) 1,255 100 net.fortuna.ical4j.data.CalendarParserImpl.parseCalendarList(StreamTokenizer, Reader, ContentHandler) 1,255 100 net.fortuna.ical4j.data.CalendarParserImpl.parseCalendar(StreamTokenizer, Reader, ContentHandler) 1,255 100 net.fortuna.ical4j.data.CalendarParserImpl$PropertyListParser.parse(StreamTokenizer, Reader, ContentHandler) 1,255 100 net.fortuna.ical4j.data.CalendarParserImpl$ComponentParser.access$900(CalendarParserImpl$ComponentParser, StreamTokenizer, Reader, ContentHandler) 1,255 100 net.fortuna.ical4j.data.CalendarParserImpl$ComponentParser.parse(StreamTokenizer, Reader, ContentHandler) 1,255 100 net.fortuna.ical4j.data.CalendarParserImpl$PropertyListParser.parse(StreamTokenizer, Reader, ContentHandler) 1,255 100 net.fortuna.ical4j.data.CalendarParserImpl.access$1300(CalendarParserImpl, StreamTokenizer, Reader) 1,048 83.506 net.fortuna.ical4j.data.CalendarParserImpl.absorbWhitespace(StreamTokenizer, Reader) 1,048 83.506 net.fortuna.ical4j.data.CalendarParserImpl.nextToken(StreamTokenizer, Reader, boolean) 1,048 83.506 java.io.StreamTokenizer.nextToken() 1,022 81.434 java.io.StreamTokenizer.read() 1,004 80 net.fortuna.ical4j.data.UnfoldingReader.read() 999 79.602 java.io.PushbackReader.read() 954 76.016 java.io.FilterReader.read() 875 69.721 java.io.InputStreamReader.read() 863 68.765 sun.nio.cs.StreamDecoder.read() 863 68.765 sun.nio.cs.StreamDecoder.read0() 863 68.765 sun.nio.cs.StreamDecoder.read(char[], int, int) 635 50.598 sun.nio.cs.StreamDecoder.implRead(char[], int, int) 536 42.709 sun.nio.cs.StreamDecoder.readBytes() 232 18.486 com.sun.mail.util.BASE64DecoderStream.read(byte[], int, int) 129 10.279 com.sun.mail.util.BASE64DecoderStream.decode(byte[], int, int) 80 6.375 com.sun.mail.util.BASE64DecoderStream.getByte() 32 2.55 com.sun.mail.util.BASE64DecoderStream.read() 16 1.275 java.nio.charset.CharsetDecoder.decode(ByteBuffer, CharBuffer, boolean) 146 11.633 sun.nio.cs.UTF_8$Decoder.decodeLoop(ByteBuffer, CharBuffer) 119 9.482 sun.nio.cs.UTF_8$Decoder.decodeArrayLoop(ByteBuffer, CharBuffer) 100 7.968 java.nio.charset.CoderResult.malformedForLength(int) 7 0.558 java.nio.charset.CoderResult$Cache.access$200(CoderResult$Cache, int) 7 0.558 sun.nio.cs.StreamDecoder.inReady() 23 1.833 java.nio.charset.CharsetDecoder.reset() 17 1.355 sun.nio.cs.StreamDecoder.ensureOpen() 6 0.478 net.fortuna.ical4j.data.UnfoldingReader.unfold() 43 3.426 net.fortuna.ical4j.util.CompatibilityHints.isHintEnabled(String) 107 8.526 java.util.concurrent.ConcurrentHashMap.get(Object) 107 8.526

screen shot 2018-03-22 at 10 02 50
benfortuna commented 6 years ago

Hi, are you concerned with memory usage or speed of parsing?

The CalendarParserImpl implementation is using the out-dated StreamTokenizer class which I think is not the fastest approach, so perhaps will look for alternatives in future.

valenpo commented 6 years ago

Hi, no it is not about a memory. It looks like UnfoldingReader#unfold has infinitive loop and consuming CPU. For mine build I added a threshold with number

private void unfold() throws IOException { // need to loop since one line fold might be directly followed by another boolean didUnfold; do { didUnfold = false;

        for (int i = 0; i < buffers.length; i++) {
            int read = 0;
            while (read < buffers[i].length) {
                final int partialRead = super.read(buffers[i], read, buffers[i].length - read);
                if (partialRead < 0) {
                    break;
                }
                read += partialRead;
            }
            if (read > 0) {
                if (!Arrays.equals(patterns[i], buffers[i])) {
                    unread(buffers[i], 0, read);
                } else {
                    if (log.isTraceEnabled()) {
                        log.trace("Unfolding...");
                    }
                    linesUnfolded++;
                    didUnfold =  linesUnfolded >= unfoldedTreshhold ? false : true;
                }
            }

// else { // return read; // } } } while (didUnfold); }

On 23 Mar 2018, at 15:48, Ben Fortuna notifications@github.com wrote:

Hi, are you concerned with memory usage or speed of parsing?

The CalendarParserImpl implementation is using the out-dated StreamTokenizer class which I think is not the fastest approach, so perhaps will look for alternatives in future.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ical4j/ical4j/issues/226#issuecomment-375654530, or mute the thread https://github.com/notifications/unsubscribe-auth/ABDFwyMjkL7-_8tIu1_75jLRlh5EGfb7ks5thO8rgaJpZM4S2jHH.

Regards, Valentin Popov

buckett commented 6 years ago

@valenpo Did the threshold fix the issue for you? Do you have a testcase that reproduces the issue?

valenpo commented 6 years ago

@buckett Yup. We are using this in production and never get same infinitive loop again. Sorry, I have no test case. I tried generate test case, but run out of time and make easiest fix, that solve problem.

benfortuna commented 6 years ago

Looking at UnfoldingReader I can see it's possible to get stuck in loop if no chars are read in unfold() method. I'll try to create a test to reproduce issue.

buckett commented 6 years ago

@benfortuna Yeah, if final int partialRead = super.read(buffers[i], read, buffers[i].length - read); reads zero bytes then it gets stuck in a loop, but I don't see how @valenpo fix would help in that situation as it would never make it out of the inner while loop to his escape hatch.

valenpo commented 6 years ago

@buckett it change didUnfold to false after some while iteration and quit the loop. Plus it is production proved, as we have same unprocessed items on queue.

benfortuna commented 6 years ago

I've just added a test of unfolding binary attachments. With small attachments it seems to be ok, but maybe can be used to test some of your own attachments:

https://github.com/ical4j/ical4j/blob/develop/src/test/groovy/net/fortuna/ical4j/data/UnfoldingReaderSpec.groovy