lbehnke / h2database

Automatically exported from code.google.com/p/h2database
0 stars 0 forks source link

Errors with CLOBs with length > Integer.MAX_VALUE #100

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
Code bug in:
org/h2/src/main
org.h2.value.ValueLob.java : line 160-170
   Function: createClob(Reader,length,DataHandler)

Category: handling very large CLOBs

What steps will reproduce the problem?
1. Attempt to create & insert a CLOB from a Reader, with more than
Integer.MAX_VALUE (2^31-1) characters in the Reader.

2. Select this CLOB, and attempt to read its stream.

2. Optional: recreate the original stream and compare it character by
character against the CLOB contents.  

What is the expected output? 
A CLOB with the entire stream contents (more than 2 billion characters),
identical to the original input.

What do you see instead?
I'm not sure, but it isn't right.  Probably an OutOfMemoryError, on a
32-bit OS.  On a 64-bit OS, probably get just the first part
of the string, based on org.h2.util.IOUtils.readStringAndClose(Reader,length)
returning up to Integer.MAX_VALUE characters.  However, block length limits
might cause other problems. Could throw an ArrayIndexOutOfBoundsException
somewhere too.

Source of problem:
ValueLob.java Line 162: test will never be true, unless Buffersize ==
Integer.MAX_VALUE.  I believe that this will not occur in all cases,
depending on settings. In either case you're still reading a HUGE file to
memory.

Suggested Test (slow):
1) Create a Reader implementation linked to a java.util.Random with a
specified seed & length (long). 
2) Attempt to insert CLOBs to DB using this (both limited & unlimited
length insertion forms).
3) Attempt to SELECT the big CLOBs from DB, then get streams for both and
compare bytewise to the random input reader (recreated with same seed)

Other Concerns:
There may be additional bugs related to integer vs. long lengths for LOBs.
 org.h2.store.DataPage.java appears to be using ints to store LOB
sizes, for example.  This is probably okay if Integer.MAX_VALUE is handled 
fine, otherwise it is a problem.

Comments:
CLOBs over 4 GB probably aren't common. 

Original issue reported on code.google.com by buckyba...@gmail.com on 21 Jul 2009 at 7:33

GoogleCodeExporter commented 9 years ago
Unless someone runs into problems related to this, it probably isn't urgent.  I
haven't traced all execution paths, so another safety check may prevent this. 

Still, reading all of a stream to a String seems dubious, and may account for
OutOfMemoryError I received when working with lots (20-250K rows)of largish
(20-100KB) LOBs.

Original comment by buckyba...@gmail.com on 21 Jul 2009 at 7:42

GoogleCodeExporter commented 9 years ago
Keep in mind that byte[] are limited to Integer.MAX_VALUE in Java too.

Original comment by Suran...@gmail.com on 22 Jul 2009 at 8:56

GoogleCodeExporter commented 9 years ago
There are problems with CLOBs larger than Integer.MAX_VALUE, but
not where you described. I will fix them for the next release.
The problems are: the maximum precision of CLOB and BLOB is
Interger.MAX_VALUE. The functions LENGTH, OCTET_LENGTH, and 
BIT_LENGTH return int instead of long. 

> ValueLob.java Line 162
This is just the in-memory _buffer_. Large LOBs are not fully read
to the buffer.

> org.h2.store.DataPage.java appears to be using ints
No, it uses long: writeLong(lob.getPrecision());

My test case is:

import java.io.Reader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import org.h2.tools.DeleteDbFiles;

public class TestClobReader {
    public static void main(String[] args) throws Exception {
        Class.forName("org.h2.Driver");
        DeleteDbFiles.execute("data/test", "test", true);
        Connection conn = DriverManager.getConnection(
                "jdbc:h2:data/test/test", "sa", "sa");
        Statement stat = conn.createStatement();
        stat.execute("set COMPRESS_LOB LZF");
        stat.execute("create table test(d clob)");
        PreparedStatement prep = conn.prepareStatement(
                "insert into test values(?)");
        prep.setCharacterStream(1, new Reader() {
            long remaining = Integer.MAX_VALUE + 110L;
            int p;
            public void close() {
                // ignore
            }
            public int read(char[] cbuf, int off, int len) {
                len = (int) Math.min(remaining, len);
                remaining -= len;
                if ((p++ & 1023) == 0) {
                    System.out.println(remaining);
                }
                return len == 0 ? -1 : len;
            }
        }, -1);
        prep.executeUpdate();
        ResultSet rs = stat.executeQuery(
                "select length(d) from test");
        rs.next();
        System.out.println(rs.getLong(1));
        rs = stat.executeQuery("select d from test");
        rs.next();
        Reader reader = rs.getCharacterStream(1);
        char[] buff = new char[4 * 1024];
        long length = 0;
        while (true) {
            int len = reader.read(buff);
            if (len < 0) {
                break;
            }
            length += len;
        }
        System.out.println(length);
        conn.close();
    }
}

Original comment by thomas.t...@gmail.com on 30 Jul 2009 at 2:01

GoogleCodeExporter commented 9 years ago
Should be fixed in the version 1.1.116

Original comment by thomas.t...@gmail.com on 10 Aug 2009 at 3:35