mreutegg / laszip4j

The LASzip library ported to Java
GNU Lesser General Public License v2.1

Performance public LASReader(File file) vs public static Iterable<LASPoint> getPoints(InputStream is) #102

Closed. oers closed this issue 9 months ago

oers commented 9 months ago

Today I noticed a large performance difference between parsing via public LASReader(File file) and public static Iterable<LASPoint> getPoints(InputStream is).

My file is a 2.1 GB LAZ file in version 1.3 with 400 million points. file.length() is 2156189931.

With public static Iterable<LASPoint> getPoints(InputStream is), parsing takes about 6 minutes. This uses a simple BufferedInputStream created from my FileInputStream.

Parsing time: 353,620 ms

With public LASReader(File file), it takes much longer: 3 hours 21 minutes.

Parsing time: 12,046,417 ms

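Perhaps the difference lies in ByteStreamInFile, which falls back to RandomAccessFileDataInput when the file is longer than Integer.MAX_VALUE (2147483647) bytes. My file is slightly above that limit: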

private static RandomAccessDataInput createRandomAccessDataInput(RandomAccessFile file) {
    long length;
    try {
        length = file.length();
    } catch (IOException e) {
        throw new UncheckedIOException(e);
    }
    if (length > Integer.MAX_VALUE) {
        return new RandomAccessFileDataInput(file);
    } else {
        return new MMappedDataInput(file);
    }
}

mreutegg commented 9 months ago

This is somewhat expected, because files bigger than 2 GB cannot be memory mapped in a single ByteBuffer. The implementation uses RandomAccessFileDataInput when a file is bigger than 2 GB. This class does not implement buffered reads, hence the poor performance.
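
To illustrate why missing buffering hurts so much: every small read on a plain RandomAccessFile typically goes straight to the operating system, while a buffered reader fetches a larger block once and serves subsequent reads from memory. The following is a minimal sketch of that idea only. The class BufferedRandomAccessInput and its methods are hypothetical names chosen for this example; they are not part of laszip4j and not how the library solves the problem.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical helper (not a laszip4j class): buffered reads on top of a RandomAccessFile. */
final class BufferedRandomAccessInput implements AutoCloseable {
    private final RandomAccessFile file;
    private final byte[] buffer;
    private long bufferStart = 0; // file offset of buffer[0]
    private int bufferLength = 0; // number of valid bytes in the buffer
    private int position = 0;     // index of the next byte to return

    BufferedRandomAccessInput(RandomAccessFile file, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.file = file;
        this.buffer = new byte[bufferSize];
    }

    /** Returns the next byte as 0..255; touches the file only when the buffer is exhausted. */
    int readUnsignedByte() throws IOException {
        if (position >= bufferLength) {
            fill(bufferStart + bufferLength);
        }
        return buffer[position++] & 0xFF;
    }

    /** Moves to an absolute offset; stays inside the current buffer when possible. */
    void seek(long offset) {
        if (offset >= bufferStart && offset < bufferStart + bufferLength) {
            position = (int) (offset - bufferStart);
        } else {
            // Invalidate the buffer; the next read refills it starting at 'offset'.
            bufferStart = offset;
            bufferLength = 0;
            position = 0;
        }
    }

    /** Loads up to buffer.length bytes starting at the given file offset. */
    private void fill(long offset) throws IOException {
        file.seek(offset);
        int n = file.read(buffer, 0, buffer.length);
        if (n <= 0) {
            throw new EOFException("no data at offset " + offset);
        }
        bufferStart = offset;
        bufferLength = n;
        position = 0;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

With a buffer of, say, 64 KB, a sequential scan issues one file read per 64 KB instead of one per byte, which is the kind of difference that separates minutes from hours on a 2 GB file.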

mreutegg commented 9 months ago

PR https://github.com/mreutegg/laszip4j/pull/104 adds support for memory mapping files bigger than 2 GB.

Can you give it a try and let me know if it works for you?
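
For readers wondering how a file above 2 GB can be memory mapped at all: a single MappedByteBuffer is limited to Integer.MAX_VALUE bytes, but a file can be mapped as several smaller regions and an absolute offset translated into (region, offset within region). The sketch below shows that general technique using only java.nio. It is an assumption-laden illustration, not the code from the pull request; ChunkedMappedFile, open and get are hypothetical names.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper (not a laszip4j class): read-only view of a file mapped in fixed-size chunks. */
final class ChunkedMappedFile {
    private final MappedByteBuffer[] chunks;
    private final int chunkSize;
    private final long length;

    private ChunkedMappedFile(MappedByteBuffer[] chunks, int chunkSize, long length) {
        this.chunks = chunks;
        this.chunkSize = chunkSize;
        this.length = length;
    }

    /** Maps the whole file as consecutive regions of at most chunkSize bytes each. */
    static ChunkedMappedFile open(Path path, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long length = channel.size();
            int count = Math.toIntExact((length + chunkSize - 1) / chunkSize);
            MappedByteBuffer[] chunks = new MappedByteBuffer[count];
            for (int i = 0; i < count; i++) {
                long start = (long) i * chunkSize;
                long size = Math.min(chunkSize, length - start);
                chunks[i] = channel.map(FileChannel.MapMode.READ_ONLY, start, size);
            }
            // The mappings stay valid after the channel is closed.
            return new ChunkedMappedFile(chunks, chunkSize, length);
        }
    }

    long length() {
        return length;
    }

    /** Reads the byte at an absolute file offset, which may exceed Integer.MAX_VALUE. */
    byte get(long offset) {
        if (offset < 0 || offset >= length) {
            throw new IndexOutOfBoundsException("offset " + offset);
        }
        return chunks[(int) (offset / chunkSize)].get((int) (offset % chunkSize));
    }
}
```

A real reader also has to handle multi-byte values that straddle a chunk boundary, which this sketch leaves out.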

oers commented 9 months ago

Is a snapshot available in a Maven repository, or do I need to build it myself? I will gladly test it.

mreutegg commented 9 months ago

No, there is no snapshot build available. You will have to build it from the branch.

oers commented 9 months ago

The Maven build reports a test error (see the pull request; I commented there). Performance seems better (I am still running some tests).

oers commented 9 months ago

Performance seems comparable between BufferedInputStream and the new implementation. I did some tests, and in most cases both take around the same time. BufferedInputStream "seems" to be consistently faster, by about 10 to 25 seconds.

oers commented 9 months ago

[Image] laz() is via BufferedInputStream and lazWithConsstructor is via the constructor.

mreutegg commented 9 months ago

> BufferedInputStream "seems" to be consistently faster, by about 10 to 25 seconds.

I don't have a big LAS file handy, but I added a simple performance test to https://github.com/mreutegg/laszip4j/pull/104 in commit https://github.com/mreutegg/laszip4j/pull/104/commits/600195dde02fc4c2d0a80da6e48099e24df43fbf. On my machine (Linux with Java 11), I get the following numbers for a 366 MB LAS file.

Using InputStream:

DescriptiveStatistics:
n: 9
min: 833.0
max: 872.0
mean: 846.0
std dev: 12.072696467649637
median: 840.0
skewness: 1.418264262531904
kurtosis: 1.8458009190395268

Using memory mapped file (buffer size 100 MB):

DescriptiveStatistics:
n: 9
min: 606.0
max: 634.0
mean: 619.7777777777778
std dev: 8.467257197253693
median: 620.0
skewness: 0.022892190184662018
kurtosis: -0.19965745051562234

Memory mapping the LAS file is somewhat faster: the mean drops from 846 to about 620, roughly 27% less time.
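
For anyone who wants to repeat such a comparison on their own files, here is a minimal sketch of a timing harness that produces similar summary figures (mean, median, sample standard deviation) using only the JDK. The output format above presumably comes from a statistics library; this example does not use it, and TimingSummary with its methods is a hypothetical name introduced for illustration.

```java
import java.util.Arrays;

/** Hypothetical helper: times a task repeatedly and summarizes the measurements. */
final class TimingSummary {

    /** Runs the task 'runs' times and returns each elapsed time in milliseconds. */
    static double[] time(Runnable task, int runs) {
        double[] millis = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return millis;
    }

    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Sample standard deviation (divides by n - 1); returns 0 for fewer than two values. */
    static double sampleStdDev(double[] values) {
        if (values.length < 2) {
            return 0;
        }
        double m = mean(values);
        double squares = 0;
        for (double v : values) {
            squares += (v - m) * (v - m);
        }
        return Math.sqrt(squares / (values.length - 1));
    }

    /** Median of a sorted copy; averages the two middle values for even counts. */
    static double median(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

A typical use would be to call TimingSummary.time with a lambda that reads the whole file once per run, discard the first run or two as warm-up, and compare the medians of the two approaches.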

oers commented 9 months ago

The only difference seems to be that I used LAZ (compressed). But the change looks good :)

mreutegg commented 9 months ago

Merged https://github.com/mreutegg/laszip4j/pull/104

mreutegg commented 9 months ago

The improvement is available in laszip4j 0.17.