oers closed this issue 9 months ago
This is somewhat expected, because files bigger than 2 GB cannot be memory mapped in a single ByteBuffer. The implementation uses RandomAccessFileDataInput
when a file is bigger than 2 GB. This class does not implement buffered reads, hence the poor performance.
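To illustrate why unbuffered reads hurt, here is a minimal, self-contained sketch that reads the same file one byte at a time through a plain RandomAccessFile and through a BufferedInputStream. The class and method names are made up for this example and are not part of laszip4j; the 1 MB test size is arbitrary.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

// Hypothetical demo class, not part of laszip4j.
public class BufferedVsUnbuffered {

    // RandomAccessFile has no internal buffer: every read() goes to the OS.
    static long sumUnbuffered(File f) throws IOException {
        long sum = 0;
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            int b;
            while ((b = raf.read()) != -1) {
                sum += b;
            }
        }
        return sum;
    }

    // BufferedInputStream fetches a block at a time and serves read() from memory.
    static long sumBuffered(File f) throws IOException {
        long sum = 0;
        try (InputStream in = new BufferedInputStream(new FileInputStream(f))) {
            int b;
            while ((b = in.read()) != -1) {
                sum += b;
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("unbuffered-demo", ".bin");
        try {
            byte[] data = new byte[1024 * 1024]; // arbitrary 1 MB of test data
            new Random(42).nextBytes(data);
            Files.write(p, data);

            long t0 = System.nanoTime();
            long a = sumUnbuffered(p.toFile());
            long t1 = System.nanoTime();
            long b = sumBuffered(p.toFile());
            long t2 = System.nanoTime();

            System.out.printf("unbuffered: %d ms, buffered: %d ms, same result: %b%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a == b);
        } finally {
            Files.deleteIfExists(p);
        }
    }
}
```

Both methods return the same checksum; only the number of calls into the operating system differs.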
PR https://github.com/mreutegg/laszip4j/pull/104 adds support for memory mapping files bigger than 2 GB.
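One common way to memory map a file that is too large for a single buffer is to map it in fixed-size chunks and translate each long position into a chunk index plus an offset. The sketch below shows that general idea only; it is not the code from the PR, and the class name, method names, and chunk size parameter are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of chunked memory mapping, not the laszip4j implementation.
public class ChunkedMappedFile {
    private final MappedByteBuffer[] chunks;
    private final long chunkSize;
    private final long size;

    private ChunkedMappedFile(MappedByteBuffer[] chunks, long chunkSize, long size) {
        this.chunks = chunks;
        this.chunkSize = chunkSize;
        this.size = size;
    }

    // Maps the whole file read-only as a sequence of buffers of at most chunkSize bytes.
    static ChunkedMappedFile open(Path path, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = ch.size();
            int n = (int) ((size + chunkSize - 1) / chunkSize); // ceiling division
            MappedByteBuffer[] chunks = new MappedByteBuffer[n];
            for (int i = 0; i < n; i++) {
                long offset = (long) i * chunkSize;
                long len = Math.min(chunkSize, size - offset);
                chunks[i] = ch.map(FileChannel.MapMode.READ_ONLY, offset, len);
            }
            // A mapping stays valid after the channel is closed.
            return new ChunkedMappedFile(chunks, chunkSize, size);
        }
    }

    // Absolute read: pick the chunk, then the offset inside it.
    byte get(long pos) {
        if (pos < 0 || pos >= size) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        return chunks[(int) (pos / chunkSize)].get((int) (pos % chunkSize));
    }

    long size() {
        return size;
    }
}
```

A reader built on top of this would also need multi-byte reads that can straddle a chunk boundary, which this sketch leaves out.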
Can you give it a try and let me know if it works for you?
Is the snapshot available in a Maven repo, or do I need to build it myself? I will gladly test this.
No, there is no snapshot build available. You will have to build it from the branch.
The Maven build reports a test error (see my comment on the pull request). Performance seems better (I am still running some tests).
Performance seems comparable between the BufferedInputStream approach and the new one. I did some tests, and in most cases both take around the same time. The BufferedInputStream "seems" to be consistently faster by about 10 to 25 seconds.
laz() goes through the BufferedInputStream and lazWithConsstructor goes through the constructor.
I don't have a big LAS file handy, but I added a simple performance test to https://github.com/mreutegg/laszip4j/pull/104 with https://github.com/mreutegg/laszip4j/pull/104/commits/600195dde02fc4c2d0a80da6e48099e24df43fbf. On my machine (Linux with Java 11), I get the following numbers for a 366 MB LAS file.
Using InputStream:
DescriptiveStatistics:
n: 9
min: 833.0
max: 872.0
mean: 846.0
std dev: 12.072696467649637
median: 840.0
skewness: 1.418264262531904
kurtosis: 1.8458009190395268
Using memory mapped file (buffer size 100 MB):
DescriptiveStatistics:
n: 9
min: 606.0
max: 634.0
mean: 619.7777777777778
std dev: 8.467257197253693
median: 620.0
skewness: 0.022892190184662018
kurtosis: -0.19965745051562234
Memory mapping the LAS file is slightly faster: the mean drops from 846.0 to about 619.8 over 9 runs each.
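For anyone who wants to repeat this kind of comparison on their own file, a rough timing harness can be sketched with the standard library alone. This is not the test from the PR; the class and method names are invented here, and the harness reports milliseconds as an assumption, since the output above does not state its unit.

```java
import java.util.Arrays;

// Hypothetical timing helper, not the performance test from the PR.
public class TimingSketch {

    // Runs the task the given number of times and returns each duration in milliseconds.
    static long[] timeMillis(Runnable task, int runs) {
        long[] durations = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            durations[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return durations;
    }

    static double mean(long[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    // Median of a non-empty array; the input is not modified.
    static double median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with the read path you want to measure.
        long[] d = timeMillis(() -> {
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) {
                sum += i;
            }
            if (sum == 42) {
                System.out.println(sum); // keeps the loop from being optimized away
            }
        }, 9);
        System.out.println("mean=" + mean(d) + " median=" + median(d));
    }
}
```

Running each variant several times and comparing medians, as in the numbers above, helps smooth out JIT warm-up and file cache effects.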
The only difference seems to be that I used LAZ (compressed). But the change looks good :)
The improvement is available in laszip4j 0.17.
Today I noticed a large performance difference between parsing via public LASReader(File file) and public static Iterable getPoints(InputStream is).
My file is a 2.1 GB LAZ file in version 1.3 with 400 million points. file.length() is 2156189931.
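That length sits just above the largest region a single FileChannel.map call accepts, which is Integer.MAX_VALUE bytes. A small check makes the arithmetic explicit; the class and method names are made up for this example.

```java
// Hypothetical helper to check whether a file fits into one mapped buffer.
public class MapLimitCheck {

    // FileChannel.map rejects sizes above Integer.MAX_VALUE (2147483647 bytes).
    static boolean fitsSingleBuffer(long fileLength) {
        return fileLength <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long fileLength = 2_156_189_931L; // value reported above
        System.out.println(fitsSingleBuffer(fileLength));   // prints false
        System.out.println(fileLength - Integer.MAX_VALUE); // prints 8706284
    }
}
```

So the file exceeds the single-buffer limit by roughly 8.7 MB.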
With public static Iterable getPoints(InputStream is), parsing takes about 6 minutes. This creates a simple BufferedInputStream from my FileInputStream.
With public LASReader(File), it takes much longer: 3 hours 21 minutes.
Perhaps the difference lies in ByteStreamInFile: