gwlucastrig closed this issue 2 years ago.
The initial code for this issue has now been pushed to version 1.0.3-SNAPSHOT of the Gridfour code.
For an example of how to enable multi-threading, please see the GvrsReadPerformance.java demonstration code.
Again, we note that multi-threading is useful only when working with compressed data.
I have pushed out JUnit tests to verify the following:
Please see MultiThreadReadTest.java for more details.
The enhancements for multi-threaded reading are now complete. JUnit tests are implemented, and our wiki includes updated content describing this feature on the page "GVRS Using Multiple Threads to Speed Processing".
The multi-threaded read implementation is available as part of the 1.0.3-SNAPSHOT version of the software, now available on GitHub. The full 1.0.3 release is planned for late summer 2022.
This issue is now closed.
This issue proposes to use a multi-threaded approach to improve the speed of reading data. It applies to files that are stored with data compression.
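When GVRS reads data from a file that uses data compression, there are two cost factors: file access (reading the stored bytes from the file) and decompression (restoring the raw data values from those bytes).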
It turns out that decompression is a significant contributor to access times. For example, reading the entire set of raw data from the uncompressed version of the ETOPO1 global elevation and depth data set (233 million points) requires 0.277 seconds just for file access. The compressed version requires 3.34 seconds for combined file access and decompression.
The Gridfour team is currently investigating an approach to reading data from a file that uses multiple threads to perform the decompression operation.
Recall that a GVRS file is organized in tiles. If an application accesses tiles in a random order, there’s not much that additional threads can do to expedite data access. But if the application accesses tiles in a predictable order, the GVRS library can predict the next tile that the application will require and read and decompress it ahead of time using a supporting thread. In our initial experiments, access time for compressed ETOPO1 was reduced from 3.34 seconds to 1.88 seconds.
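The read-ahead idea can be sketched in plain Java with a single supporting thread. This is a minimal illustration under stated assumptions, not the GVRS implementation: the class `TilePrefetcher`, the method `getTile`, and the stand-in `decompressTile` (which just fills an array instead of reading and decompressing real data) are all hypothetical. The sketch assumes the "predictable order" is simply sequential tile indices: after serving tile N, it asks the supporting thread to prepare tile N+1, and if the application then requests some other tile, the guess is discarded.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical read-ahead sketch; not part of the GVRS API. */
class TilePrefetcher implements AutoCloseable {
    private final int nTiles;
    private final int tileSize;
    // One supporting thread performs the read-ahead decompression.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private Future<int[]> pending;   // tile being prepared in the background
    private int pendingIndex = -1;   // index of that tile, or -1 if none
    int prefetchHits;                // requests served by the read-ahead

    TilePrefetcher(int nTiles, int tileSize) {
        this.nTiles = nTiles;
        this.tileSize = tileSize;
    }

    /** Stand-in for reading compressed bytes and decompressing them (hypothetical). */
    static int[] decompressTile(int tileIndex, int tileSize) {
        int[] values = new int[tileSize];
        for (int i = 0; i < tileSize; i++) {
            values[i] = tileIndex * tileSize + i;
        }
        return values;
    }

    /** Returns a tile, then schedules decompression of the next tile in sequence. */
    int[] getTile(int tileIndex) {
        int[] result;
        if (pending != null && pendingIndex == tileIndex) {
            try {
                result = pending.get();   // usually already finished by the worker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e.getCause());
            }
            prefetchHits++;
        } else {
            if (pending != null) {
                pending.cancel(true);     // the prediction was wrong; discard it
            }
            result = decompressTile(tileIndex, tileSize);
        }
        pending = null;
        pendingIndex = -1;
        int next = tileIndex + 1;         // predict sequential access
        if (next < nTiles) {
            pendingIndex = next;
            pending = worker.submit(() -> decompressTile(next, tileSize));
        }
        return result;
    }

    @Override
    public void close() {
        worker.shutdownNow();
    }
}
```

With sequential access, every request after the first is served from the background thread, so decompression of tile N+1 overlaps with the application's processing of tile N. With random access the prediction rarely matches, which is why extra threads help little in that case.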
The GVRS API also includes an enhanced data compression technique known as LSOP that improves compression ratios but requires more processing time than the standard technique. In our experiments with the LSOP version of ETOPO1, reading time was reduced from 8.22 seconds to 4.36 seconds.
We also tested with the much larger GEBCO 2020 data set (3.7 billion points). Time to read the entire data set was reduced from 66.4 seconds to 37.2 seconds.
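Expressed as speedup factors (single-threaded time divided by multi-threaded time), the timings reported above work out as follows. The class and method names here are illustrative only; the inputs are the measured times quoted in this issue.

```java
/** Speedup = single-threaded read time / multi-threaded read time. */
class ReadSpeedup {
    static double speedup(double singleThreadSeconds, double multiThreadSeconds) {
        return singleThreadSeconds / multiThreadSeconds;
    }

    public static void main(String[] args) {
        // Timings (seconds) as reported for each data set
        System.out.printf("ETOPO1, standard compression: %.2fx%n", speedup(3.34, 1.88));
        System.out.printf("ETOPO1, LSOP compression:     %.2fx%n", speedup(8.22, 4.36));
        System.out.printf("GEBCO 2020:                   %.2fx%n", speedup(66.4, 37.2));
    }
}
```

This prints speedups of 1.78x, 1.89x, and 1.78x respectively, so in each test the read-ahead thread cut total read time by roughly 44 to 47 percent.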
Remaining tasks for this issue include the creation of JUnit tests, code inspections, and documentation.