gwlucastrig / gridfour

Tools for raster data including geophysical applications and digital elevation models
MIT License

Port the GVRS API to Rust #28

Open gwlucastrig opened 2 years ago

gwlucastrig commented 2 years ago

I am looking for a developer who would be interested in porting the GVRS API to the Rust programming language.

The GVRS API offers four capabilities that may be useful for the Rust community:

  1. It provides a “virtual raster” that would assist Rust programs processing very large gridded data products, especially those that might be too large to be conveniently kept in memory.
  2. GVRS provides a testbed for developers who are experimenting with data compression techniques for raster data products (that was, in fact, the original reason I wrote it).
  3. GVRS provides a persistent data store for raster products. It is particularly well suited to geophysical data.
  4. A GVRS port would provide Rust programs with access to the global elevation and bathymetry data sets that already exist for the Java API.

Now, before I go on, I have to qualify claims 3 and 4 by pointing out that there are many raster data products out there and virtually all of them have wider user bases than GVRS (NetCDF and HDF5, for example). In terms of sheer availability of data, those other products have distinct advantages over GVRS. So I don’t want to oversell my project.

On the other hand, I wrote GVRS with the idea that the code would be ported to other languages, and I tried to organize it in such a way that a port could be executed quickly and well. If you want to port GVRS to Rust, my attitude is that it would be your project and I would try not to interfere in the design or direction of the porting effort. Of course, I am highly motivated to have someone succeed in porting GVRS to Rust. So I would be available to answer questions, explain concepts, and help smooth over any code incompatibilities that might arise between the Rust and Java implementations.

To learn more about GVRS, visit our Project Wiki or read our Frequently Asked Questions page.

I recently posted a preliminary draft of the GVRS file format at https://gwlucastrig.github.io/GridfourDocs/notes/GvrsFileFormat_1_04.pdf

gwlucastrig commented 1 year ago

I have added a set of sample files in the Gridfour source distribution. These files are designed to exercise different features of the GVRS file format. They should provide good test cases for developers who are implementing a GVRS API in any language.

You may find them in the folder gridfour-master/core/src/test/resources/org/gridfour/gvrs/SampleFiles.

A README.txt file is included to provide descriptions of the various test files.


File                            Grid    Tiles   Description
--------------------------     -------  -----   --------------------------------------------------------
Sample00_ShortNoComp.gvrs       10x10    5x5    Short, no nulls, not compressed
Sample01_IntNoComp.gvrs         10x10    5x5    Integer, no nulls, not compressed
Sample02_FltNoComp.gvrs         10x10    5x5    Float, no nulls, not compressed
Sample03_ICFNoComp.gvrs         10x10    5x5    Integer-Coded Float, scale=1.0, no nulls, not compressed

Sample04_ShortComp.gvrs        100x100  50x50   Short, no nulls, compressed
Sample05_IntComp.gvrs          100x100  50x50   Integer, no nulls, compressed
Sample06_FltComp.gvrs          100x100  50x50   Float, no nulls, compressed
Sample07_ICFComp.gvrs          100x100  50x50   Integer-Coded Float, scale=1.0, no nulls, compressed

Sample08_MixedTypes.gvrs        10x10    5x5    Multi-element short and float

Sample09_ShortNoComp.gvrs       10x10    6x6    Short, has nulls, not compressed
Sample10_IntNoComp.gvrs         10x10    6x6    Integer, has nulls, not compressed
Sample11_FltNoComp.gvrs         10x10    6x6    Float, has nulls, not compressed
Sample12_ICFNoComp.gvrs         10x10    6x6    Integer-Coded Float, scale=1.0, has nulls, not compressed

Sample13_ModelCoord.gvrs        11x11   11x11   Float with model coordinates
Sample14_LSOP.gvrs             101x101 101x101  ICF with LSOP compression
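
For anyone starting a port, the Grid and Tiles columns above suggest the basic "virtual raster" bookkeeping: given a grid cell, work out which tile holds it and where it sits inside that tile. The following is a minimal Rust sketch of that arithmetic, not code from the Gridfour API. The names `TileLayout`, `TileAddress`, and `locate` are hypothetical, and the row-major ordering of tiles and of cells within a tile is an assumption that should be checked against the file format document.

```rust
/// Hypothetical description of how a grid is divided into tiles.
/// Not part of the Gridfour API; the names are for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TileLayout {
    pub grid_rows: usize,
    pub grid_cols: usize,
    pub tile_rows: usize,
    pub tile_cols: usize,
}

/// Position of one grid cell expressed as (tile, offset inside tile).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TileAddress {
    pub tile_index: usize,
    pub index_in_tile: usize,
}

impl TileLayout {
    /// Number of tile columns, rounding up so that partial tiles at
    /// the right edge are counted (10 columns with 6-wide tiles gives 2).
    pub fn tiles_across(&self) -> usize {
        (self.grid_cols + self.tile_cols - 1) / self.tile_cols
    }

    /// Number of tile rows, rounding up in the same way.
    pub fn tiles_down(&self) -> usize {
        (self.grid_rows + self.tile_rows - 1) / self.tile_rows
    }

    /// Maps a grid cell to its tile and its offset within the tile.
    /// ASSUMPTION: tiles and in-tile cells are both in row-major order.
    /// Returns None when the cell lies outside the grid.
    pub fn locate(&self, row: usize, col: usize) -> Option<TileAddress> {
        if row >= self.grid_rows || col >= self.grid_cols {
            return None;
        }
        let tile_row = row / self.tile_rows;
        let tile_col = col / self.tile_cols;
        let row_in_tile = row % self.tile_rows;
        let col_in_tile = col % self.tile_cols;
        Some(TileAddress {
            tile_index: tile_row * self.tiles_across() + tile_col,
            index_in_tile: row_in_tile * self.tile_cols + col_in_tile,
        })
    }
}
```

With the 10x10 grid and 6x6 tiles used by Sample09 through Sample12, this layout has 2x2 tiles, and the tiles on the right and bottom edges extend past the grid. How those unused cells are stored is not covered by this sketch.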

anweiss commented 1 year ago

Hey @gwlucastrig 👋 ... figured I'd give this a shot ... https://github.com/anweiss/gridfour-gvrs-rs ... very much a WIP. Will use the sample files you provided for the integration tests once implemented.

gwlucastrig commented 1 year ago

Thanks for letting me know about your project. I took a look at your early code... Rust is much different than anything I'm used to seeing. It will be very interesting to see what you come up with.

I noticed a few places where you were performing masking following the examples in the Java code. Java doesn't have an unsigned byte type (my least favorite aspect of the language). So whenever I wanted to treat 8-bit values as positive integers, I had to mask them with 0xff. I'm not sure whether you'll have to do that in Rust, but I thought I'd let you know.
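
On the masking question, a small Rust sketch may help. Rust has a native unsigned 8-bit type, `u8`, so widening a byte does not sign-extend it and the `0xff` mask should only be needed when the data arrives as a signed `i8`. The function names below are hypothetical and exist only to contrast the two cases.

```rust
/// Mimics the Java idiom: a signed byte is sign-extended when widened,
/// so the mask is needed to recover the value in the range 0..=255.
pub fn widen_java_style(b: i8) -> i32 {
    (b as i32) & 0xff
}

/// With an unsigned byte there is no sign extension, so no mask is needed.
pub fn widen_rust_style(b: u8) -> i32 {
    i32::from(b)
}

/// Combining two unsigned bytes into a 16-bit value also needs no masks.
/// Which byte comes first in a real file is a format question that this
/// illustration does not answer.
pub fn combine(high: u8, low: u8) -> u32 {
    (u32::from(high) << 8) | u32::from(low)
}
```

If the bytes are read into a `Vec<u8>` or `[u8; N]` buffer, the second form applies throughout and the masks copied from the Java code become redundant, though harmless.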

My bit-reading routines use a buffer consisting of long integers. I experimented a bit with other buffer types (such as a simple byte buffer), but the long integer was most efficient. On the other hand, that too might be a quirk of Java.
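
For comparison, here is a rough Rust sketch of a bit reader backed by 64-bit words, the same general idea as the long-integer buffer described above. It is not a transcription of the Java routines. The `BitReader` type is hypothetical, and the least-significant-bit-first ordering is an assumption chosen for simplicity, so a real port would need to match whatever ordering the GVRS codecs actually use.

```rust
/// Hypothetical bit reader over a slice of 64-bit words.
/// ASSUMPTION: bits are consumed least-significant-bit first within
/// each word; this may not match the ordering used by GVRS.
pub struct BitReader<'a> {
    words: &'a [u64],
    bit_pos: usize, // absolute position of the next unread bit
}

impl<'a> BitReader<'a> {
    pub fn new(words: &'a [u64]) -> Self {
        Self { words, bit_pos: 0 }
    }

    /// Reads `n_bits` (1..=32) and returns them as the low bits of a u32.
    /// Returns None if `n_bits` is out of range or the input is exhausted.
    pub fn read_bits(&mut self, n_bits: u32) -> Option<u32> {
        if n_bits == 0 || n_bits > 32 {
            return None;
        }
        let n = n_bits as usize;
        if self.bit_pos + n > self.words.len() * 64 {
            return None;
        }
        let word_index = self.bit_pos / 64;
        let offset = (self.bit_pos % 64) as u32;
        let available = 64 - offset; // bits left in the current word

        // Take what the current word offers, starting at `offset`.
        let mut value = self.words[word_index] >> offset;
        if n_bits > available {
            // The request straddles a word boundary: pull the remaining
            // bits from the next word and place them above the first part.
            value |= self.words[word_index + 1] << available;
        }
        self.bit_pos += n;

        let mask = (1u64 << n_bits) - 1;
        Some((value & mask) as u32)
    }
}
```

Whether a `u64` buffer beats a plain byte buffer in Rust is something to measure rather than assume; the sketch only shows that the word-based approach carries over without much ceremony.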

Also, I think it's a good idea to start with the small sample files. But later on, when you want something more ambitious, I've got a sample of the elevation/bathymetry data available for download at https://github.com/gwlucastrig/gridfour/releases/download/v1.0.4/ETOP_v1.0.4.gvrs

My gut feeling has always been that the most critical component in the Java API is the tile-cache.
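
To make the tile-cache idea concrete, here is a minimal Rust sketch of a fixed-capacity cache that evicts the least recently used tile. This is an illustration under stated assumptions, not a description of how the Java tile cache works: the `TileCache` type, the LRU policy, the `f32` tile payload, and the `load` callback standing in for file I/O are all hypothetical.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical tile cache with least-recently-used eviction.
/// ASSUMPTION: LRU is used here for illustration; the policy of the
/// Java tile cache is not described in this thread.
pub struct TileCache {
    capacity: usize,
    tiles: HashMap<usize, Vec<f32>>,
    recency: VecDeque<usize>, // front = least recently used
}

impl TileCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "cache capacity must be at least 1");
        Self {
            capacity,
            tiles: HashMap::new(),
            recency: VecDeque::new(),
        }
    }

    /// Returns the tile's values, calling `load` only on a cache miss.
    /// `load` stands in for reading and decoding the tile from the file.
    pub fn get_or_load<F>(&mut self, tile_index: usize, load: F) -> &[f32]
    where
        F: FnOnce(usize) -> Vec<f32>,
    {
        if self.tiles.contains_key(&tile_index) {
            // Hit: drop the old recency entry so it can be re-added at the back.
            if let Some(pos) = self.recency.iter().position(|&t| t == tile_index) {
                self.recency.remove(pos);
            }
        } else {
            // Miss: make room by evicting the least recently used tile.
            if self.tiles.len() >= self.capacity {
                if let Some(evicted) = self.recency.pop_front() {
                    self.tiles.remove(&evicted);
                }
            }
            self.tiles.insert(tile_index, load(tile_index));
        }
        self.recency.push_back(tile_index);
        &self.tiles[&tile_index]
    }

    pub fn contains(&self, tile_index: usize) -> bool {
        self.tiles.contains_key(&tile_index)
    }
}
```

The linear scan of `recency` is fine for a cache holding a handful of tiles; a larger cache would want a linked structure instead. A writable cache would also need to flush modified tiles before evicting them, which this sketch leaves out.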

If you have any questions, feel free to post them here. I'll do my best to answer them in a prompt and useful manner.

Gary

gwlucastrig commented 1 year ago

Also, there's a lot of code related to collecting statistics and byte counts in the GVRS API. While that was (is) useful when experimenting with algorithms or developing the file format, it's not necessarily a core requirement for a port. So don't feel obligated to do the extra coding unless it looks like something that would be useful for your own efforts.