rafaqz / Rasters.jl

Raster manipulation for the Julia language
MIT License
212 stars, 36 forks

Support for GRIB files #282

Closed: tcarion closed this issue 8 months ago

tcarion commented 2 years ago

Hi!

First, thank you so much for this great package; it is helping me a lot with reading NetCDF files! I wanted to know whether there are plans to add support for GRIB-encoded files. I saw some discussion about it here and here, but that was some time ago. I also know about CfGRIB.jl, which can interface GRIB files with DimensionalData.jl, but it seems quite unmaintained and it doesn't always work (at least for the files I'm working with). So reading GRIB with Rasters.jl would be great. Is there any WIP on that? I'm not a specialist in either Julia or GRIB and I don't really know where to start, but I would be happy to help :relaxed:

rafaqz commented 2 years ago

I didn't even know there was a GRIB/DimensionalData package! It should be possible to include it in Rasters.jl.

Raster sources are a little complicated, but you could use the netcdf source as a reference.

I think GRIB files should load as a RasterStack. In CfGRIB it's a wrapped NamedTuple of DimArrays, which is nearly the same thing.

The OnDiskArray object can be GribDiskArray <: AbstractDiskArray.

tcarion commented 2 years ago

Yes, I looked at the Rasters sources and they were quite intimidating to me :sweat_smile:. But I'll look at them when I have a bit of time! Since the CfGRIB package seems quite abandoned (and it's not in the official registry), what would be your approach to incorporating it into Rasters.jl?

rafaqz commented 2 years ago

Yes, Rasters.jl sources are pretty abstract. They share a lot of code so they all work the same way and have the same syntax, but they're a bit confusing to write the first time. I'll try to make it easier one day, I guess!

As for incorporating CfGRIB.jl, we can either include it directly in Rasters.jl or make a new package. CfGRIB.jl seems dead. Ultimately, a GRIB package that is an analogue of NCDatasets.jl/ArchGDAL.jl might be best, so we're not coupling it to Rasters.jl. Edit: but maybe that is GRIB.jl already, and we can just build a Rasters.jl source on top of it?

You would develop it alongside adding a new source to Rasters.jl anyway. The first task is making OnDiskArray integrate with DiskArrays.jl.

If you make a new repo for it, I can review PRs. Or just make a new grib.jl file in sources here with the relevant code from CfGRIB.jl, and open a PR.

tcarion commented 2 years ago

Hi @rafaqz! I've implemented a basic integration of CfGRIB.jl into Rasters.jl. It won't work for all cases, and there are some improvements to make, but could you tell me whether you think I'm going in the right direction? I basically adapted the netcdf source to the DataSet type of CfGRIB.jl. You can find the code on the grib branch of my Rasters.jl fork. You will need to use Pkg to install this branch of CfGRIB. I can also open a PR to Rasters.jl if that is easier for you. Thank you for your help!

rafaqz commented 2 years ago

That looks great! Probably good to make a PR here so we can discuss specific parts of the code more easily.

Some comments: there are still a few NCDfile types in there. Also, it's not clear whether your Dataset is opening something in a C library that needs to be closed. That's why most file opening here happens inside a closure: e.g. the netcdf Dataset(f, filename) function runs a close or destroy method after running f.
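To illustrate the closure pattern mentioned above, here is a minimal plain-Julia sketch: the handle is opened, passed to `f`, and closed in a `finally` block even if `f` throws. `with_dataset` is a hypothetical helper, not the NCDatasets.jl or Rasters.jl API, and a plain `IOStream` stands in for a C-library handle. Julia's own `open(f, filename)` method already behaves this way.

```julia
# Hypothetical helper illustrating the "open, run f, always close" pattern.
# A real backend would wrap a C-library handle instead of an IOStream.
function with_dataset(f, filename::AbstractString)
    io = open(filename, "r")   # acquire the resource
    try
        return f(io)           # run the caller's code on the open handle
    finally
        close(io)              # always release it, even if f throws
    end
end

# Usage with a do-block: the handle is closed as soon as the block returns.
path = tempname()
write(path, "GRIB")
magic = with_dataset(path) do io
    String(read(io, 4))
end
println(magic)

# The handle passed to the block is no longer open afterwards.
handle = with_dataset(io -> io, path)
println(isopen(handle))
```

The caller never has to remember to close anything, which is why this style is safer when the handle owns external (C) resources.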

tcarion commented 2 years ago

Thanks, I made the PR!

As far as I know, CfGRIB.DataSet keeps some information about the indices used to seek the data in the file. Each time readblock! is called, the file is opened, the data are fetched using these indices, and the file is then closed.
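The reopen-on-read strategy described above can be sketched with a small lazy array in plain Julia. This is an illustration under simplifying assumptions (a flat file of native-endian `Float32` values at a known byte offset), not how CfGRIB.jl locates GRIB data. `LazyFileVector` is hypothetical and is a plain `AbstractVector`, not a DiskArrays.jl `AbstractDiskArray`, which would instead receive whole blocks of indices in `readblock!`.

```julia
# Hypothetical lazy vector: stores only a path and a byte offset, and reopens
# the file on every element read, so no handle stays open between reads.
struct LazyFileVector <: AbstractVector{Float32}
    path::String
    offset::Int   # byte offset of the first value in the file
    len::Int      # number of Float32 values
end

Base.size(a::LazyFileVector) = (a.len,)
Base.IndexStyle(::Type{<:LazyFileVector}) = IndexLinear()

function Base.getindex(a::LazyFileVector, i::Int)
    @boundscheck checkbounds(a, i)
    open(a.path, "r") do io                      # reopen for this read...
        seek(io, a.offset + (i - 1) * sizeof(Float32))
        read(io, Float32)                        # ...closed when the block returns
    end
end

# Write a small file: an 8-byte header followed by four Float32 values.
path = tempname()
open(path, "w") do io
    write(io, zeros(UInt8, 8))
    write(io, Float32[1.5, 2.5, 3.5, 4.5])
end

v = LazyFileVector(path, 8, 4)
println(v[3])     # reads a single value from disk
println(sum(v))   # iterates, reopening the file once per element
```

Opening the file once per element is slow for large reads; batching indices into blocks is the point of the DiskArrays.jl `readblock!` interface.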

I think it would be nice to implement an interface to GRIB.jl that looks more like NCDatasets, but I don't know how to proceed. I opened a PR with the AbstractDiskArray implementation for now, but I doubt it will be merged anytime soon given the state of the repo. So should I create a new package (knowing it will probably reuse some code from CfGRIB.jl)? What about licensing in that case? I'm fairly new to open source contributions, and I don't know how such things usually work :relaxed:

rafaqz commented 2 years ago

It's an Apache license, so you will need to check MIT/Apache compatibility and read the Apache terms. You may need to stick with that license and attribute the original authors somehow. But I think it's quite open.

For now you can just make a simple package that works for this use case. The way you describe CfGRIB opening files is also how Rasters.jl works (reopening when needed), so you're right that there is no need to close the Dataset.

eliteuser26 commented 1 year ago

Hi. I've read comments in the GRIB.jl repository about the need to read GRIB files directly. I'm working on a new module with a different approach, in pure Julia, so I can use it directly on Windows. It should work on other platforms as well. I know the Eccodes.jl module is unlikely to support Windows without Cygwin anytime soon. My module is only at the alpha stage, there is still work to be done on it, and it is not published anywhere yet.

rafaqz commented 1 year ago

See #283

@tcarion where are we at the merging that PR?

eliteuser26 commented 1 year ago

By the way, CfGRIB.jl uses GRIB.jl, which uses Eccodes.jl. Same issue as before: Windows isn't supported without Cygwin. I'm able to open the GRIB file as an IOStream in Julia. From there I can access a portion of the data as vectors or arrays. I will see how far I can go with this project.
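Reading a GRIB file as a plain IOStream, as described above, is enough to start decoding in pure Julia. The sketch below reads only the indicator section (Section 0). It assumes the standard WMO layouts (GRIB1: "GRIB", 3-octet total length, edition; GRIB2: "GRIB", two reserved octets, discipline, edition, 8-octet total length) and ignores GRIB1's large-message extensions. `GribIndicator` and `read_indicator` are hypothetical names, not part of GRIB.jl or GRIBDatasets.jl.

```julia
# Hypothetical pure-Julia reader for the indicator section (Section 0) of a
# GRIB message. Simplified: standard GRIB1 (8 octets) and GRIB2 (16 octets)
# layouts only; GRIB1 large-message extensions are ignored.
struct GribIndicator
    edition::Int
    discipline::Int    # GRIB2 only; -1 for GRIB1
    total_length::Int  # length of the whole message in octets
end

function read_indicator(io::IO)
    head = read(io, 8)
    length(head) == 8 && head[1:4] == b"GRIB" ||
        error("not at the start of a GRIB message")
    edition = Int(head[8])
    if edition == 1
        # GRIB1: octets 5-7 hold the total length as a 24-bit big-endian integer.
        len = (Int(head[5]) << 16) | (Int(head[6]) << 8) | Int(head[7])
        return GribIndicator(1, -1, len)
    elseif edition == 2
        # GRIB2: octet 7 is the discipline; octets 9-16 hold a 64-bit
        # big-endian total length, converted to host order with ntoh.
        len = Int(ntoh(read(io, UInt64)))
        return GribIndicator(2, Int(head[7]), len)
    else
        error("unsupported GRIB edition $edition")
    end
end

# Synthetic GRIB2 header: "GRIB", 2 reserved octets, discipline 0, edition 2,
# then a total length of 1234. With a real file, use open(path) instead.
bytes = vcat(collect(b"GRIB"), UInt8[0x00, 0x00, 0x00, 0x02],
             collect(reinterpret(UInt8, [hton(UInt64(1234))])))
ind = read_indicator(IOBuffer(bytes))
println(ind)
```

With a real file, `open(read_indicator, "file.grib2")` would read the first message's header and close the file again.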

rafaqz commented 1 year ago

OK, I'm 100% in support of pure Julia backends here. Happy to review PRs for that if you need help.

It is easier in the long term to have a full Julia stack we can understand, profile and debug right to the bottom. For Shapefile.jl and GeoJSON.jl we wrote 100% Julia implementations, so e.g. the rasterization methods here are optimised right across the stack from loading onwards, because it's all Julia.

Raster data can be more complicated in some cases, and we usually end up falling back to GDAL, netcdf, eccodes, etc.

But native is better if you can do it.

tcarion commented 1 year ago

I also think it would be great to read GRIB files purely in Julia! Once ready, I would be happy to depend on this new package in GRIBDatasets.jl instead of GRIB.jl.

We are currently working on CommonDataModel.jl, which would simplify the integration of NetCDF and GRIB files in Rasters (and maybe also TIFF files?). See https://github.com/JuliaGeo/GRIBDatasets.jl/issues/7 and https://github.com/JuliaGeo/GRIBDatasets.jl/issues/8.

I see two options here: either we merge #283 now and create another PR when the CommonDataModel implementation is ready, or we close it and wait for this new implementation. If you prefer the first option, I will rebase on main and I think it will be ready to be merged.

eliteuser26 commented 1 year ago

Hi. I've been reading about GRIBDatasets and liked what I saw with this project. Reading both GRIB and NetCDF in Julia is great. I will support it, as a lot of good code was developed for it.

  1. I would like to say from the start that I'm new to Julia, but I'm picking up the language very quickly. I have a lot of experience with Python, which is why I see a parallel between Python and Julia.

  2. My pure Julia GRIB package isn't anywhere near ready for distribution yet. I'm missing some big pieces that I need to understand to decode the data part. If you have suggestions or comments, I'm open to ideas. I was able to decode all the other sections of a GRIB message quite easily; only the data section remains, and everything else is done. I'm redoing the code from the GRIB.jl package so I can use it on Windows. I'm also recreating the ECMWF tables in CSV format, and I've written code for this too.

  3. I think you should go ahead with your original plan. I'm not sure how long it will take to finalize my package. I will still need to write documentation and examples following Julia conventions. Then I should be able to publish this package on GitHub.
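The section-by-section structure mentioned in point 2 can be illustrated with a small pure-Julia sketch. It assumes a GRIB2 message (16-octet indicator section; sections 1-7 each starting with a 4-octet big-endian length and a 1-octet section number; the end section "7777") and only walks the section headers: the data in Section 7 is skipped, not decoded. `list_sections` and `fake_section` are hypothetical helpers, not part of GRIB.jl or GRIBDatasets.jl.

```julia
# Hypothetical helper: list the (section number, length) pairs of one GRIB2
# message, with `io` positioned at the start of the message. Each of
# sections 1-7 begins with a 4-octet big-endian length and a 1-octet section
# number; the message ends with the 4-octet end section "7777".
function list_sections(io::IO)
    skip(io, 16)                        # GRIB2 indicator section is 16 octets
    sections = Tuple{Int,Int}[]
    while true
        marker = read(io, 4)
        marker == b"7777" && break      # end section reached
        length(marker) == 4 || error("truncated GRIB message")
        len = Int(ntoh(reinterpret(UInt32, marker)[1]))
        num = Int(read(io, UInt8))
        push!(sections, (num, len))
        skip(io, len - 5)               # jump over the rest of the section
    end
    return sections
end

# Build one fake section of `len` octets (hypothetical test helper).
fake_section(num, len) = vcat(collect(reinterpret(UInt8, [hton(UInt32(len))])),
                              UInt8[num], zeros(UInt8, len - 5))

# Synthetic message: indicator, Section 1 (21 octets), Section 3 (6 octets), "7777".
msg = vcat(collect(b"GRIB"), UInt8[0x00, 0x00, 0x00, 0x02], zeros(UInt8, 8),
           fake_section(1, 21), fake_section(3, 6), collect(b"7777"))
secs = list_sections(IOBuffer(msg))
println(secs)
```

Because every section announces its own length, a decoder can locate Section 7 (data) without understanding the sections before it, and decode the templates in Sections 3-5 separately.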

Hope this answers some of your questions.

tcarion commented 1 year ago

I appreciate your enthusiasm about GRIBDatasets :-). Yes we will keep using GRIB.jl for now, but don't hesitate to notify me when your code is available. I can even try to help!

eliteuser26 commented 1 year ago

Thanks. I will need to figure out how to decode the data part. I will ask people to test it out when it is done.

I tried in the past to read a GRIB file from Environment Canada to test it in the GRIB decoder, but the equivalent Python code came back with an error. Not sure why yet.

Also, I was reading a great article about how to read an image in Julia. The article indicated that an image can be represented as an array of pixel values in Julia. That was a lightbulb moment for me: it's something I had never thought of using. Always learning something new.