Error from grib_get_message() with certain files

anchiho commented 1 year ago

Hello @nawendt

I'm an R-tool developer working in BSC-CNS (Barcelona Supercomputing Center.) We're trying to use gribr to load the GRIB files but we found some problems.

We downloaded some files, system5 and era5 monthly data, from ECMWF MARS archive. For era5 ones, the gribr functions work well. But for system5 ones, we got errors from grib_get_message().

library(gribr)
g <- grib_open("/path/to/file/tas_20001101.grb")
gm <- grib_get_message(g, 1)

ECCODES ERROR : Wrong size for hourOfEndOfOverallTimeInterval it contains 1 values
Error in grib_get_message(g, 1) : gribr: unable to get long array
GRIB ERROR Passed array is too small

We tried to load the data with pygrib and it works, so we wonder if it is a problem in gribr. I put the file in the link in case it's needed: https://drive.google.com/file/d/1ANMuGn7Y9vXU_UKD4Zp-w8r0ZoxjFgol/view?usp=share_link

Kind regards,
An-Chi

nawendt commented 1 year ago

Thanks for the example file. I'll take a look when I get some more free time. Need to figure out why that key is being sized inappropriately.

anchiho commented 1 year ago

Hi @nawendt

Thanks for the reply. Some update: we found that the problem is actually in the GRIB file. We used other tools to get that attribute and got the same error. The difference is that gribr::grib_get_message takes all the attributes at once while it's not necessary in pygrib or xarray.

I'd like to ask if it is possible to only retrieve selected attributes while using grib_get_message or grib_select? I see that the main function is in C but I'm not familiar with it, so I'd like to ask your opinion first.

Thanks for your time,
An-Chi

nawendt commented 1 year ago

Appreciate the update. Having looked at it some, I do have some ideas about how to handle this gracefully. I'll do some testing and see how things turn out.

anchiho commented 1 year ago

Hi @nawendt

Thanks a lot for the improvement. We tested it with our case and it worked well. We have one more thing that we'd like to ask because I think I didn't express myself clearly in the previous comment.

All the keys are retrieved when getting the messages using NULL in codes_keys_iterator_new() However, it is quite heavy and not all the keys are needed always. Is it possible to add a feature for selecting the desired keys? I created a function like skip_keys() but to choose the keys I want. It works for my need, but as you can see below, it is hard-coded. Does it seem reasonable for you to add this feature? E.g., add one argument in grib_get_message() to pass a vector of desired keys. Please let me know if we can contribute something, thanks a lot!

# in gribr_internals.c
int choose_keys(const char* keyName) {
    if (!strcmp(keyName, "validityDate") ||
        !strcmp(keyName, "validityTime") ||
        ...                              ||     # all the keys we need
        !strcmp(keyName, "shortName"))
   {
      return 1;
    }
    return 0;
}

# in message_from_handle.c
## line 79
    if (skip_keys(keyName, keyType, err)) {
      continue;
    } else {
      if (choose_keys(keyName)) {
        totalKeys++;
      } else {
        continue;
      }
    }

## line 101
    if (skip_keys(keyName, keyType, err) || !choose_keys(keyName)) {
      continue;
    } else {

Best regards,
An-Chi

nawendt commented 1 year ago

OK. I know ecCodes has namespaces and filters that can be used when retrieving keys. I have toyed around with using those to limit the number of keys processed. I guess the question is whether simply using that interface will work or if you think having finer control of the keys accessed (as it looks like above) is more valuable? I can see situations where you could potentially have a large character vector of key names you want to process. Would you find that cumbersome or is that acceptable?

anchiho commented 1 year ago

Hi @nawendt

Filtering by namespace by ecCodes interface seems like a good feature to me, but it may not be the best option for our case since the necessary keys may spread in different namespaces. Actually, the keys I selected in the comment above are the needed ones in gribr (plus a few more for attributes preservation.) So maybe it is useful to have an internal necessary key list? And we can choose to load only the necessary ones plus the additionals we want to have. By any means, it is totally acceptable for me having a large vector to specify the desired keys.

Many thanks, An-Chi

nawendt / gribr

Error from grib_get_message() with certain files #21