openradar / xradar

A tool to work with weather radar data in xarray
https://docs.openradarscience.org/projects/xradar
MIT License

Error when opening IRIS/Sigmet file #51

Closed · aladinor closed 1 year ago

aladinor commented 1 year ago

I am trying to read an IRIS file using xradar's datatree reader (xd.io.open_iris_datatree) and xr.open_dataset, but I am getting a dimension error.

import pyart
import fsspec
import xarray as xr
import xradar as xd
import boto3
import botocore
from botocore.client import Config
from datetime import datetime

def create_query(date, radar_site):
    """
    Creates a query for listing IDEAM radar files stored in AWS bucket
    :param date: date to be queried. e.g datetime(2021, 10, 3, 12). Datetime python object
    :param radar_site: radar site e.g. Guaviare
    :return: string with the query prefix
    """
    prefix = f'l2_data/{date:%Y}/{date:%m}/{date:%d}/{radar_site}/{radar_site[:3].upper()}{date:%y%m%d}'
    return prefix

def main():
    str_bucket = 's3://s3-radaresideam/'
    s3 = boto3.resource('s3',
                        config=Config(signature_version=botocore.UNSIGNED, user_agent_extra='Resource'))
    bucket = s3.Bucket('s3-radaresideam')
    query = create_query(date=datetime(2021, 10, 3, 12), radar_site='Guaviare')
    radar_files = [f'{str_bucket}{i.key}' for i in bucket.objects.filter(Prefix=f"{query}")]
    for idx, i in enumerate(radar_files[2:3]):
        file = fsspec.open_local(f'simplecache::{i}', s3={'anon': True}, filecache={'cache_storage': '.'})
        # dtree = xd.io.open_iris_datatree(file)
        ds = xr.open_dataset(file, group=1, engine="iris")

if __name__ == "__main__":
    main()

Then it raises this error:

ValueError: conflicting sizes for dimension 'range': length 580 on 'range' and length 581 on {'azimuth': 'DBTH', 'range': 'DBTH'}
python-BaseException

I have dug into it a bit and found that, when creating the range dimension in iris.py (line 3821), it should create a NumPy array of size 581, but it instead creates one of size 580. I tried to solve it as follows, but I'm not sure this is the right way.

rng = (
    np.arange(
        range_first_bin,
        range_last_bin + 1,  # adding 1 here
        task["step_output_bins"],
        dtype="float32",
    )[: task["number_output_bins"]]
    / 1e2
)

Cheers,

Alfonso

kmuehlbauer commented 1 year ago

Thanks @aladinor for bringing this to attention.

I'm thinking this might be due to some numeric instability in np.arange, see the warning there.

We might need to think about a more robust range generation function. I can help with the re-implementation if you are up to it.
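One robust pattern (just a sketch, not xradar's actual implementation; the variable names mirror the IRIS task_range_info fields with hypothetical values) is to generate the bin indices first, so the element count can never depend on how the stop value rounds:

```python
import numpy as np

# Hypothetical header values mirroring the IRIS task_range_info fields (units: cm)
range_first_bin = 100000
step = 30000
number_output_bins = 581

# Generating bin indices first fixes the element count up front,
# independent of any floating-point endpoint rounding.
rng = (range_first_bin + np.arange(number_output_bins) * step) / 1e2  # cm -> m
print(rng.size, rng[0], rng[-1])  # 581 1000.0 175000.0
```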

aladinor commented 1 year ago

Hi @kmuehlbauer. Sure! That was just a quick look to find the error. Let me see if I can find a better solution.

kmuehlbauer commented 1 year ago

Yes, sure, could you post the arguments of the call to np.arange? Just to get a feeling for why this fails in your case and how we might have to change the algorithm.

aladinor commented 1 year ago

@kmuehlbauer, I just opened the same file using Py-ART, and I noticed that the bin size is 299.0 meters.

def main():
    str_bucket = 's3://s3-radaresideam/'
    s3 = boto3.resource('s3',
                        config=Config(signature_version=botocore.UNSIGNED, user_agent_extra='Resource'))
    bucket = s3.Bucket('s3-radaresideam')
    query = create_query(date=datetime(2021, 10, 3, 12), radar_site='Guaviare')
    radar_files = [f'{str_bucket}{i.key}' for i in bucket.objects.filter(Prefix=f"{query}")]

    # using context manager for reading radar data from guaviare radar
    of = pyart.io.prepare_for_read(radar_files[2], storage_options={'anon': True})
    with of as f:
        radar = pyart.io.read(f)
        print(radar.range['data'])

Turns out to be

[  1000.   1299.   1598.   1897.  ....  174420.]

Then, looking at self.root.ingest_header["task_configuration"]["task_range_info"]["step_output_bins"] (line 3815 in iris.py), the bin size is 30000 (not sure about the units, but it seems to be cm).

Do you think this could make any difference when creating np.arange?
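For reference, the spacing of the printed Py-ART range values can be checked directly; they step by exactly 299 m, while the header's step of 30000 cm corresponds to 300 m:

```python
import numpy as np

# First few range values as printed by Py-ART above (m)
pyart_range = np.array([1000.0, 1299.0, 1598.0, 1897.0])
print(np.unique(np.diff(pyart_range)))  # [299.]

# The header's step_output_bins, converted from cm to m
print(30000 / 1e2)  # 300.0
```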

kmuehlbauer commented 1 year ago

I'm wondering why Py-ART reports 299.0 m. At least the data in the file says 300 m (yes, the units are cm). This is starting to get very interesting. Maybe a problem with metadata inconsistency in the file?

Would your code work on my machine, or do I need any credentials to download the file? The other extracted header items would be very interesting to know, to get to the bottom of this.

aladinor commented 1 year ago

@kmuehlbauer, it might be. This code will run on your PC; the AWS bucket is an open dataset, so you can just copy and paste. This is the complete version of my code:

import pyart
import fsspec
import xarray as xr
import xradar as xd
import boto3
import botocore
from botocore.client import Config
from datetime import datetime

def create_query(date, radar_site):
    """
    Creates a query for listing IDEAM radar files stored in AWS bucket
    :param date: date to be queried. e.g datetime(2021, 10, 3, 12). Datetime python object
    :param radar_site: radar site e.g. Guaviare
    :return: string with the query prefix
    """
    prefix = f'l2_data/{date:%Y}/{date:%m}/{date:%d}/{radar_site}/{radar_site[:3].upper()}{date:%y%m%d}'
    return prefix

def main():
    str_bucket = 's3://s3-radaresideam/'
    s3 = boto3.resource('s3',
                        config=Config(signature_version=botocore.UNSIGNED, user_agent_extra='Resource'))
    bucket = s3.Bucket('s3-radaresideam')
    query = create_query(date=datetime(2021, 10, 3, 12), radar_site='Guaviare')
    radar_files = [f'{str_bucket}{i.key}' for i in bucket.objects.filter(Prefix=f"{query}")]

    # using context manager for reading radar data from guaviare radar
    of = pyart.io.prepare_for_read(radar_files[2], storage_options={'anon': True})
    with of as f:
        radar = pyart.io.read(f)
        print(radar.range['data'])
        f.close()

    for idx, i in enumerate(radar_files[2:3]):
        file = fsspec.open_local(f'simplecache::{i}', s3={'anon': True}, filecache={'cache_storage': '.'})
        dtree = xd.io.open_iris_datatree(file, reindex_angle=False)
        ds = xr.open_dataset(file, engine="iris")

if __name__ == "__main__":
    main()

kmuehlbauer commented 1 year ago

Great, I can have a look tomorrow.

kmuehlbauer commented 1 year ago

@aladinor

OK, here are my thoughts:

{'range_first_bin': 100000,
 'range_last_bin': 17500000,
 'number_input_bins': 581,
 'number_output_bins': 581,
 'step_input_bins': 30000,
 'step_output_bins': 30000,
 'variable_range_bin_spacing_flag': 0,
 'range_bin_averaging_flag': 0}

Multiplying number_output_bins by step_output_bins yields 174300 m; adding range_first_bin gives 175300 m. But what do these values refer to: bin start, bin center, or bin end? In the original implementation in wradlib, and here in xradar, we assume bin centers. Unfortunately I do not remember enough, and I couldn't find anything in the IRIS programmer's manual that clearly describes this. So if there is anyone out there who can shed some light, please do so.
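Doing the arithmetic in the file's native cm (my own sanity check, not from the file itself): range_first_bin plus n steps overshoots range_last_bin by exactly one bin, while n − 1 steps land on it, which fits a reading with 581 bin positions spanning from first to last:

```python
# Header values from above, in cm
first, last, step, n = 100000, 17500000, 30000, 581

print(first + n * step)        # 17530000 -> one bin beyond range_last_bin
print(first + (n - 1) * step)  # 17500000 -> exactly range_last_bin
```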

Actually, in the above example the range array is created with one element less because the stop value is not part of the array. Citing from the NumPy docs:

arange(start, stop, step): values are generated within the half-open interval [start, stop), with spacing between values given by step.
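With this file's header values the half-open interval is exactly what bites: the span divides evenly by the step, so the stop value (bin 581) is excluded:

```python
import numpy as np

# Header values from above, in cm: the span (17400000) is an exact multiple of the step
print(np.arange(100000, 17500000, 30000).size)      # 580 -- stop value excluded
print(np.arange(100000, 17500000 + 1, 30000).size)  # 581 -- with the suggested +1
```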

That means your initial suggestion would work for this use case, but we really should think about a more robust solution, like this one using linspace:

rng = (
    np.linspace(
        range_first_bin,
        range_last_bin,
        task["number_output_bins"],
        dtype="float32",
    )
    / 1e2
)
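A quick check with the header values above confirms the linspace version yields 581 bins from 1000 m to 175000 m:

```python
import numpy as np

# Header values from above, cast to m as in the snippet
rng = np.linspace(100000, 17500000, 581, dtype="float32") / 1e2
print(rng.size, rng[0], rng[-1])  # 581 1000.0 175000.0
```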

kmuehlbauer commented 1 year ago

@aladinor This should also be reported on the Py-ART issue tracker, as a wrong gate_size is calculated there, resulting in erroneous range values.

aladinor commented 1 year ago

Hi @kmuehlbauer,

I tried np.linspace and it worked for this specific file. However, when reading another file I got the following error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
python-BaseException

This is because in model.py (line 518) xradar checks the differences in range, and the unique variable comes back with more than one value.

diff = np.diff(rng)
unique = np.unique(diff)

Thus, I don't think this approach is a general solution. Something may be wrong with this file, or there may be something else to account for when reading the metadata from the raw file.
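A small illustration of why exact uniqueness is fragile (a generic float32 example, not this file's values), together with a tolerance-based comparison that could serve as a more robust check:

```python
import numpy as np

# A perfectly regular range whose step is not exactly representable in float32
rng = np.linspace(0.0, 1.0, 581, dtype="float32")
diff = np.diff(rng)

# Exact uniqueness fails even though the range is conceptually uniform
print(np.unique(diff).size > 1)  # True

# Comparing all diffs to the mean step within a few ulps of the largest value
step = (rng[-1] - rng[0]) / (rng.size - 1)
tol = 4 * np.finfo(rng.dtype).eps * np.abs(rng).max()
print(np.allclose(diff, step, rtol=0, atol=tol))  # True
```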

Let me know your thoughts.

Alfonso.

kmuehlbauer commented 1 year ago

@aladinor It might be some inconsistency in how we test for correct metadata, or you may have stumbled over another issue. To be sure, can you make that other file available too?

aladinor commented 1 year ago

@kmuehlbauer I think you can use any other file in the radar_files variable. I identified that files 2, 5, 8, ... within that list are the ones that present this issue.

    for idx, i in enumerate(radar_files[2:3]):  # ---> here you can choose any other of the 864 files
        file = fsspec.open_local(f'simplecache::{i}', s3={'anon': True}, filecache={'cache_storage': '.'})
        dtree = xd.io.open_iris_datatree(file, reindex_angle=False)
        ds = xr.open_dataset(file, engine="iris")

kmuehlbauer commented 1 year ago

@aladinor @mgrover1 While working on this I found more inconsistencies in the metadata (e.g. with longitude). I'll open a PR when I'm ready.

There are also some points which need discussion with a wider audience (e.g. angle reindexing). I'll open issues as needed.

kmuehlbauer commented 1 year ago

@aladinor Do you know if there are other IRIS/Sigmet-based radars in the southern hemisphere? The Guaviare radar sits slightly north of the equator. I'd need one in the southern hemisphere to know exactly how the angle is translated in the data.

aladinor commented 1 year ago

@kmuehlbauer I know that Argentina uses INVAP radars. Brazil also has radars; maybe @Soudaaa could help us find a couple of files? On the other hand, I know that during the RELAMPAGO field campaign they used a Vaisala radar in the southern hemisphere. Not sure if @mgrover1 has access to this dataset.

mgrover1 commented 1 year ago

@aladinor Last time I checked, the Argentina radar data is not in the EOL archive... do you know if there is a copy at UIUC?

kmuehlbauer commented 1 year ago

@aladinor Thanks for your suggestions. I found that Vaisala equipped a company (CEMIG) near Belo Horizonte with a radar. New Zealand also has Vaisala radars. But they have no open data, unfortunately.

I've contacted Vaisala with that question. They will know how it works, but proof by an actual data file would be good.

Soudaaa commented 1 year ago

@aladinor I sent some sample data for you to work with. I managed to read the volumes with no issues using Py-ART.

kmuehlbauer commented 1 year ago

@aladinor I'd appreciate it if you could test your code with #53. Thanks!

aladinor commented 1 year ago

@kmuehlbauer I tested it and it worked! Thanks for looking at it.

kmuehlbauer commented 1 year ago

Great! Your help is very much appreciated. It really helps to sort things out.