Closed aladinor closed 1 year ago
Thanks @aladinor for bringing this to attention.
I'm thinking this might be due to some numeric instability in np.arange, see the warning there.
We might need to think about a more robust range generation function. I can help with the re-implementation if you are up to it.
Hi @kmuehlbauer. Sure! It was just a quick look to find the error. Just let me see if I can find a better solution.
Yes, sure, could you post the arguments of the call to np.arange? Just to get a feeling why this fails in your case and how we might have to change the algorithm.
@kmuehlbauer, I just opened the same file using pyart
, then what I noticed was that the bin size is 299.0
meters.
def main():
str_bucket = 's3://s3-radaresideam/'
s3 = boto3.resource('s3',
config=Config(signature_version=botocore.UNSIGNED, user_agent_extra='Resource'))
bucket = s3.Bucket('s3-radaresideam')
query = create_query(date=datetime(2021, 10, 3, 12), radar_site='Guaviare')
radar_files = [f'{str_bucket}{i.key}' for i in bucket.objects.filter(Prefix=f"{query}")]
# using context manager for reading radar data from guaviare radar
of = pyart.io.prepare_for_read(radar_files[2], storage_options={'anon': True})
with of as f:
radar = pyart.io.read(f)
print(radar.range['data'])
Turns out to be
[ 1000. 1299. 1598. 1897. .... 174420.]
Then when looking at self.root.ingest_header["task_configuration"]["task_range_info"]["step_output_bins"]
, line 3815
in iris.py
the bin size is 30000 (not sure of units but seems to be cm).
Do you think this could make any difference when creating np.arange
?
I'm wondering why Py-ART reports 299.0m. At least the data in the file says 300m (yes, units are cm). That now starts to get very interesting. Maybe a problem with metadata inconsistency in the file?
Would your code work on my machine, or do I need any credentials to download the file? The other extracted header items would be very interesting to know to get the bottom of this.
@kmuehlbauer it might be. this code will run on your pc. The AWS bucket is an open dataset. You should just copy and paste. This is the complete version of my code
import pyart
import fsspec
import xarray as xr
import xradar as xd
import boto3
import botocore
from botocore.client import Config
from datetime import datetime
def create_query(date, radar_site):
"""
Creates a query for listing IDEAM radar files stored in AWS bucket
:param date: date to be queried. e.g datetime(2021, 10, 3, 12). Datetime python object
:param radar_site: radar site e.g. Guaviare
:return: string with a
"""
prefix = f'l2_data/{date:%Y}/{date:%m}/{date:%d}/{radar_site}/{radar_site[:3].upper()}{date:%y%m%d}'
return prefix
def main():
str_bucket = 's3://s3-radaresideam/'
s3 = boto3.resource('s3',
config=Config(signature_version=botocore.UNSIGNED, user_agent_extra='Resource'))
bucket = s3.Bucket('s3-radaresideam')
query = create_query(date=datetime(2021, 10, 3, 12), radar_site='Guaviare')
radar_files = [f'{str_bucket}{i.key}' for i in bucket.objects.filter(Prefix=f"{query}")]
# using context manager for reading radar data from guaviare radar
of = pyart.io.prepare_for_read(radar_files[2], storage_options={'anon': True})
with of as f:
radar = pyart.io.read(f)
print(radar.range['data'])
f.close()
for idx, i in enumerate(radar_files[2:3]):
file = fsspec.open_local(f'simplecache::{i}', s3={'anon': True}, filecache={'cache_storage': '.'})
dtree = xd.io.open_iris_datatree(file, reindex_angle=False)
ds = xr.open_dataset(file, engine="iris")
if __name__ == "__main__":
main()
Great, I can have a look tomorrow.
@aladinor
OK, here are my thoughts:
{'range_first_bin': 100000,
'range_last_bin': 17500000,
'number_input_bins': 581,
'number_output_bins': 581,
'step_input_bins': 30000,
'step_output_bins': 30000,
'variable_range_bin_spacing_flag': 0,
'range_bin_averaging_flag': 0}
Multiplying number_output_bins
with step_output_bins
yields 174300. Adding range_first_bin
yields 175300. But what do these values refer to, bin-start, bin-center or bin-end? In the original implementation in wradlib and here at xradar we are assuming bin-center. Unfortunately I do not remember enough and also couldn't get anything out of the IRIS programmers manual which clearly describes that. So if there is anyone out there who can shed some light, please do so.
Actually in the above example the range-array is created with one element less because the stop
-value is not part of the array, citing from numpy:
arange(start, stop, step) Values are generated within the half-open interval [start, stop), with spacing between values given by step.
That means your initial suggestion would work for this use case. But we really should think about a more robust solution, like this using linspace:
rng = (
np.linspace(
range_first_bin,
range_last_bin,
task["number_output_bins"],
dtype="float32",
)
/ 1e2
)
@aladinor This should also be reported at Py-ART issue tracker, as there a wrong gate_size is calculated with resulting erroneous range values.
Hi @kmuehlbauer,
I tried to use np.linspace
and it worked for this specific file. However, when reading another file I got the following error,
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
python-BaseException
This is because in model.py
line 518
xradar check differences in range and the unique
variable returns a list of values.
diff = np.diff(rng)
unique = np.unique(diff)
Thus, I think this approach is not a general solution. I think something is wrong with this file or there should be something else when reading the metadata from the raw file.
Let me know your thoughts.
Alfonso.
@aladinor Might be some inconsistency how we test for correct metadata or you now stumbled over another issue. But to be sure, can you make that other file available too?
@kmuehlbauer I think you can use any other file in radar_files
varible. I just identified the files 2, 5, 8, ...
within that list are the ones that present this issue.
for idx, i in enumerate(radar_files[2:3]): ### ------------> here you can chose any other out of 864 files
file = fsspec.open_local(f'simplecache::{i}', s3={'anon': True}, filecache={'cache_storage': '.'})
dtree = xd.io.open_iris_datatree(file, reindex_angle=False)
ds = xr.open_dataset(file, engine="iris")
@aladinor @mgrover1 While working on this I found more inconsistencies with the metadata (eg. with longitude). I'll issue a PR when I'm ready.
There are also some points which needs discussion in a wider audience (eg. angle reindexing). I'll open issues as needed.
@aladinor Do you know, if there are other IRIS/Sigmet based radars in the southern hemisphere? The Guaviare is slightly northern. I'd need one on the southern hemisphere to know how exactly the angle is translated in the data.
@kmuehlbauer I know that Argentina uses INVAP radars. Brasil also has radars maybe @Soudaaa could help us to find a couple of files (?). On the other hand, I know that during the Relampago field campaign they used a VAISALA radar in the southern hemisphere. Not sure if @mgrover1 has access to this dataset.
@aladinor last time I checked, the Argentina radar data is not the EOL archive... do you know if there is a copy at UIUC?
@aladinor Thanks for your suggestions. I found that Vaisala equipped some company (CEMIG) near Belo Horizonte with a radar. Also New Zealand has Vaisala radars. But they have no open data unfortunately.
I've contacted Vaisala with that question. They will know how it works, but proof by data file would be good.
@aladinor I sent some sample data for you to work with. Managed to read the volumes with no issues using Py-ART.
@aladinor I'd appreciate if you could test your code with #53. Thanks!
@kmuehlbauer I tested it and it worked! Thanks for looking at it.
Great! Your help is very much appreciated. It really helps to sort things out.
I am trying to read an IRIS file using
xradar.io.datatree
andxr.open_dataset
but I am having a dimension error.Then it rises this error,
I have dug a little bit and I found that when creating the range dimension in
iris.py
file, line3821
, it should create a NumPy array with a size of 581 but it instead creates a 580 NumPy array. I tried to solve it as follows but not sure if this is the right way .Cheers,
Alfonso