jdkloe / pybufr-ecmwf

a python module that allows reading and writing BUFR formatted files, where BUFR stands for Binary Universal Form for the Representation of meteorological data.

problem with code generating bad files #15

Closed donmurray closed 6 years ago

donmurray commented 6 years ago

Hi-

We have been using your library to generate wind profiler BUFR products for a couple of years on one of our production Linux boxes (CentOS) and this has been working well. Thank you for a great product.

The original developer of this code is no longer available, and I'm trying to do some new development and have installed the library on a new machine (also CentOS). However, when I run the same script on the new machine, the values for every level after the first are bad (very large values). It looks like a memory or pointer issue when the data are packed into cvalues. I have tried the pip install of version 0.82 and the anaconda install, and have also built and installed it from scratch, all with the same result. Have you run into this issue before?

I'm attaching the script, the raw data used to create the BUFR file, the "bad" BUFR file and a good version produced on the production machine. Unzip them into a directory, cd there and you can run the script as:

./EncodeWindProfilerBUFR.py oth 74994 oth18067.16w . 520461292

I would appreciate any insight you can provide.

Thanks again for a great product.

badbufr.zip

Don Murray

jdkloe commented 6 years ago

Thanks for your report, and for providing your test case. A quick test confirms that I get the same problematic output on my system as the bad file you provided. I will have to dig deeper into the details of your script to understand what may be the issue. I hope to be able to spend some time on this later this week (maybe Thursday). Best regards, Jos

donmurray commented 6 years ago

Thanks for looking into this when you get a chance. If you need more information, let me know.

donmurray commented 6 years ago

Hi Jos-

I narrowed it down to a problem with the RADAR_BACK_SCATTER variable. I'm not sure if it's packing it incorrectly or what is happening. It's defined in the table (B0000000000059003001.TXT) as:

021192 RADAR BACK SCATTER dB 2 -5000 13

On the operational machine where the good files are being created, it's defined as:

021192 RADAR BACK SCATTER dB 0 0 7

The version is listed as 0.82dev on the operational machine.

Don

jdkloe commented 6 years ago

Hi Don, the situation is somewhat confusing to me. I have been making some changes to your example script, just to try to understand what is happening. I made some simplifications here and there, especially in the ascii file parsing, so you may want to browse through it and see if you can use it (see attachment): EncodeWindProfilerBUFR.py.modified.gz. Also, there is no need to mention that I have copyright for this script, since it is your script.

I see two possible problems in your script:

1) You divide the SnrDB value by 100 before encoding it. This seems wrong to me: if the SnrDB value itself is already given in dB and the BUFR template uses dB as unit, this should not be needed (but maybe I misunderstood the input file?).

2) You use the value NaN to indicate missing values to the bufr software. This is wrong. The ECMWF BUFR software uses the special value of 1.7e38 to indicate missing. (I agree this is not clear from the documentation of the python module, so I'll add this to the next version.)
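As an illustration of point 2, here is a minimal sketch of swapping NaN for the missing-value marker before handing values to the encoder. The names `ECMWF_MISSING` and `nan_to_missing` are hypothetical and not part of the pybufr-ecmwf API; the only thing taken from this thread is the value 1.7e38.

```python
import math

# Missing-value marker quoted in this thread for the ECMWF BUFR software.
ECMWF_MISSING = 1.7e38


def nan_to_missing(values):
    """Return a copy of values with every NaN replaced by ECMWF_MISSING.

    Hypothetical helper for illustration; not part of pybufr-ecmwf.
    """
    return [ECMWF_MISSING if math.isnan(v) else v for v in values]


# Example input: one gap (NaN) between two valid readings.
snr_db = [-49.68, float("nan"), -29.52]
cleaned = nan_to_missing(snr_db)
print(cleaned)
```

The cleaned list can then be copied into whatever array the encoding script fills, in place of the NaN-containing one.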

I am not sure if these 2 points are enough to explain the strange behaviour you see, since I don't see huge values when I unpack the generated bufr file.

The good file you sent gives values like:

30 RADAR BACK SCATTER scatter = -49.68
40 RADAR BACK SCATTER scatter = -29.52
50 RADAR BACK SCATTER scatter = -50.0
...
500 RADAR BACK SCATTER scatter = -44.88
510 RADAR BACK SCATTER scatter = -49.6
520 RADAR BACK SCATTER scatter = -50.0

The bad file you provided gives values like:

30 RADAR BACK SCATTER scatter = 0.31
40 RADAR BACK SCATTER scatter = 0.32
50 RADAR BACK SCATTER scatter = 0.33
60 RADAR BACK SCATTER scatter = 0.32
...
260 RADAR BACK SCATTER scatter = -0.05
270 RADAR BACK SCATTER scatter = -0.11
280 RADAR BACK SCATTER scatter = -0.16
290 RADAR BACK SCATTER scatter = 0.0
300 RADAR BACK SCATTER scatter = 0.0
...
510 RADAR BACK SCATTER scatter = 0.0
520 RADAR BACK SCATTER scatter = 0.0

but looking at the ascii input file that you provided this seems correct, except for the zeroes at the end. In my modified copy of the script I used 1.7e38 instead of NaN to indicate missing, and then the zeroes at the end change to:

510 RADAR BACK SCATTER scatter = 1.7e+38
520 RADAR BACK SCATTER scatter = 1.7e+38

which is as it should be, I think, since the input ascii file also has some missing fields here for speed and dir.

Finally, you mention that the bufr table file on the operational machine contains this line:

021192 RADAR BACK SCATTER dB 0 0 7

but this must be a mistake. The 3 numbers at the end are scale, reference and width, and define what values can be stored. With these numbers, and using the following python code:

scale = 0
ref = 0
width = 7
step = 10.**(-1.*scale)
min_allowed_value = ref * step
max_allowed_value = ((2**width)-1+ref)*step
print(step, min_allowed_value, max_allowed_value)

I get as result: 1.0 0.0 127.0 so only values between 0 and 127 can be stored. Looking at values around -50 in your good file, this cannot be correct.

Using the other definition: 021192 RADAR BACK SCATTER dB 2 -5000 13

I get as result: 0.01 -50.0 31.91 which looks more consistent with your good file.

Also, looking at the available bufr table files in the ECMWF bufrdc library, there are 54 distinct B table files in the software, and they all give the latter definition, allowing values between -50 and 31.91 for radar backscatter.
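The comparison of the two definitions can be sketched as a small range check. The helper names `allowed_range` and `fits` are hypothetical; the formula is the one used in the snippet earlier in this comment, written with a division so that the printed bounds come out cleanly. One caveat: BUFR conventionally reserves the all-ones bit pattern for "missing", so the largest real value is probably one step below the upper bound computed here.

```python
def allowed_range(scale, ref, width):
    """Return (min, max) storable values, following the formula in this thread."""
    factor = 10 ** scale
    lowest = ref / factor
    highest = (2 ** width - 1 + ref) / factor
    return lowest, highest


def fits(value, scale, ref, width):
    """True if value lies inside the range allowed by (scale, ref, width)."""
    lowest, highest = allowed_range(scale, ref, width)
    return lowest <= value <= highest


# (scale, reference, width) for descriptor 021192 as quoted in this thread.
definitions = {
    "ECMWF tables": (2, -5000, 13),
    "operational machine": (0, 0, 7),
}

# A few values taken from the good file.
good_values = [-49.68, -29.52, -50.0]

for name, (scale, ref, width) in definitions.items():
    ok = all(fits(v, scale, ref, width) for v in good_values)
    print(name, allowed_range(scale, ref, width), "fits good file:", ok)
```

Only the ECMWF definition can hold the values around -50 seen in the good file, which is the inconsistency described above.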

So, if the large problematic values still remain, could you send me an ascii input file that generates these on your side?

donmurray commented 6 years ago

Hi Jos-

Thanks for taking the time to look at (and clean up) the code. I think I've figured out the problem. The program that is decoding the file (AWIPS) has a different definition for 021192 than the ECMWF bufr tables. So, I think the problem is that the data are getting packed one way and unpacked a different way. I think the previous developer changed the table on the production machine to be:

021192 RADAR BACK SCATTER dB 0 0 7

which is what AWIPS has. So, I can either create a custom table, or maybe I'll switch to defining the values as SIGNAL TO NOISE RATIO (021030) since that's the actual value we are using and it is defined the same in both tables.
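To make the "packed one way, unpacked a different way" idea concrete, here is a toy sketch. It is not how pybufr-ecmwf or AWIPS actually encode or decode a message (real BUFR has sections, descriptors and optional compression); `pack` and `unpack` are hypothetical helpers that only apply scale, reference and width to a plain string of bits.

```python
def pack(values, scale, ref, width):
    """Encode each value as a fixed-width unsigned integer in a bit string."""
    bits = ""
    for v in values:
        raw = round(v * 10 ** scale) - ref
        if not 0 <= raw < 2 ** width:
            raise ValueError("value %r does not fit in %d bits" % (v, width))
        bits += format(raw, "0{}b".format(width))
    return bits


def unpack(bits, scale, ref, width):
    """Decode a bit string in chunks of width bits (leftover bits are ignored)."""
    count = len(bits) // width
    raws = [int(bits[i * width:(i + 1) * width], 2) for i in range(count)]
    return [(raw + ref) / 10 ** scale for raw in raws]


values = [-49.68, -29.52, -50.0]

# Packed with the ECMWF definition of 021192 quoted in this thread.
stream = pack(values, scale=2, ref=-5000, width=13)

# Decoded with the same definition: the values come back.
same_table = unpack(stream, scale=2, ref=-5000, width=13)

# Decoded with the other definition (0, 0, 7): different numbers, and
# because the width differs, everything after the first field is misaligned.
other_table = unpack(stream, scale=0, ref=0, width=7)

print(same_table)
print(other_table)
```

Because the bit width differs between the two definitions, the decoder drifts out of alignment after the first field, which is at least consistent with only the later levels looking wrong.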

I had the same question as you about why SNR was being divided by 10 and that's probably part of the issue. Thanks also for pointing out that the missing value is 1.7e+38 for the ECMWF library. I think the copyright was just left over from copying from your example.

Sorry to bother you with this issue, but thanks again for looking into it and clarifying some things for me. As with many BUFR issues, it's all about the tables being used.

jdkloe commented 6 years ago

Good to know the issue is solved on your side now. Yes, using the correct bufr tables is often the key, and making custom copies makes things very complicated. By the way, that is why the pybufr-ecmwf module allows you to place custom bufr tables outside the default tables directory, and give them a custom name (they don't need to follow the ECMWF naming scheme if you specify them explicitly). That way it is easier to make the distinction.

If you have any other questions feel free to contact me again.