pytroll / pygac

A python package to read and calibrate NOAA and Metop AVHRR GAC and LAC data
https://pygac.readthedocs.org/
GNU General Public License v3.0
20 stars 27 forks source link

export coefficients to json file #58

Closed carloshorn closed 4 years ago

carloshorn commented 4 years ago

In favour of giving more control over the calibration coefficients to the user, this PR will export the coefficients from calibration.py to an external data file.

Closes #62

carloshorn commented 4 years ago

Next step is the user interaction. I would add another section into the pygac config which gives the possibility to add the path to a user defined calibration.json file. I think it would be good if the user could give a reduced json file which is used to update the default coefficients, or do you think that this makes it harder to track the source? I would of cause log any changed coefficient.

sfinkens commented 4 years ago

Being able to just override a small subset of coefficients would be a nice feature! Users focused on traceability can still download an entire set of coefficients via DOI or so.

I'm ok with the config section. Can you please also make this an argument of Reader.__init__? Then it can be used in satpy like so: https://github.com/pytroll/satpy/blob/master/satpy/readers/avhrr_l1b_gaclac.py#L130

carloshorn commented 4 years ago

Good idea @sfinkens. Regarding the user friendliness, we should also talk about the coefficient names. These names should be more self-explanatory. Suggestions are welcome! I will try to come up with a list to start the iteration.

mraspaud commented 4 years ago

Regarding the coeffs names, I would also vote for having a documentation page explaining each coeff and the formulas they are applied in.

carloshorn commented 4 years ago

And again a feature missing in python 2.... In ancient times, the ConfigParser.get method did not know fallback values....

carloshorn commented 4 years ago

With the last commits, the user can provide different default coefficients via the config file in the section "calibration" as option "coeffs_file", e.g.

[calibration]
coeffs_file = /path/to/user/default/coeffs.json

If coeffs_file is not given, it will fallback to the pygac defaults in data/calibration.json. Furthermore, it is possible to overwrite the defaults by passing custom coefficients directly to the calibrate_solar, and calibrate_thermal functions.

This covers the three use cases:

  1. The user does not care about calibration details and wants to use whatever pygac recommends. This is the default behaviour and the user does not need to do anything.
  2. The user has computed his own coefficients and wants to use them instead of the pygac recommendation. In this case, the user can compile these coefficients in the format of data/calibration.json and provide it via the pygac configuration.
  3. The user only wants to change a small subset of coefficients, e.g. for a sensitivity study. In this case, the user can provide the subset of coefficients to change with a dictionary directly to the calibration functions (which is passed to the reader constructor as "calibration" keyword argument).

So far the implementation of this feature... Now, we should decide for a good naming convention of these parameters. How about using the naming convention from https://github.com/rsgb-neuhaus/viscal_db/blob/master/csv/header.patmos.csv? This would also avoid list type objects in the json file.

abhaydd commented 4 years ago

Great job. Thanks a lot @carloshorn for this adaptation.

sfinkens commented 4 years ago

Nice work @carloshorn ! The viscal_db names look good to me.

carloshorn commented 4 years ago

Having in mind that I wanted to avoid list types in the json to give every item a specific name, I came up with the following naming convention, which makes use of the hierarchical json structure.

"metopb": {
    "channel_1": {
        "dark_count": 39.7,
        "gain_high_s0": 0.166,
        "gain_high_s1": 2.019,
        "gain_high_s2": -0.201,
        "gain_low_s0": 0.055,
        "gain_low_s1": 2.019,
        "gain_low_s2": -0.201,
        "gain_switch": 501.12
    },
    "channel_2": {
        "dark_count": 40.0,
        "gain_high_s0": 0.183,
        "gain_high_s1": 1.476,
        "gain_high_s2": -0.137,
        "gain_low_s0": 0.061,
        "gain_low_s1": 1.476,
        "gain_low_s2": -0.137,
        "gain_switch": 500.82
    },
    "channel_3a": {
        "dark_count": 40.3,
        "gain_high_s0": 0.201,
        "gain_high_s1": 1.748,
        "gain_high_s2": -0.033,
        "gain_low_s0": 0.029,
        "gain_low_s1": 1.748,
        "gain_low_s2": -0.033,
        "gain_switch": 501.32
    },
    "channel_3b": {
        "centroid_wavenumber": 2664.3384,
        "radiance_correction_c0": 0.0,
        "radiance_correction_c1": 1.0,
        "radiance_correction_c2": 0.0,
        "space_radiance": 0.0,
        "to_eff_blackbody_intercept": 0.9970158319134996,
        "to_eff_blackbody_slope": 1.7711318
    },
    "channel_4": {
        "centroid_wavenumber": 933.71521,
        "radiance_correction_c0": 5.44,
        "radiance_correction_c1": 0.89848,
        "radiance_correction_c2": 0.00046964,
        "space_radiance": -4.98,
        "to_eff_blackbody_intercept": 0.9986240957209157,
        "to_eff_blackbody_slope": 0.51860807
    },
    "channel_5": {
        "centroid_wavenumber": 839.72764,
        "radiance_correction_c0": 3.84,
        "radiance_correction_c1": 0.93751,
        "radiance_correction_c2": 0.00025239,
        "space_radiance": -3.4,
        "to_eff_blackbody_intercept": 0.9988311677674785,
        "to_eff_blackbody_slope": 0.40059787
    },
    "date_of_launch": 2012.77,
    "thermometer_1": {
        "c0": 276.6194,
        "c1": 0.050919,
        "c2": 1.471e-06,
        "c3": 0.0,
        "c4": 0.0
    },
    "thermometer_2": {
        "c0": 276.6511,
        "c1": 0.050892,
        "c2": 1.489e-06,
        "c3": 0.0,
        "c4": 0.0
    },
    "thermometer_3": {
        "c0": 276.6597,
        "c1": 0.050845,
        "c2": 1.521e-06,
        "c3": 0.0,
        "c4": 0.0
    },
    "thermometer_4": {
        "c0": 276.3685,
        "c1": 0.050992,
        "c2": 1.482e-06,
        "c3": 0.0,
        "c4": 0.0
    }
}

Even if some coefficients have the same prefix, I think that two levels provide a good readability. The only thing that I would like to change is the launch date format. A fraction of year is not very intuitive. I would prefer a classical timestamp like YYYY-mm-dd. What do you think?

Edit: Maybe someone can come up with a better name for to_eff_blackbody, which should point out that this coefficients should be used for the transformation from thermometer temperature to effective black body temperature. as defined in (7.1.2.4-3) KLM Users Guide.

mraspaud commented 4 years ago

Looks good! for the thermometer coeffs, why not skip c3 and c4 as they don't seem to be relevant ?

carloshorn commented 4 years ago

I would prefer to keep c3 and c4, even if always set to zero, because it is equation (7.1.2.4-1) in the KLM guide which defines them. They are used in the code in the following lines https://github.com/pytroll/pygac/blob/2d7fbafbf03ae59d14b1fffdd33154358faee139/pygac/calibration.py#L495-L499 If we default them to zero in the code, I would not see the point of having a default value file.

mraspaud commented 4 years ago

Good point, I agree, thank you for explaining this :)

carloshorn commented 4 years ago

I will start the implementation of the new structure tomorrow. I would also like to put more comment lines in the calibration functions and maybe rename some inner variables, but it should neither chnage the results, nor should it change the interface (function argument names). Are you okay with adding it into this PR?

mraspaud commented 4 years ago

Go ahead.

carloshorn commented 4 years ago

Even comment lines can break python 2.7....

carloshorn commented 4 years ago

@helgaweb, thank you very much for the comments. Especially for giving the source to the averaging method that we use.

At all, I would like to propose a change in the coefficients that we are using. In my opinion, we do not need to distinguish between high and low gain slope coefficients like PATMOS-x does, because there is a very simple conversion between both as shown in the following lines:

While writing the documentation on the solar calibration, I went again through the paper of Heidinger et al 2010. In the appendix, they describe how to convert dual gain counts to single gain counts.

Based on casual communication with ITT engineers, the dual gain operation was accomplished for channels 1 and 2 by decreasing the calibration slope by 50% in the low end and increasing it by 150% in the end [...]

After a short google search, I found the following information in the book "Remote Sensing Time Series: Revealing Land Surface Dynamics"

Another change in design of the AVHRR/3 instrument was the introduction of a dual-gain feature for the reflective channels 1, 2 and 3A. In order to improve the radiometric resolution of the instrument for low reflectance targets, the dynamic range of the instrument was divided equally in two ranges, i.e. nominally from 0 to 500 counts and from 500 to 1,000 counts. For channels 1 and 2 half of the available Digital Number (DN) range is assigned to the low albedo range from 0 to 25% with the other half to the high albedo range from 26 to 100%. This allows for an increase in the radiometric resolution for dark targets. For channel 3A, the split between low and high albedo range is set at 12.5% albedo (Rao and Sullivan 2001)

Therefore, the equations (A1) and (A2) from Heidinger et al 2010 can be written as

where for channel 1, 2 and for channel 3A.

If we insert this into equation (1) in Heidinger et al 2010

we see, that the first dark count D (note the typo in the paper C_d=D) in equation (A1) vanishes and we get a similar equation

Note that S is the single gain slope. The dual-gain slopes are simply defined by

Looking at equation (6) in Heidinger et al 2010

we can see, that only is affected by the multiplication with to get high and low slope parameters. The other ones remain the same as you can verify in all PATMOS-x calibration files. To recover the single slope parameter from the dual gain parameters, we simply need to divide by . You can also assure yourself from the PATMPOS-x data that (+/- rounding error at last digit)

Example coefficients for NOAA15

    0.060   -0.069    0.002       ! ch1 low gain S0,S1,S2
    0.179   -0.069    0.002      ! ch1 high gain S0,S1,S2
    0.069    0.339   -0.010       ! ch2 low gain S0,S1,S2
    0.206    0.339   -0.010      ! ch2 high gain S0,S1,S2
    0.025    0.000    0.000       ! ch3a low gain S0,S1,S2
    0.175    0.000    0.000       ! ch3a low gain S0,S1,S2
carloshorn commented 4 years ago

I would propose to keep only single-gain slope parameters and a boolean flag called something like dual-gain. That would be sufficient to calibrate the visible channels.

carloshorn commented 4 years ago

Furthermore, I would like to revert to the original definition of b1 == radiance_correction_c1 (linear term in the non-linear correction for channel 4 and 5). This improve the traceability of the code, see #62.

mraspaud commented 4 years ago

@abhaydd any comments here?

carloshorn commented 4 years ago

Note, that the last commit may change results, because it corrects a typo in the NOAA-7 coefficients The correct value for channel 1 is 'bh' == 'bl' == 3.792 according to the PATMPS-x coefficents. https://github.com/pytroll/pygac/blob/2d7fbafbf03ae59d14b1fffdd33154358faee139/pygac/calibration.py#L126-L127

carloshorn commented 4 years ago

I would propose to keep only single-gain slope parameters and a boolean flag called something like dual-gain. That would be sufficient to calibrate the visible channels.

I noticed, that we do not need the extra boolean flag, because we are already setting "gain_switch": null in the json file in case of single-gain calibration.

carloshorn commented 4 years ago

Summary of the current status: For the visible calibration, we use dark_count, gain_switch, and single-gain slope coefficients s0, s1, s2 for each channel. In case of a single-gain instrument, the value of gain_switch is set to null. The detector ageing is calculated relative to the date_of_launch parameter, which is now set as a timestamp instead of a float in the unit of years anno Domini. Note, that the conversion from float to timestamp reveals that these numbers do not necessarily match the actual date of launch as known from other sources, e.g. Metop-B was launched on the 17th of September 2012, but the float given in the PATMOS-x file 2012.77 corresponds to the timestamp 2012-10-08 19:40:48. For the thermal calibration, we use centroid_wavenumber, radiance_correction_c0, radiance_correction_c1, radiance_correction_c2, space_radiance, to_eff_blackbody_intercept, to_eff_blackbody_slope for each channel. Note, that the radiance_correction_c1 is defined like the coefficients for the non-linear correction, that means not added by one like in the previous PyGAC coefficents. Furthermore, we use five coefficients c0, c1, c2, c3, c4 for each platinum resistance thermometer. Hidden (not accounted for in the json file) parameters in our implementation are the prt_threshold which is used to mask erroneous values for interpolation (set to 50), and two other thresholds for the in-orbit calibration target (ICT) counts and space counts (set to 100), and the window length (set to 51) used for the smoothing. On top there is also the valid range for brightness temperature (set from 170 to 350). If we do not think that anyone is interested in playing with these hidden parameters, I am fine with fixing them inside the code, but I wanted to mention them for awareness and completeness. Please have a look at the following example parameters:

"metopb": {
    "channel_1": {
        "dark_count": 39.7,
        "gain_switch": 501.12,
        "s0": 0.11066666666666668,
        "s1": 2.019,
        "s2": -0.201
    },
    "channel_2": {
        "dark_count": 40.0,
        "gain_switch": 500.82,
        "s0": 0.122,
        "s1": 1.476,
        "s2": -0.137
    },
    "channel_3a": {
        "dark_count": 40.3,
        "gain_switch": 501.32,
        "s0": 0.11485714285714287,
        "s1": 1.748,
        "s2": -0.033
    },
    "channel_3b": {
        "centroid_wavenumber": 2664.3384,
        "radiance_correction_c0": 0.0,
        "radiance_correction_c1": 0.0,
        "radiance_correction_c2": 0.0,
        "space_radiance": 0.0,
        "to_eff_blackbody_intercept": 1.7711318,
        "to_eff_blackbody_slope": 0.9970158319134996
    },
    "channel_4": {
        "centroid_wavenumber": 933.71521,
        "radiance_correction_c0": 5.44,
        "radiance_correction_c1": -0.10152000000000005,
        "radiance_correction_c2": 0.00046964,
        "space_radiance": -4.98,
        "to_eff_blackbody_intercept": 0.51860807,
        "to_eff_blackbody_slope": 0.9986240957209157
    },
    "channel_5": {
        "centroid_wavenumber": 839.72764,
        "radiance_correction_c0": 3.84,
        "radiance_correction_c1": -0.062490000000000046,
        "radiance_correction_c2": 0.00025239,
        "space_radiance": -3.4,
        "to_eff_blackbody_intercept": 0.40059787,
        "to_eff_blackbody_slope": 0.9988311677674785
    },
    "date_of_launch": "2012-10-08 19:40:48",
    "thermometer_1": {
        "c0": 276.6194,
        "c1": 0.050919,
        "c2": 1.471e-06,
        "c3": 0.0,
        "c4": 0.0
    },
    "thermometer_2": {
        "c0": 276.6511,
        "c1": 0.050892,
        "c2": 1.489e-06,
        "c3": 0.0,
        "c4": 0.0
    },
    "thermometer_3": {
        "c0": 276.6597,
        "c1": 0.050845,
        "c2": 1.521e-06,
        "c3": 0.0,
        "c4": 0.0
    },
    "thermometer_4": {
        "c0": 276.3685,
        "c1": 0.050992,
        "c2": 1.482e-06,
        "c3": 0.0,
        "c4": 0.0
    }
}

There is still the chance to change some names. I usually use c for any kind of coefficient, but we could also use the variable name used in the corresponding equation, e.g. the thermometer parameters have the name d0, ..., d4 in equation (7.1.2.4-1) KLM Guide. Any suggestions?

@helgaweb also asked to add a version number of these coefficients inside the file. I guess you mean something like "version": "v2017r1" for the current set of parameters? I do not know, if this is beneficial. Imagine that for example at some point we use thermal calibration parameters from another source, but keep the solar calibration as it is, which version tag should we use then? Maybe an entry at top level called description with a plain text containing explanations and citations could do the job. Any suggestions?

mraspaud commented 4 years ago

@carloshorn great work so far! I haven't looked at the implementation yet, but from what you describe it looks great. A couple of things:

helgaweb commented 4 years ago

Indeed. It was just an idea, to make this note in the json file to improve the traceability. I like the idea about the top-level description with the source of the default coeffs, however, there is also the pygac documentation Maybe I missed it, but a note in the calibration.py that the the current default time-dependent PATMOS-x coefficients (version: y2017rf; PATMOS-x User guide) are stored in the json.txt file, would be good.

sfinkens commented 4 years ago

Awesome @carloshorn! I'd also stick to the coeff names from the equations. And I think a traceable versioning would be very beneficial. So users can say: "I used pygac-1.2.3 with coefs v2017". What if we embed a list of "official" coefs in the code:

[{'md5sum': ..., 'version': 'v2017', 'DOI': 'DOI:1234/5678'},
 {'md5sum': ..., 'version': 'v2015', 'DOI': None}]

At runtime we could check if the current file matches one of these versions and update the metadata accordingly. Otherwise, print a warning.

carloshorn commented 4 years ago

Many thanks for the input: @mraspaud: I changed the date format to an iso format timestamp, e.g. "2006-10-19T19:37:12.000000Z", changed fit coefficient names to the variable names in the corresponding equations, i.e. coefficients having an integer index like the thermometer coefficients "d0", ..., "d4", and introduced a md5 check of the json file. Coefficients with a physical meaning keep a hopefully meaningful name, the only ones I still would like to change are the to_effective_blackbody_..., unfortunately, their names in the equations are A and B, which are often used in the reverse transformation as well. Furthermore, I liked slope and intercept because it is more instructive for a linear equation. Any suggestions?

@helgaweb: I added a description to the json file. Could you have a quick look at it. The idea is to keep the description of what PyGAC expects inside the json file, because this file can be used as template for a custom coefficient file written by a user or for a new releases. The version is not included, because it would not be static and someone changing the file might not update it. The version info comes in a similar way like @sfinkens described.

mraspaud commented 4 years ago

I'm good with your proposal @carloshorn

carloshorn commented 4 years ago

Cleanup and finalisation plan: if we agree on the coefficients, I would remove the old2new and new2old functions and do the correct mapping inside the Calibrator class. I used these functions to create the json file from the original coeffs dictionary. I also used them to make sure that the results never changed by reproducing the original coefficients with the reverse call. My question here: Should I keep all the rounding calls used in new2old to exactly reproduce the original results? Note, that the results might have slightly changed for NOAA-7 in any case due to the corrected typo. I introduced a deprecation warning in the solar calibration when using the correction factor argument corr in order to keep the units of the output clear. I could make sure that the transformation from scaled radiance to isotropic radiance is done outside in the reader class. The other possibility would be a change of interface by adding the sun-earth distance correction factor and solar-zenith-angle to the argument list and directly output isotropic radiance. Any preference?

helgaweb commented 4 years ago

@carloshorn Looks and sounds very good! Thanks! Any specific reason why you did not include the KLM User's guide in the method reference for the thermal calibration in the json file?

carloshorn commented 4 years ago

Any specific reason why you did not include the KLM User's guide in the method reference for the thermal calibration in the json file?

No... I simply forgot it...

helgaweb commented 4 years ago

Nice! I would prefer the optional correction for Sun-earth-distance in the reader class. Makes the code cleaner.

sfinkens commented 4 years ago

I think it's good if the coefs are identical, so I'd say keep the rounding. I don't have an opinion about the scaled/isotropic radiance thing :)

mraspaud commented 4 years ago

I agree with keeping the rounding. Is the scaled radiance being output at the moment to hdf files?

helgaweb commented 4 years ago

Yes.

def calibrate_solar(counts, chan, year, jday, spacecraft, corr=1)

In pygac 1.2 (CLARA-A2 L1C data) the scaled radiance output was corrected for changing Sun-Earth-distance, but not anymore in the updated pygac Version for the test CLARA-A3 L1C data.

carloshorn commented 4 years ago

Last thing to do are tests:

  1. Test if a changed json file leads to the desired warning (md5 hash test) and produces the expected results.
  2. Test if custom coefficients work.
  3. Test if the solar calibration throws the deprecation warning if the argument corr is used.

Anything I forgot?

carloshorn commented 4 years ago

I implemented the tests into test_calibrate_klm.py even if they are not klm specific. Are you okay with it, or should I move them into a separate calibration coefficient specific test case, something like test_calibration_coefficients.py?

carloshorn commented 4 years ago

@sfinkens, you once mentioned that in order to use coefficients of older PyGAC versions, you cherry-picked them from previous commits. If wanted, you could tell me which commits you used, and which version tag I should give them, so I can produce json files for them and add them together with their md5 hash. I could also provide a tool to convert PATMOS-x coefficients into PyGAC coefficients so we can build internal version tags for all of them, but I guess this is beyond the scope of this PR.

sfinkens commented 4 years ago

@carloshorn That's very kind of you, but there is no need to do this for our past activities. That's frozen anyway. But for the future it will be very helpful!

Haven't looked at the tests in detail, buy your proposal sounds good. And a separate test case is a good choice in my opinion!

mraspaud commented 4 years ago

Great work @carloshorn ! I agree with @sfinkens for both points.

carloshorn commented 4 years ago

Alright, I hope that's it. In summary: With this PR, the calibration coefficients will no longer be a static part of calibration.py. The default values are stored in the external file data/calibration.json, which can be used as template for user defined defaults. The user can add his/her defaults through the PyGAC configuration file, e.g.

[calibration]
coeffs_file = /path/to/user/default/coeffs.json

If the user passes an official PyGAC defaults file (currently there is only one), the code will recognise it. At runtime, it is possible to check the calibration file version using the class method Calibrator.get_version().

Furthermore, it is possible to change individual coefficients by passing the satellite specific changes to the reader class using the keyword calibration in the constructor, e.g.

my_calibration= {"channel_1": {"dark_count": 43.21}}
filepath = "/path/to/file"
Reader = get_reader_class(filepath)
reader = Reader(calibration=my_calibration)
reader.read(filepath)

In addition, the functions of calibration.py will have more comment lines explaining each step and giving references to the applied methods, which should be beneficial for the readability and future development.