rbeyer / hiproc

Python libraries and programs for processing HiRISE data.
Apache License 2.0

EDR_Stats KeyError on 'Average' for CAL_BUFFER #3

Closed: AndrewAnnex closed this issue 3 years ago.

AndrewAnnex commented 3 years ago

Description

For one image, ESP_065856_1890, hiproc failed somewhere in the pipeline with a KeyError on 'Average', but even with verbose logging it wasn't clear which function or stage was breaking. After some trial and error, I found that EDR_Stats was failing on a single CCD channel (in this case ESP_065856_1890_RED2_1) because of a missing JSON file.

Digging deeper, I found that the error was specifically in the "CAL_BUFFER" results for this CCD, which contain no average value or other statistics.

Below is the "CAL_BUFFER" output for the bad CCD:

PVLGroup([
  ('TotalPixels', 372)
  ('ValidPixels', 0)
  ('NullPixels', 0)
  ('LisPixels', 372)
  ('LrsPixels', 0)
  ('HisPixels', 0)
  ('HrsPixels', 0)
])

And here, for comparison, is the "CAL_BUFFER" output for ESP_065856_1890_RED2_0:

PVLGroup([
  ('Average', 1075.9247311828)
  ('StandardDeviation', 22.093155656939)
  ('Variance', 488.10752688172)
  ('Minimum', 1035.0)
  ('Maximum', 1104.0)
  ('TotalPixels', 372)
  ('ValidPixels', 372)
  ('NullPixels', 0)
  ('LisPixels', 0)
  ('LrsPixels', 0)
  ('HisPixels', 0)
  ('HrsPixels', 0)
])

There was nothing in stderr from the kalasiris histat function, so I am not sure what went wrong. Maybe something internal to ISIS?

As a longer-term suggestion, it would be helpful if verbose logging said more about which step of the pipeline it is in, which image is being operated on, etc.

What I Did

Note: this requires my moody tool to download the EDR files (simply pip install moody).

moody hirise_edr ESP_065856_1890
EDR_Stats ./*.IMG -v -k

rbeyer commented 3 years ago

Fun.

So, -v is for casual chattiness. For debugging, use -vv: that logs which module you're in as you go, and also shows the stack traces. Some hiproc operations are run using Python's multiprocessing, and when N jobs are all logging at the same time, things can, indeed, get messy. If that becomes a problem, you can always set --max_workers 1, which should make it single-threaded and make the logs more "linear."

The error occurs because I'm not handling the return from histat properly. Under certain recent conditions (typically in the period after August 2020, when we changed some ADC settings on the instrument), the CAL_BUFFER pixels can legitimately all be the special value LIS. In that case, histat apparently doesn't return a key/value pair for "Average" (stats used to return one, but histat does something different). So I need to handle the case where there isn't an "Average" key in the PVL text that comes back from histat.
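A minimal sketch of that handling, assuming the histat results have already been parsed into a dict keyed like the PVLGroup outputs above. The helper names all_lis and cal_buffer_average are hypothetical, not hiproc functions. Raising on any other missing "Average" is a design choice of this sketch, so that only the legitimate all-LIS case is tolerated silently.

```python
from typing import Mapping, Optional, Union

Number = Union[int, float]


def all_lis(stats: Mapping[str, Number]) -> bool:
    """True when every CAL_BUFFER pixel is the special value LIS."""
    total = stats.get("TotalPixels", 0)
    return total > 0 and stats.get("LisPixels", 0) == total


def cal_buffer_average(stats: Mapping[str, Number]) -> Optional[float]:
    """Return the CAL_BUFFER average, or None for an all-LIS buffer.

    Hypothetical helper: `stats` is assumed to be a dict-like view of
    the histat results, with keys like those shown above.
    """
    if "Average" in stats:
        return float(stats["Average"])
    if all_lis(stats):
        # Legitimate case: histat had no valid pixels to average.
        return None
    # Any other missing "Average" is unexpected, so fail loudly
    # instead of silently carrying a None downstream.
    raise ValueError(f"histat returned no 'Average': {dict(stats)}")
```

With the RED2_1 values above, cal_buffer_average returns None; with the RED2_0 values it returns 1075.9247311828.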

Should be easy to fix.

AndrewAnnex commented 3 years ago

I think using .get(key, default) for each line of parse_histat would work, if Nones can be used for missing values.
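A minimal sketch of that suggestion. hiproc's real parse_histat may look quite different; parse_histat_sketch, its simple "Key = value" line matching, and the HISTAT_KEYS tuple (taken from the keys in the outputs above) are assumptions for illustration. Every expected key is looked up with dict.get, so any key histat omitted comes back as None.

```python
import re
from typing import Dict, Optional, Union

Number = Union[int, float]

# Keys seen in the CAL_BUFFER outputs above (assumed complete for this sketch).
HISTAT_KEYS = (
    "Average", "StandardDeviation", "Variance", "Minimum", "Maximum",
    "TotalPixels", "ValidPixels", "NullPixels",
    "LisPixels", "LrsPixels", "HisPixels", "HrsPixels",
)


def _to_number(text: str) -> Optional[Number]:
    """Convert a PVL value to int or float; None if it is not numeric."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            pass
    return None


def parse_histat_sketch(pvl_text: str) -> Dict[str, Optional[Number]]:
    """Hypothetical parser: numeric 'Key = value' lines, missing keys -> None."""
    found: Dict[str, Number] = {}
    for line in pvl_text.splitlines():
        match = re.match(r"\s*(\w+)\s*=\s*(\S+)", line)
        if match:
            key, raw = match.groups()
            value = _to_number(raw)
            if value is not None:  # skip non-numeric lines like "Group = ..."
                found[key] = value
    # .get(key, None) is the suggested fallback for keys histat omitted.
    return {key: found.get(key, None) for key in HISTAT_KEYS}
```

Callers still need to check for None before doing arithmetic with any of these values.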