ssec / polar2grid

Tools for reading, remapping, and writing satellite instrument data.
http://www.ssec.wisc.edu/software/polar2grid/
GNU General Public License v3.0
72 stars 34 forks source link

Polar2Grid Version 3.0 issues #326

Closed kathys closed 1 year ago

kathys commented 3 years ago

Here is a checklist of the issues that I found with the Version 3.0 tarballs.

Tarball: polar2grid-swbundle-20210324-121314.tar.gz

Edit by Dave: I've added a section header with the name of the tarball. As we come out with new tarballs that may have new and different issues we can add new sections to keep track of what tarball causes or fixes what issues.

Tarball: polar2grid-swbundle-20210422-181315.tar.gz

noaa20_viirs_20210601_081815_montage

new_versus_old_P2G_first_image_line

Tarball: polar2grid-swbundle-20211108-202812.tar.gz

Tarball: polar2grid-swbundle-20211201-203552.tar.gz

Extra Requests

djhoese commented 3 years ago

Regarding the --list-products issue, this may be related to #319. Basically, I need to make P2G know that when a user says X it means Y in "satpy land" and when we need to do user-facing things we need to map Y->X so the user doesn't really see the difference.

djhoese commented 3 years ago

In #325 we discussed how --fill-value 0 should solve the transparent versus black background change in this new version. In my recent PRs (mostly #333 and #334) the --list-products stuff should be fixed to. After making sure some new tests are working I'll make a new tarball for you to test @kathys. Then we can add a new section to the description above and start a new list of issues. We'll need to double check that my changes to crefl and EWA resampling fix the issues seen in #325.

kathys commented 3 years ago

Got it @djhoese Thanks.

djhoese commented 3 years ago

I wrote this response on slack (made some small updates here) but thought it would be good to add here regarding Liam's requests for timings.

  1. Add date/time to log messages. Not a problem. However, I doubt this will be as useful as he thinks it will be, see below.
  2. Knowing how long a specific segment of the processing takes: This is no longer possible without intentionally hurting performance. Since we are using dask, all processing is delayed until the latest moment possible. In most cases this is at the very end when we are writing the data to disk so log messages with times for earlier on in P2G would make certain steps appear almost instantaneous. Put another way, Polar2Grid is building up the "instructions" for processing (fast) and then actually "runs" the instructions at the end (slow).

    Another complication is that dask is running per array chunk. So there might be certain steps (like low-level ll2cr/fornav function calls) where I could print out the execution time for that portion, but there would be an execution time for each chunk which isn't that useful.

    Now, for the "hurting performance" part I mentioned, I could have dask compute things before the end and print out how long each of those computations takes. For example, create "instructions" to load and calibrate the data, compute this data into memory and time how long it takes, then create "instructions" to remap the data using EWA, then compute the results into memory and time how long it takes. This would give you per-step timing but you would have to store those results in memory or on disk which takes time and space. Additionally, one of the benefits of doing things per chunk is that other computations can be done while we're waiting for a resource. Like if we are writing one chunk to disk, we can be resampling another chunk, computing each step stops that from happening.

djhoese commented 3 years ago

@kathys I just realized, for adding times to log messages, that has existed in Polar2Grid for a long long time in the log file. I don't think times should be added to log messages printed to the screen. If someone is really interested in timings they can look at the log file. Willing to discuss.

[2021-04-19 11:05:43,649] : PID 117273 : DEBUG    : viirs_sdr_geotiff : main : Starting script with arguments: /home/davidh/repos/git/polar2grid/polar2grid/glue.py -r viirs_sdr -w geotiff -vv -p true_co
lor false_color --grid-configs /data/bumi/data/test_data/grid_configs/grid_example.conf -g miami -f /data/bumi/data/test_data/viirs/input/test1 -vv
[2021-04-19 11:05:43,650] : PID 117273 : INFO     : viirs_sdr_geotiff : main : Sorting and reading input files...
[2021-04-19 11:05:43,650] : PID 117273 : DEBUG    : satpy.readers.yaml_reader : load_yaml_configs : Reading ('/home/davidh/repos/git/satpy/satpy/etc/readers/viirs_sdr.yaml',)
kathys commented 3 years ago

@djhoese The way Liam and I execute the software in our operational processing, we do not save the log files. If it is easy to do, and since Liam is the boss, I think it should be added to stderr output.

djhoese commented 3 years ago

While I understand that need and am OK adding it for this, you should really be preserving the log file. It has much more output and information than just the stderr output (including Satpy log messages and other third-party libraries).

kathys commented 3 years ago

Absolutely true. But we do not need to look at the log files very often, so if we do, we just run the problem overpass again and keep the log file. When there are problems, 99% of the time we can figure out what went wrong from the stdout/stderr messages.

kathys commented 2 years ago

Tarball: polar2grid-swbundle-20211219-125805

Documentation