Closed kathys closed 2 years ago
Could you give me the exact commands run and individual timings for each command if you have them (if not that's fine).
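If it helps to capture these consistently, a small wrapper can record wall-clock time per command. This is a hypothetical sketch (the `timed_run` helper is mine, not part of P2G); substitute the real `polar2grid.sh` invocation for the placeholder:

```python
import subprocess
import time

def timed_run(cmd):
    """Run a shell command and return (elapsed_seconds, returncode)."""
    t0 = time.monotonic()
    result = subprocess.run(cmd, shell=True)
    return time.monotonic() - t0, result.returncode

# Placeholder command; substitute the real polar2grid.sh call here:
elapsed, rc = timed_run("true")
print(f"took {elapsed:.3f}s, exit code {rc}")
```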
Version 3.0:
```
/data/users/kathys/polar2grid-swbundle-20211219-125805/bin/polar2grid.sh -r modis_l1b -w geotiff --num-workers 4 --fill-value 0 -f /data/users/kathys/version_3_0_timing/input_modis_china
INFO : Sorting and reading input files...
INFO : Loading product metadata from files...
INFO : Using default product list: ['vis01', 'vis02', 'vis03', 'vis04', 'vis05', 'vis06', 'vis07', 'vis26', 'bt20', 'bt21', 'bt22', 'bt23', 'bt24', 'bt25', 'bt27', 'bt28', 'bt29', 'bt30', 'bt31', 'bt32', 'bt33', 'bt34', 'bt35', 'bt36', 'true_color', 'false_color', 'fog']
INFO : Running day coverage filtering...
INFO : Running night coverage filtering...
INFO : Computing dynamic grid parameters...
INFO : Checking products for sufficient output grid coverage (grid: 'wgs84_fit')...
INFO : Resampling to 'wgs84_fit' using 'ewa' resampling...
INFO : Computing products and saving data to writers...
INFO : SUCCESS
```
real 15m33.582s
user 21m7.667s
sys 8m54.846s
Version 2.3:
```
/data/users/kathys/polar2grid_v_2_3/bin/polar2grid.sh crefl gtiff -f /data/users/kathys/version_3_0_timing/input_modis_china
INFO : Initializing reader...
INFO : Could not find any existing crefl output will use MODIS SDRs to create some
.....
INFO : Saving true color image to filename 'grid_wgs84_fit_true_color.dat'
INFO : Creating output from data mapped to grid wgs84_fit
INFO : Creating geotiff 'aqua_modis_modis_crefl04_500m_20210318_060500_wgs84_fit.tif'
INFO : Creating geotiff 'aqua_modis_modis_crefl03_500m_20210318_060500_wgs84_fit.tif'
INFO : Creating geotiff 'aqua_modis_modis_crefl01_500m_20210318_060500_wgs84_fit.tif'
INFO : Creating geotiff 'aqua_modis_modis_crefl01_250m_20210318_060500_wgs84_fit.tif'
INFO : Creating geotiff 'aqua_modis_true_color_20210318_060500_wgs84_fit.tif'
INFO : Processing data for grid wgs84_fit complete
```
real 3m8.224s
user 2m25.329s
sys 0m41.183s
```
/data/users/kathys/polar2grid_v_2_3/bin/polar2grid.sh modis gtiff -f /data/users/kathys/version_3_0_timing/input_modis_china
INFO : Initializing reader...
....
INFO : Processing data for grid wgs84_fit complete
```
real 5m14.793s
user 3m58.525s
sys 1m16.368s
In fairness, I forgot to include the creation of the false color image when executing my Version 2.3 crefl command.
@kathys Could you get me the timings for the 2.3 false color generation and try running 3.0 generating the products in groups (all vis/ir in one group and true/false in another).
Interesting results. Version 3.0 true/false color and IR/VIS GeoTIFF images done separately.
Running Polar2grid v3.0 true/false color modis image processing at Wed Mar 2 22:58:07 UTC 2022

```
/data/users/kathys/polar2grid-swbundle-20211219-125805/bin/polar2grid.sh -r modis_l1b -w geotiff --num-workers 4 --fill-value 0 -p true_color false_color -f /data/users/kathys/version_3_0_timing/input_modis_china
INFO : Sorting and reading input files...
INFO : Loading product metadata from files...
INFO : Running day coverage filtering...
INFO : Running night coverage filtering...
INFO : Computing dynamic grid parameters...
INFO : Checking products for sufficient output grid coverage (grid: 'wgs84_fit')...
INFO : Resampling to 'wgs84_fit' using 'ewa' resampling...
INFO : Computing products and saving data to writers...
INFO : SUCCESS
```
real 10m20.211s
user 15m10.958s
sys 5m27.454s
```
/data/users/kathys/polar2grid-swbundle-20211219-125805/bin/polar2grid.sh -r modis_l1b -w geotiff --num-workers 4 --fill-value 0 -p bt20 bt21 bt22 bt23 bt24 bt25 bt27 bt28 bt29 bt30 bt31 bt32 bt33 bt34 bt35 bt36 vis01 vis02 vis03 vis04 vis05 vis06 vis07 vis26 -f /data/users/kathys/version_3_0_timing/input_modis_china
INFO : Sorting and reading input files...
INFO : Loading product metadata from files...
INFO : Running day coverage filtering...
INFO : Running night coverage filtering...
INFO : Computing dynamic grid parameters...
INFO : Checking products for sufficient output grid coverage (grid: 'wgs84_fit')...
INFO : Resampling to 'wgs84_fit' using 'ewa' resampling...
INFO : Computing products and saving data to writers...
INFO : SUCCESS
```
real 7m48.336s
user 10m12.059s
sys 3m27.965s
Finished Polar2Grid v3.0 processing at Wed Mar 2 23:16:16 UTC 2022
Version 2.3 true/false color and IR/VIS GeoTIFF images done separately.
Running Polar2grid v2.3 true/false color modis image processing at Wed Mar 2 22:48:27 UTC 2022

```
/data/users/kathys/polar2grid_v_2_3/bin/polar2grid.sh crefl gtiff --true-color --false-color -f /data/users/kathys/version_3_0_timing/input_modis_china
INFO : Initializing reader...
INFO : Could not find any existing crefl output will use MODIS SDRs to create some
...
INFO : Processing data for grid wgs84_fit complete
```
real 4m16.376s
user 3m19.179s
sys 0m57.272s
Running Polar2grid v2.3 bt/vis modis image processing at Wed Mar 2 22:52:44 UTC 2022

```
/data/users/kathys/polar2grid_v_2_3/bin/polar2grid.sh modis gtiff -f /data/users/kathys/version_3_0_timing/input_modis_china
INFO : Processing data for grid wgs84_fit complete
```
real 5m23.714s
user 4m1.390s
sys 1m22.397s
So doing them separately for 3.0 is... way slower. Great. And I think you forgot fog, so it should be even slower. Something else is going on here. By that I mean there must be something dumb in the Satpy reader that is dragging the entire processing down.
How does generating all the products in 4m15s sound?
aqua_modis_true_color_20210318_060500_wgs84_fit.tif
:
real 4m15.378s
user 8m32.510s
sys 1m35.956s
There are more optimizations that can be made, but this is the first batch. This execution also included all of the other issues you brought up (saturated pixels, band 21 might still be the wrong enhancement, etc).
How does it sound? Beauteous!!!!!!
Sadly I can't seem to reproduce that 4m15s after I removed all the hacks and tricks I was doing. I'm currently averaging about 5m30s for all the products. More optimizations to come I hope.
So, a couple of findings from our meeting yesterday and some testing I did afterward. It seems my laptop is much faster than bumi. I get similar results to Kathy when I run on bumi with the tarball (~8m for true/false). When processing all products with the 3.0 tarball, it was ~14m on bumi. When processing with 8 workers instead of 4 and with the input data in /dev/shm, it was still 13m9s.
On my laptop, processing all products took about 6m30s. With some of the newer optimizations that aren't included in the tarball I get 5m40s. If I run the 3.0 tarball on my laptop I get 6m47s for all products. If I run the 2.3 tarball on my laptop I get 2m56s for true/false color. For 2.3 tarball on my laptop with all the regular band products I get 3m30s. I will have to use these new times as my benchmark. I'll note though that on my laptop at least these timings are pretty close, not faster, but not 2x+ slower either.
One thing that came to mind is that the new processing is producing a true/false color with the 500m MODIS input bands instead of the 1km input bands that P2G 2.3 was using (by accident). I should try generating a 1km version and see how it performs.
I should also check TC's monitoring of bumi and see if the disk is getting overloaded and can't handle all the parallel reads and writes.
Ran all products with true/false starting from 500m: 6m21s on my machine. Ran all products with true/false starting from 1km: 5m59s on my machine.
So a difference, but not much. This is not the main source of performance issues.
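For scale, the resolution difference alone implies a fixed work multiplier per band: halving the pixel spacing quadruples the pixel count. A trivial sketch (the helper name is mine, for illustration only):

```python
def relative_pixel_count(res_m, base_res_m=1000):
    """How many times more pixels a band has at res_m vs base_res_m."""
    return (base_res_m / res_m) ** 2

print(relative_pixel_count(500))   # 500 m bands vs 1 km bands
print(relative_pixel_count(250))   # 250 m bands vs 1 km bands
```

So a 500 m-based true color recipe reads and resamples 4x the pixels of the 1 km recipe v2.3 was (accidentally) using.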
```
$ PYTROLL_CHUNK_SIZE=6000 time polar2grid.sh -r modis_l1b -w geotiff --num-workers 4 --fill-value 0 -p true_color false_color bt20 bt21 bt22 bt23 bt24 bt25 bt27 bt28 bt29 bt30 bt31 bt32 bt33 bt34 bt35 bt36 vis01 vis02 vis03 vis04 vis05 vis06 vis07 vis26 -f /data/satellite/modis/input_modis_china/MYD0*
```
This essentially tries to run the processing as close to one single chunk as possible. It ran in 5m35s (versus 6m30s). This still results in multiple chunks for the output grid (I'll try again to avoid that). This is very interesting: almost a minute of execution time saved just by using more memory.
Edit: This is very confusing. This processing should have been using some of the interpolation optimizations so it should have been closer to 5m40s (what I quoted in the above comment for running all products on my system).
Edit 2: Yeah running with default chunk size got me 5m36s. So...I guess chunk size doesn't matter much.
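Whether or not it pays off on a given machine, the mechanism `PYTROLL_CHUNK_SIZE` controls is simple: it sets the per-dimension dask chunk size, so larger chunks mean fewer tasks (and less scheduler overhead) at the cost of more memory per task. A rough stdlib sketch of the chunk count; the swath shape here is an illustrative assumption, not the real size of this MODIS pass:

```python
import math

def chunk_count(shape, chunk_size):
    # Count dask-style chunks for a 2-D array tiled into
    # chunk_size x chunk_size blocks (edge chunks may be smaller).
    rows, cols = shape
    return math.ceil(rows / chunk_size) * math.ceil(cols / chunk_size)

shape = (40600, 1354)          # illustrative swath shape only
print(chunk_count(shape, 4096))  # smaller chunks -> more tasks
print(chunk_count(shape, 6000))  # PYTROLL_CHUNK_SIZE=6000 -> fewer tasks
```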
So I did some basic tests; here is some more timing information.
Default modis true/false color as of the latest tarball. This uses the poor chunking strategy of the geotiepoints interpolation and nothing else special; it does not include any other optimizations.
real 5m48.814s
user 13m41.834s
sys 1m36.803s
With geotiepoints doing smarter chunking of the CVIIRS/angle-based interpolation (on the optimize-modis-interp geotiepoints branch):
real 4m46.628s
user 10m41.394s
sys 1m59.493s
Using only P2G 2.3-style map_coordinates interpolation (no other optimizations):
real 3m47.945s
user 9m8.382s
sys 1m15.434s
Add ability in Polar2Grid to "persist" (compute interpolated lon/lats once and then hold them in memory) along with the map_coordinates interpolation from above. This essentially removes the overhead of interpolating multiple times as the lon/lats are stored in memory and computed once for day/night filtering, once for dynamic grid freezing, and once for actual resampling. I'm debating whether or not this should be done for all swath-based processing or only more complex ones like modis.
real 2m59.513s
user 6m55.871s
sys 1m1.015s
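Polar2Grid's actual implementation holds the dask-computed lon/lat arrays in memory; the core idea (run the expensive interpolation once and reuse the result across the filtering, grid-freezing, and resampling passes) can be sketched in stdlib terms with memoization. The interpolation body below is a stand-in, not the real geotiepoints code:

```python
from functools import lru_cache

CALLS = {"interp": 0}

@lru_cache(maxsize=1)
def interpolated_lonlats():
    # Stand-in for the expensive 250 m / 500 m geolocation interpolation.
    CALLS["interp"] += 1
    lons = tuple(i * 0.01 for i in range(100))  # fake data
    lats = tuple(i * 0.01 for i in range(100))
    return lons, lats

# Three consumers, as in the pipeline described above:
day_night = interpolated_lonlats()   # day/night coverage filtering
grid = interpolated_lonlats()        # dynamic grid freezing
resampled = interpolated_lonlats()   # actual resampling
print(CALLS["interp"])  # the interpolation itself ran only once
```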
Just for fun, I did all the products on my laptop like the above with the persisted lon/lats and got:
real 4m28.289s
user 9m12.405s
sys 1m24.544s
So I just started using /data/dist/polar2grid-swbundle-20220611-202644.tar.gz, which uses the persisting trick of #472, and I ran it on bumi. As a reminder, v2.3 on bumi gets:
True and false color:
real 4m16.376s
user 3m19.179s
sys 0m57.272s
All band products:
real 5m23.714s
user 4m1.390s
sys 1m22.397s
The previous beta took ~14m to do all products (true/false/bands) on bumi. The new version gets:
real 11m47.786s
user 18m46.450s
sys 17m35.809s
So better, but still terrible compared to v2.3. Just bands:
real 6m45.294s
user 9m7.736s
sys 8m31.066s
Just true/false color:
real 8m42.232s
user 14m16.692s
sys 13m44.493s
I got fed up with this and tried forcing the use of "legacy EWA", which is still relatively dask-friendly but requires loading all the data into memory at once. This does all normal bands in 5m21s and true/false in 6m7s. So even that is not fast enough, but it is a pretty easy guess that the crefl Python algorithm is slower than the crefl C algorithm; yet another thing that could be sped up.
Edit: Just reran all band products with 2.3 on bumi and it was 5m40s.
Running this data to grid 204 with v2.3 is 4m11s, and 3.0 beta is 4m31s.
I have a hacked version of main running on bumi (where I copied the changed code from #483 into the previous tarball) and I now get these timings:
Regular band products:
real 4m40.545s
user 6m7.102s
sys 4m42.599s
Only true/false color:
real 6m15.633s
user 10m49.273s
sys 9m6.608s
Running all band products and true/false:
real 9m14.641s
user 15m4.571s
sys 12m8.886s
Running v2.3 on bumi again just to be sure shows all band products are produced with:
real 5m27.238s
user 4m8.161s
sys 1m18.116s
Since that is almost the same as the last time I ran it, I don't really feel like running true/false color with v2.3 right now. We'll assume the same timing of 4m16s.
In summary: we're not done, but this is much better.
With the newest 20220707-161841 tarball which includes GIL release fixes for EWA, the above modis china processing case for all bands + true + false finishes in 6 minutes with 8 workers. This is 3m45s faster than v2.3 doing two calls (one for bands + one for true/false).
@kathys still reports slower times than v2.3 when readers are run separately on bumi. This is currently only with VIIRS testing. The VIIRS case has not been checked for basic optimizations yet (e.g. optimal chunking in the reader).
Edit: Doing just bands took 2m26s. Note this is still using data files on /dev/shm on bumi.
Edit 2: True and false color by themselves take 4m26s. This is just a little slower than v2.3. But this is also the 500m-based composite recipe. Version 2.3 had a bug that was always using the 1km low resolution bands as input.
Edit 3: Running all products but pointing to /data instead of /dev/shm ran in 5m45s. That's 15 seconds faster than using /dev/shm :man_shrugging:
As pointed out on Slack, specifying PYTROLL_CHUNK_SIZE=6400 before the P2G command reduces the execution time by about 50%. In @kathys's tests with VIIRS data (10 granules), using this chunk size showed:
v2.3:
true/false only: 4m5s
all regular bands: 10m4s

v3.0:

true/false only: 2m17s
all regular bands: 2m47s
all products together: 4m37s
This means smarter chunking and possibly per-reader chunk sizes are the next important step to ensure good performance.
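One possible shape for per-reader chunk sizes is a small lookup that sets the environment variable before Satpy is imported (`PYTROLL_CHUNK_SIZE` is the real variable used above; the table, values, and helper here are hypothetical, for illustration only):

```python
import os

# Hypothetical per-reader defaults; the values are illustrative.
READER_CHUNK_SIZES = {
    "modis_l1b": 6400,
    "viirs_sdr": 6400,
}

def set_chunk_size_for_reader(reader, default=4096):
    """Set PYTROLL_CHUNK_SIZE for this reader; must run before Satpy loads."""
    size = READER_CHUNK_SIZES.get(reader, default)
    os.environ["PYTROLL_CHUNK_SIZE"] = str(size)
    return size

set_chunk_size_for_reader("viirs_sdr")
print(os.environ["PYTROLL_CHUNK_SIZE"])
```

The environment variable must be set before the readers are loaded because the chunk size is typically read at import time.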
I don't remember what we said in our last meeting, but I remember saying overall that processing speeds are faster. It does still seem that true_color/false_color may be slower than it used to be but there are many factors that could be leading to that. The biggest one is likely (I think) that higher resolution bands are being used in the true_color/false_color composites than in v2.3.
Closing. We can reopen if needed or open a new one if it seems like a different issue that is causing the speed changes.
I wrote scripts to time the creation of all P2G MODIS GeoTIFF default bands for a 15-minute pass using the default WGS84 dynamic grid. For P2G Version 2.3, I added together the times it took to create the default images for crefl (true and false color) and for the vis/ir bands. For P2G Version 3.0, I used 4 workers, which I think is the default. Here are the results:

P2G Version 2.3: 8m22s
P2G Version 3.0: 15m33s

It took almost twice as long, using more CPUs, to create the images with P2G v3.0 than it did with v2.3 on the machine bumi. I cannot imagine releasing software until this is improved. Dynamic grids are the way the software is most used by Liam and me at SSEC, and I suspect by most of the community too.