Closed djhoese closed 2 months ago
As expected if I comment out the html += map_section(area)
portion of the HTML formatting in pyresample the HTML list of areas is generated in less than a second. So purely cartopy map generation here.
Is it possible to generate those maps in parallel?
Good idea. Probably. Right now this all happens as a side effect of sphinx loading the conf.py
configuration file. To make it a little cleaner with fewer unintended consequences we should probably put this in a sphinx extension module to be loaded by sphinx specifically. I've done this for "roles" used in the .rst
files, but never any of the other types of extensions. Actually, now that I say that I wonder if sphinx has a parallel build plugin available...
Ah there is a -j
option to specify sphinx number of jobs.
Just out of curiosity I started looking at what the cartopy/matplotlib code is doing and wanted to see if it could be sped up itself. First I thought of dpi
to make lower resolution images, but we're saving to SVG so that doesn't work. Then I looked at what features were added in the cartopy steps and it is OCEAN, LAND, COASTLINE, and BORDERS. Just to see, I took out the first 3. Although the images are not useful without coastlines, it made the overall documentation generation on my laptop go from 7m17s to 2m24s. I wonder which step is taking the longest. My guess would be the land and ocean steps. Let's see...
Edit: 2m43s with only coastlines and borders and images are much more usable.
Edit 2: 4m51s with coastlines, borders, and oceans (no land).
From our discussion with @BENR0 when he implemented this, the Ocean is problematic for at least the GEOS projection. So I would suggest removing ocean for now.
@mraspaud Ok. I thought he fixed that in one of the recent pyresample PRs though?
Regardless, I did some experiments this weekend with geoviews and bokeh (the backend that geoviews can use). My solution isn't complete yet and is very hacky, but it takes ~3 minutes to render the whole documentation and produces javascript maps instead of SVGs. The maps could be interactive, but I don't think it makes sense for them to be. I might push my work to a branch and leave it alone for a while, but I'd also like to post on the geoviews or bokeh discussion forums and ask for advice on how best to implement what we want. Specifically we have an HTML template for each area that we want to slip into a restructuredtext document (section headers, etc) in our sphinx documentation and where and how files are included gets complicated.
@BENR0 @mraspaud I'm not sure if I should open a PR in pyresample to discuss this, but since I don't have a PR I thought I'd bring it up here. So as mentioned above I have a hacky bokeh version of the maps working, but it doesn't include all the attribute and HTML work done in the current functionality in pyresample. Additionally, adding the bokeh functionality to pyresample would require restructuring how things work. As an alternative, Martin had suggested just turning off OCEAN features in the maps to speed things up, but this would require a new pyresample release anyway to either disable it or add the necessary kwargs to allow for controlling what features are plotted (not currently available). So I'm not sure what effort I should put my limited amount of time toward.
For further details on the bokeh refactoring needed: The best option I've found for bokeh is to basically do this:
for aname, params in area_dict.items():
params = area_dict[aname]
area_def = _create_area_def_from_dict(aname, params)
if not hasattr(area_def, "to_cartopy_crs"):
continue
crs = area_def.to_cartopy_crs()
features = gv.Overlay([gf.ocean, gf.land, gf.borders, gf.coastline])
f = gv.render(features.opts(toolbar=None, default_tools=[], projection=crs, xlim=crs.bounds[:2], ylim=crs.bounds[2:]), backend="bokeh")
areas_bokeh_models[aname] = f
script, divs_dict = components(areas_bokeh_models)
Where the script
in the last line is a single large javascript object with all the plot information and should be placed in the HTMl. The divs_dict
is a dictionary of area name -> HTML "
area_repr
in pyresample/_formatting_html.py
).
This all ignores the fact that really pyresample should probably be including a sphinx extension to generate these things as a new restructuredtext role/directive. For example:
.. pyresample-geometry::
:area_file: areas.yaml
:area_name: msg_something_something
:backend: bokeh
I see the code is doing things like:
ax.add_feature(cartopy.feature.OCEAN)
Would it help to reduce the resolution?
ax.add_feature(cartopy.feature.OCEAN.with_scale("110m"))
I'm not sure what resolution is used by cartopy as the default
As far as I could tell that is the default. I had the same hope.
I've made some major changes to my #2875 with bokeh plotting. It requires the pyresample PR to be released though (https://github.com/pytroll/pyresample/pull/615).
We've been noticing that documentation on readthedocs has been timing out. Off the top of my head I believe the last time I checked RTD gives free projects 15 minutes for a build (30 minutes if you pay). The area preview generation (using cartopy) is taking a little over 4 minutes by itself (that's my local machine, not a cheap CI runner that RTD might be using). On top of that the resulting HTML file (see below) is ~23MB in size because it has all the SVG images embedded. I don't know how others feel but that seems like a lot. I mention the size because my initial idea for fixing the HTML representation time was to just pre-create the HTML of the areas and save it to git and detect when it needs to be updated.
https://satpy.readthedocs.io/en/stable/resample.html#area-definitions-included-in-satpy
Here's the timing on
time make html
on my local machine with the cartopy shapefiles already downlaoded and cached:So the last two prints before the
time
output are the HTML representation writing. So ~66% of the documentation rendering is the area repr. Another big chunk of the remaining 3 minutes is likely API documentation generation.