Decide on visualisation tool within demo tutorials

ChristinaLast commented 2 years ago

[ ] @andrewphilipsmith to explore visualisation methods for inline visualisations of tutorial output

kmcdono2 commented 2 years ago

Great. @andrewphilipsmith Would love to review these when you are ready to discuss options!

andrewphilipsmith commented 2 years ago

Below are various thoughts and questions about the visualisation for the tutorial. I'm still checking if the answers already exist in places I've not spotted yet, but I have written them here anyway.

Scope of the tutorial

Are we expecting users to work through the tutorial using their own data or with sample data that we provide?
Does the tutorial cover using MapReader at scale? Applying MapReader to hundreds of map sheets (with overlapping margins) presents different visualisation challenges to applying it to a single example map sheet.
Does the tutorial cover using MapReader for non-map images (e.g. plant phenotype)?

CRS and geometries

Are the result data and the map images available in a common CRS? The MapReader paper mentions reprojecting the map-sheets. So far, I've been pulling the NLS images from their tile server (in Web Mercator) and plotting the patch results in WGS84. Reprojecting that much data on the fly will always be painful, whatever the visualisation tool.
Are the patches available as polygon files, or can they be generated procedurally? (e.g. the inset to (d) in this figure https://user-images.githubusercontent.com/1899856/144105429-f4f02d49-7b2a-4cdb-ae57-19d077aab713.png). The paper mentions that, by default, MapReader uses a fixed pixel extent of the map sheet, so are the patches actually square in real-world coordinates?
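To make the "generated procedurally" option concrete, here is a minimal sketch of how patch outlines could be derived from a sheet's bounding box and image size. It assumes the sheet is axis-aligned in its CRS and that patches tile the image on a regular grid starting at the top-left corner; the function name `patch_polygons` and its parameters are hypothetical and not part of MapReader.

```python
def patch_polygons(bounds, image_size, patch_px):
    """Return one closed ring of (x, y) corners per patch.

    bounds     -- (min_x, min_y, max_x, max_y) of the sheet in map units
    image_size -- (width, height) of the sheet image in pixels
    patch_px   -- patch edge length in pixels (same in x and y)

    All names here are illustrative assumptions, not MapReader's API.
    """
    min_x, min_y, max_x, max_y = bounds
    width, height = image_size
    # Map units per pixel; these differ if pixels are not square on the ground.
    x_res = (max_x - min_x) / width
    y_res = (max_y - min_y) / height

    polygons = []
    for row in range(0, height, patch_px):
        for col in range(0, width, patch_px):
            # Clip the last row/column of patches to the image edge.
            right = min(col + patch_px, width)
            bottom = min(row + patch_px, height)
            # Pixel rows count down from the top; map y counts up.
            x0, x1 = min_x + col * x_res, min_x + right * x_res
            y1, y0 = max_y - row * y_res, max_y - bottom * y_res
            polygons.append([(x0, y0), (x1, y0), (x1, y1), (x0, y1), (x0, y0)])
    return polygons
```

If `x_res` and `y_res` come out different for a sheet, a patch with equal pixel counts in x and y would not be square in map units, which is the case the question above is probing.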

Pre-visualisation processing

Assuming that the csv files are typical of MapReader's output, I think there is a case for doing some post-processing of the result to get them in a format suitable for visualisation (which might be different from a format that is suitable for further analysis).

Some non-exclusive options:

Convert the csv files to a spatially-indexed storage format so that at high zoom-levels only the required points need to be read from disk.
Interpolate a raster image from the points. A reasoned and repeatable rule-based approach could be used for overlapping patches that have been categorised differently. The resulting image could be tiled, so it is suitable for display at multiple scales. (Switching to vector points/patches for high zoom-levels would still be appropriate).
If we (a) exclude the unclassified patches and (b) dissolve the classified patches, then we might have the data in a form well suited to vector tiles, which is a format designed for displaying large datasets at a range of scales.
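As a rough illustration of the first option, the sketch below buckets points into a regular grid so that a viewport query only touches nearby cells. It is an in-memory stand-in for the idea only: the class name `GridIndex` and the label values are placeholders, and an on-disk format with a built-in spatial index (GeoPackage is one candidate) would be the realistic choice for reading only the required points from disk.

```python
from collections import defaultdict
import math


class GridIndex:
    """Bucket points into square cells so a bounding-box query skips most data."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        # Integer cell coordinates containing the point.
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, x, y, label):
        self.cells[self._cell(x, y)].append((x, y, label))

    def query(self, min_x, min_y, max_x, max_y):
        """Return all points inside the box (edges inclusive)."""
        cx0, cy0 = self._cell(min_x, min_y)
        cx1, cy1 = self._cell(max_x, max_y)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                # .get avoids creating empty cells in the defaultdict.
                for x, y, label in self.cells.get((cx, cy), []):
                    if min_x <= x <= max_x and min_y <= y <= max_y:
                        hits.append((x, y, label))
        return hits


index = GridIndex(cell_size=10)
index.insert(1, 1, "class_a")    # placeholder labels
index.insert(9, 9, "class_a")
index.insert(25, 25, "class_b")
visible = index.query(0, 0, 10, 10)  # only the two points near the origin
```

The cell size is a tuning choice: cells roughly the size of a typical viewport at the highest zoom level keep each query to a handful of buckets.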

Creating a notebook that does this post-processing would be possible (I've made a start on this). Some of MapReader's input parameters would need to be accessible (notably the patch size/geometry).
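One possible shape for the "reasoned and repeatable rule" for overlapping patches is a majority vote with deterministic tie-breaking. The sketch below assumes each prediction comes with a confidence score, which may not match the actual csv columns; `resolve_overlap` is a hypothetical helper, not existing MapReader code.

```python
from collections import defaultdict


def resolve_overlap(predictions):
    """Pick one label for a location covered by several patches.

    predictions -- list of (label, confidence) pairs, one per overlapping patch
    Rule: most votes wins; ties go to the higher mean confidence, then to the
    alphabetically first label so the result is repeatable.
    """
    if not predictions:
        return None
    scores = defaultdict(list)
    for label, confidence in predictions:
        scores[label].append(confidence)

    def rank(label):
        confs = scores[label]
        # Negate counts and means so that min() prefers the larger values.
        return (-len(confs), -sum(confs) / len(confs), label)

    return min(scores, key=rank)
```

For example, `resolve_overlap([("a", 0.6), ("b", 0.9), ("a", 0.7)])` returns `"a"` because it has two votes, while `resolve_overlap([("a", 0.6), ("b", 0.9)])` returns `"b"` on confidence.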

Choice of visualisation tool

I've had an initial play with leafmap, which I'm sure is capable of doing what we require (at least within the tutorial's scope). However, I'd want a better understanding of the issues above before making a final recommendation.

kmcdono2 commented 2 years ago

Are we expecting users to work through the tutorial using their own data or with sample data that we provide?

For a tutorial, it will be with the provided sample data (e.g. the 1-inch OS maps).

But we also need to include clearer instructions for how to prepare input for MapReader when people want to bring their own maps to the tool. E.g. which kinds of maps work best, how many, in what format. This is part of the README update I am planning.

Does the tutorial cover using MapReader at scale? Applying MapReader to hundreds of map sheets (with overlapping margins) presents different visualisation challenges to applying it to a single example map sheet.

It should, yes. MapReader isn't really useful for working with one map. It's only worthwhile at scale, e.g. more than 200 or so large-scale series maps (though of course this will vary).

Does the tutorial cover using MapReader for non-map images (e.g. plant phenotype)?

Shortly, I think we will separate the map and the non-map applications for this code. So, the non-map related content will have its own separate tutorial(s). Does that sound right @kasra-hosseini ?

kasra-hosseini commented 2 years ago

@andrewphilipsmith Thanks. Please see my inline comments:

Are we expecting users to work through the tutorial using their own data or with sample data that we provide?

For both maps and plant images, we have sample data. In the former case, the user retrieves maps via web servers, while in the latter, we have some simple plant images stored in the repo.
I think the demos/tutorials should make it clear that the user can, of course, use their own data.

I just saw @kmcdono2 's reply. I totally agree with this:

But we also need to include clearer instructions for how to prepare input for MapReader when people want to bring their own maps to the tool. E.g. which kinds of maps work best, how many, in what format. This is part of the README update I am planning.

Does the tutorial cover using MapReader at scale? Applying MapReader to hundreds of map sheets (with overlapping margins) presents different visualisation challenges to applying it to a single example map sheet.

This is a very good question. Currently, we retrieve 4 maps via NLS TileServer. So no, in the tutorial, we do not talk about how to use MapReader at scale. However, it would be great if our visualization tools could handle both situations, i.e., few maps or hundreds of them.
Here is my suggestion: for the current tutorials, we want a few maps so that a user can go through them quickly. So in this case, we can keep the 4 maps that we have.
However, we can add a new set of notebooks on how to work with MapReader's outputs. In this case, we can, e.g., download data from Zenodo (we will upload all our predictions/results presented in the arXiv paper on Zenodo), and then, we show how to work with the outputs and how to visualize them.
How does it sound?

Does the tutorial cover using MapReader for non-map images (e.g. plant phenotype)?

Yes! We are thinking about how to restructure the README file, but in any case, we will have at least two sets of notebooks: one for working with historical maps and one for scientific (non-map) images.

kasra-hosseini commented 2 years ago

Are the result data and the map images available in a common CRS? The MapReader paper mentions reprojecting the map-sheets. So far, I've been pulling the NLS images from their tile server (in Web Mercator) and plotting the patch results in WGS84. Reprojecting that much data on the fly will always be painful, whatever the visualisation tool.

OK, great question. If the map sheets are in GeoTIFF (e.g., our 1" maps but NOT the maps that we retrieve from TileServer), MapReader can reproject them. In the case of NLS images, we do not reproject the map sheets during training and model inference. However, when we want to work with the outputs (including visualizing them on Kepler), we reproject the points to a common CRS (just to emphasize, we do not reproject the map sheets in this case but the coordinates of the patches).
I am now thinking maybe it is better to reproject the map sheets after retrieving them from NLS. What do you think?
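For reference, reprojecting patch coordinates (rather than the sheets themselves) from WGS84 to Web Mercator is cheap. The sketch below uses the standard spherical Web Mercator formulas in pure Python; it is only an illustration of the point-reprojection step, not the code MapReader uses, and in practice a library such as pyproj would be the safer choice. The function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, used by Web Mercator (EPSG:3857)
MAX_LAT = 85.05112878       # latitude limit of the Web Mercator square


def wgs84_to_web_mercator(lon, lat):
    """Convert longitude/latitude in degrees to Web Mercator x/y in metres."""
    # Clamp latitude: the projection is undefined at the poles.
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y
```

Reprojecting a few million patch centres this way is a one-off cost that can be paid during post-processing, whereas reprojecting the sheet images means resampling every pixel.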

kasra-hosseini commented 2 years ago

Are the patches available as polygon files, or can they be generated procedurally? (e.g. the inset to (d) in this figure https://user-images.githubusercontent.com/1899856/144105429-f4f02d49-7b2a-4cdb-ae57-19d077aab713.png). The paper mentions that, by default, MapReader uses a fixed pixel extent of the map sheet, so are the patches actually square in real-world coordinates?

After retrieving a map via NLS TileServer, we compute how many pixels are needed to cover a given ground area, e.g. 50 m x 50 m. The number of pixels can vary depending on the resolution of the maps, but the covered area should be roughly the same between patches.

so are they actually square in real-world coordinates?

The number of pixels is the same in the x and y directions.
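A minimal sketch of that pixel calculation, assuming the tiles are standard 256-pixel Web Mercator tiles (the zoom level and latitude below are illustrative, and the helper names are hypothetical rather than MapReader functions):

```python
import math

# Metres per pixel at zoom 0 on the equator for 256-pixel Web Mercator tiles.
INITIAL_RESOLUTION = 2 * math.pi * 6378137.0 / 256


def metres_per_pixel(lat, zoom):
    """Approximate ground resolution at this latitude and zoom level."""
    return INITIAL_RESOLUTION * math.cos(math.radians(lat)) / (2 ** zoom)


def pixels_for_ground_size(ground_m, lat, zoom):
    """Number of pixels per patch edge needed to span ground_m metres."""
    return max(1, round(ground_m / metres_per_pixel(lat, zoom)))


edge_equator = pixels_for_ground_size(50, lat=0, zoom=17)
edge_lat60 = pixels_for_ground_size(50, lat=60, zoom=17)
```

Because the ground resolution shrinks with the cosine of latitude, the same 50 m patch needs about twice as many pixels at 60 degrees north as at the equator. Equal pixel counts in x and y give patches that are approximately square on the ground locally, but their ground size drifts with latitude unless the pixel count is recomputed per sheet.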

kasra-hosseini commented 2 years ago

(Sorry, I have to go to a co-working session now, but I will try to go through your questions in the afternoon)

kmcdono2 commented 2 years ago

I am now thinking maybe it is better to reproject the map sheets after retrieving them from NLS. What do you think?

I think this is a good idea. We want to simplify whatever people need to do to see the data.

However, can we make it easy to accommodate different CRSs, e.g. for different map collections?

kasra-hosseini commented 2 years ago

I think there is a case for doing some post-processing of the result to get them in a format suitable for visualisation (which might be different from a format that is suitable for further analysis).

I see. This is an interesting idea.

Convert the csv files to a spatially-indexed storage format so that at high zoom-levels only the required points need to be read from disk.
Interpolate a raster image from the points. A reasoned and repeatable rule-based approach could be used for overlapping patches that have been categorised differently. The resulting image could be tiled, so it is suitable for display at multiple scales. (Switching to vector points/patches for high zoom-levels would still be appropriate).
If we (a) exclude the unclassified patches and (b) dissolve the classified patches, then we might have the data in a form well suited to vector tiles, which is a format designed for displaying large datasets at a range of scales.

These are great ideas! Thanks @andrewphilipsmith .

Creating a notebook that does this post-processing would be possible (I've made a start on this).

Great! I am looking forward to seeing how this works in practice.

Some of MapReader's input parameters would need to be accessible (notably the patch size/geometry).

OK, and in general, I think we should do a better job in logging the steps (from preprocessing to training and fine-tuning).

kasra-hosseini commented 2 years ago

I've had an initial play with leafmap, which I'm sure is capable of doing what we require (at least within the tutorial's scope). However, I'd want a better understanding of the issues above before making a final recommendation.

@andrewphilipsmith Thank you again and please let me know if I should clarify anything or if I can be of any help.
