malariagen / ag1000g-phase3-data-paper

Re-calc location colours using CIELAB colour space #47

Closed leehart closed 3 years ago

leehart commented 3 years ago

image image image

leehart commented 3 years ago

Awesome! Really really neat.

Ninja edit: can we constrain the square a little more, so the edges are sampled populations?

That's one of the things to discuss, because I think it needs to be made clear: The first image is actually using a different bounding box, which means the colours don't match the two collection site maps above, which you can see by eye. The colour maps for the second and third images, which should only differ in style, are actually constrained by a rectangle determined by the min/max lat-longs of the collection sites. I should have shown a different image to the first one, perhaps where random lat-longs are coloured within the same bounding box as the samples. However, this leads to a question of how to communicate the colour map, which I'm still pondering.

There is another question I'm still looking into as well: there is a variable that adjusts the amount of colour (akin to hue or saturation) beyond the middle grey point, i.e. the extent of the a and b dimensions in this colour space. It might need to be configured for different bounding boxes, but I haven't yet found documentation defining the extents of those dimensions (in this implementation).

alimanfoo commented 3 years ago

The colour maps for the second and third images, which should only differ in style, are actually constrained by a rectangle determined by the min/max lat-longs of the collection sites.

This looks great. I would suggest one minor tweak to this, which is that I would centre the vertical coordinate on the equator. This will lead to a small asymmetry, but I think that's fine. I.e., if the site is above the equator, then the vertical coordinate is (latitude / max_latitude); if the site is below the equator then the vertical coordinate is -(latitude / min_latitude). Does that make sense?
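A minimal sketch of the equator-centred vertical coordinate described above. The function name and the example bounds are hypothetical, chosen only for illustration; the sketch assumes the bounding box straddles the equator.

```python
def equator_centred_vertical(latitude, min_latitude, max_latitude):
    """Map a latitude to [-1, 1], with 0 pinned to the equator.

    Hypothetical helper: assumes min_latitude < 0 < max_latitude.
    """
    if not (min_latitude < 0 < max_latitude):
        raise ValueError("bounding box must straddle the equator")
    if latitude >= 0:
        # North of the equator: scale by the northern limit.
        return latitude / max_latitude
    # South of the equator: latitude and min_latitude are both negative,
    # so the ratio is positive and the leading minus restores the sign.
    return -(latitude / min_latitude)


# Illustrative bounds only (not the actual collection-site limits).
example = [equator_centred_vertical(lat, -35, 20) for lat in (20, 10, 0, -7, -35)]
```

Because the northern and southern halves are scaled separately, one degree of latitude covers a different share of the colour axis on each side of the equator, which is the small asymmetry mentioned above.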

alimanfoo commented 3 years ago

However, this leads to a question of how to communicate the colour map, which I'm still pondering.

Two ideas:

Possibly easiest would be to create a legend with one marker for each major site (i.e., where we have > 10 samples - the ones that get a pie on the sampling map, or possibly even thinning that down if too many markers).

Another option, possibly a bit harder to implement, would be to create a small inset map which is coloured continuously. I.e., something a bit like this:

https://user-images.githubusercontent.com/4256466/97761454-a871a400-1afd-11eb-8e15-09a989fe8a80.png

...but using some kind of color mesh rather than points.

alimanfoo commented 3 years ago

Regarding the horizontal coordinates, if you wanted to get really clever, you could approximate the west coast of Africa somehow, e.g., with a logistic function:

Screenshot from 2020-11-02 12-12-37

I.e., your horizontal coordinate becomes the distance to the West coast, approximated via this function (and centred somewhere sensible). This would allow you to use a little more of the colour space (right now the bottom left quadrant is mostly in the ocean). Although in practice this may not gain us much, given that we only have Gabon and Angola in that region.
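A rough sketch of the logistic idea above. All four constants are hypothetical, eyeballed values (coast near 17°W in the north, near 12°E in the south, bending around 4°N); they are not taken from the screenshot and would need fitting to a real coastline.

```python
import math

# Hypothetical, hand-picked parameters (assumptions, not fitted values).
COAST_LON_SOUTH = 12.0   # approximate coast longitude well south of the bend
COAST_LON_NORTH = -17.0  # approximate coast longitude well north of the bend
BEND_LATITUDE = 4.0      # latitude where the coast swings westwards
STEEPNESS = 0.5          # how sharply the logistic curve bends


def approx_west_coast_longitude(latitude):
    """Logistic approximation of the west coast's longitude at a latitude."""
    s = 1.0 / (1.0 + math.exp(-STEEPNESS * (latitude - BEND_LATITUDE)))
    return COAST_LON_SOUTH + (COAST_LON_NORTH - COAST_LON_SOUTH) * s


def horizontal_coordinate(latitude, longitude):
    """Degrees east of the approximated west coast (unscaled)."""
    return longitude - approx_west_coast_longitude(latitude)


# Approximate coordinates for two coastal cities, for illustration.
gabon_coast = horizontal_coordinate(0.4, 9.45)      # near Libreville
gambia_coast = horizontal_coordinate(13.45, -16.58)  # near Banjul
```

Under these assumed parameters the two coastal points get similar horizontal values, both close to zero, even though they are about 26 degrees of longitude apart.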

hardingnj commented 3 years ago

I.e., your horizontal coordinate becomes the distance to the West coast, approximated via this function (and centred somewhere sensible). This would allow you to use a little more of the colour space (right now the bottom left quadrant is mostly in the ocean). Although in practice this may not gain us much, given that we only have Gabon and Angola in that region.

Not sure I understand this! So a site in coastal Gabon, has the same horizontal value as a site in coastal Gambia?

hardingnj commented 3 years ago

Another option, possibly a bit harder to implement, would be to create a small inset map which is coloured continuously. I.e., something a bit like this:

Yes- this is what I was envisioning... It should be simple though? Just overlay the transparent map over the color grid? We know the bounds of the grid and the lat longs they equate to? Although the map may have to be transformed to have parallel lat/lons.

alimanfoo commented 3 years ago

Yes- this is what I was envisioning... It should be simple though? Just overlay the transparent map over the color grid?

Yes, although you might want to use the land outline to clip the color grid. I.e., mask colours that are in the ocean. I.e., end up with something like this (although with different colours obviously):

image

alimanfoo commented 3 years ago

Not sure I understand this! So a site in coastal Gabon, has the same horizontal value as a site in coastal Gambia?

Approximately, yes. Feel free to reject this idea, I was just trying to think of ways to avoid wasting any colours.

leehart commented 3 years ago

The extreme west and south-west are practically indistinguishable in this space, but fortunately in this case that'll be the South Atlantic. Here's a spot map just to illustrate (the final map will just be around the collection sites and ranged accordingly), and the same can also be seen in the random African scatter plot above (first image). Different human eyes will find different colours indistinguishable, but this one might be universal (medium-lightness cyan versus blue). I might be able to tweak the settings, although I don't expect this will be a problem, just something to be aware of. image

leehart commented 3 years ago

This looks great. I would suggest one minor tweak to this, which is that I would centre the vertical coordinate on the equator. This will lead to a small asymmetry, but I think that's fine. I.e., if the site is above the equator, then the vertical coordinate is (latitude / max_latitude); if the site is below the equator then the vertical coordinate is -(latitude / min_latitude). Does that make sense?

I'm not sure it makes complete sense, unless I misunderstand. The colour space coordinates have a negative and a positive, e.g. southerly latitude is more green/blue while northerly latitude is more red/magenta. Another thing is that any asymmetry would distort the notion that more north of the equator appears more red/magenta to the same degree that more south of the equator appears more green/blue. One way I reckon we could get the best of both worlds (centred on equator plus balanced scales), albeit at the expense of about a quarter of the colours, would be to extend the distance between the northerly-limit and the equator to the same distance as between the southerly-limit and the equator, i.e.:

collection_space equalized_space

Although this makes me think that centering the space on the equator does not give us much, given the costs, when we could just say that the colour space is relative to the collection space. After all, more northerly points are always coloured more red/magenta and less green/blue, regardless of the equator. Orientating on the equator grants it some importance, which might either be true or presumptuous/leading.
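The trade-off in the equalised variant can be sketched as below. The function names and the example limits are hypothetical; the limits were picked only so that the arithmetic is easy to follow, and are not the actual collection-site bounds.

```python
def equalised_latitude_limits(min_latitude, max_latitude):
    """Stretch the shorter side so both limits sit equally far from the equator."""
    if not (min_latitude < 0 < max_latitude):
        raise ValueError("bounding box must straddle the equator")
    extent = max(abs(min_latitude), abs(max_latitude))
    return -extent, extent


def balanced_vertical(latitude, min_latitude, max_latitude):
    """Vertical coordinate in [-1, 1] with the same scale on both sides of the equator."""
    _, north = equalised_latitude_limits(min_latitude, max_latitude)
    return latitude / north


def unused_fraction(min_latitude, max_latitude):
    """Share of the vertical colour axis that no collection site can reach."""
    south, north = equalised_latitude_limits(min_latitude, max_latitude)
    return 1 - (max_latitude - min_latitude) / (north - south)


# Illustrative limits: sites span 30 degrees south to 15 degrees north.
limits = equalised_latitude_limits(-30, 15)
wasted = unused_fraction(-30, 15)
```

With these illustrative limits a quarter of the vertical axis goes unused, which is the kind of cost being weighed against the benefit of a balanced, equator-centred scale.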

leehart commented 3 years ago

Before and after extending the colour map north of the equator, to the same extent as it reaches south.

image image

^ E.g. the Angola site appears more blue/green (and less grey) because the blue/green space has effectively been stretched to cover more area.

Reference map, clipped to collection box image

leehart commented 3 years ago

image image

leehart commented 3 years ago

Seeing as Mayotte is defining the easterly boundary, it looks like I have a slight misalignment. I suspect rounding. image

leehart commented 3 years ago

Misalignment was due to rounding; remedied by employing NumPy.

image

^ The southern edge bleed is due to the use of non-centralized square markers in plotting.

leehart commented 3 years ago

image

leehart commented 3 years ago

Just the sample collection site area. This particular colour space isn't centred on the equator.

image

^ See the edge of the north-westerly corner near The Gambia / Senegal. See the north-easterly corner in Yemen, cutting off some of Somalia / Ethiopia. See the south-easterly corner in Madagascar. This makes me think there might be an aesthetic reason to adjust the colour space range beyond the sample collection site area, e.g. to cover all of Madagascar, even though we don't have any samples there. We'd still cover the southerly tip of Yemen, but could easily mask that out to simplify.

leehart commented 3 years ago

Adjusted geo-colour mapping boundaries:

colour_space_min_lat = -35 # southern tip of Africa
colour_space_max_lat = 20 # as north as the Sahara (roughly)
colour_space_min_long = -26 # west of Cape Verde
colour_space_max_long = 60 # east of Mauritius

geo-colour_mapping_w_locations

A couple of alternative colour space rotations, hiding red in the south west (away from green): alt1_geo-colour_mapping_w_locations alt2_geo-colour_mapping_w_locations

A couple of variables to note:

lightness = 50 # At 1, the midpoint appears black; at 100 it appears white. All points appear white around 700; black around -300.
colour_blend = 70 # At 1, all points appear equal grey, by `lightness`. Above 500, colour regions have harsh boundaries. Below 70, colours are greyish.
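For readers who want to see how these settings could fit together, here is a self-contained sketch that maps a lat-long to an sRGB colour via CIELAB. It is not the notebook's implementation: the conversion is a standard CIELAB (D65) to sRGB formula written out by hand, the function names are hypothetical, and the axis signs are an assumption chosen so that the south-west comes out red, in line with the orientation preferred later in this thread. Treating `colour_blend` as the maximum magnitude of a and b is also an assumption.

```python
# Bounds and settings quoted from the comment above.
colour_space_min_lat = -35
colour_space_max_lat = 20
colour_space_min_long = -26
colour_space_max_long = 60
lightness = 50
colour_blend = 70


def lab_to_srgb(L, a, b):
    """Convert CIELAB (D65 white point) to sRGB, clipped to [0, 1]."""
    fy = (L + 16) / 116
    fx = fy + a / 500
    fz = fy - b / 200
    delta = 6 / 29

    def f_inv(t):
        return t ** 3 if t > delta else 3 * delta ** 2 * (t - 4 / 29)

    # CIELAB -> XYZ using the D65 reference white.
    x, y, z = 0.95047 * f_inv(fx), 1.0 * f_inv(fy), 1.08883 * f_inv(fz)
    # XYZ -> linear sRGB.
    linear = (
        3.2406 * x - 1.5372 * y - 0.4986 * z,
        -0.9689 * x + 1.8758 * y + 0.0415 * z,
        0.0557 * x - 0.2040 * y + 1.0570 * z,
    )

    def encode(c):
        c = min(max(c, 0.0), 1.0)  # out-of-gamut values are simply clipped
        return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055

    return tuple(encode(c) for c in linear)


def geo_colour(latitude, longitude):
    """Colour for a location inside the bounding box (hypothetical helper)."""
    # Rescale each coordinate to [-1, 1] across the bounding box.
    x = 2 * (longitude - colour_space_min_long) / (colour_space_max_long - colour_space_min_long) - 1
    y = 2 * (latitude - colour_space_min_lat) / (colour_space_max_lat - colour_space_min_lat) - 1
    # Assumed orientation: west -> +a (red/magenta), north -> -b (blue side).
    return lab_to_srgb(lightness, -colour_blend * x, -colour_blend * y)


centre = geo_colour(-7.5, 17)      # middle of the box: a = b = 0, so grey
south_west = geo_colour(-35, -26)  # reddish under the assumed orientation
north_east = geo_colour(20, 60)    # cyan/blue under the assumed orientation
```

Clipping out-of-gamut values is one plausible reason why very large `colour_blend` values would give harsh boundaries between colour regions, though that is a guess about the implementation used here.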

leehart commented 3 years ago

I think I actually prefer the {'NE': 'cyan', 'SE': 'green', 'SW': 'red', 'NW': 'magenta'} orientation.

hardingnj commented 3 years ago

I think I actually prefer the {'NE': 'cyan', 'SE': 'green', 'SW': 'red', 'NW': 'magenta'} orientation.

Agreed- I think that looks great!

hardingnj commented 3 years ago

Plus I think the equator and the West coast are the right calls, as discussed yesterday. Seems like we're all good?

leehart commented 3 years ago

Cool. I'm happy to merge this alt1_geo-colour_mapping_w_locations