marinebon / py-dwc-viz

Python Package for data analysis and visualisation for Darwin Core data, with plug-and-play from providers like OBIS and GBIF.
https://marinebon.github.io/py-dwc-viz/
GNU General Public License v3.0

visualization of taxa distribution (tree?) #2

Closed 7yl4r closed 2 years ago

7yl4r commented 2 years ago

I wonder if there is a better way to visualize a distribution of different taxa. A plot of taxa counts is common (example below).

image

However, taxa fall into a tree structure, and a visualization that includes this nuance would be more informative. Something like:

image

Potentially useful references:

MathewBiddle commented 2 years ago

Not sure if it's applicable here but I've previously used https://github.com/parrt/dtreeviz to create a decision tree.

ayushanand18 commented 2 years ago

I figured out that plotly has some inbuilt support with the ability to define custom paths through sunburst plots.
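For reference, a sunburst is driven by a parent/child table. Below is a minimal standard-library sketch of how per-occurrence rank paths could be rolled up into the `ids`, `labels`, `parents` and `values` lists that `plotly.graph_objects.Sunburst` accepts. The helper name `sunburst_table` and the sample rows are hypothetical and only for illustration; they are not part of plotly or of this package.

```python
from collections import Counter

def sunburst_table(paths):
    """Roll rank paths (root first) up into sunburst columns.

    Each node's value is the number of input paths passing through it,
    which corresponds to branchvalues="total" in plotly.
    """
    counts = Counter()
    for path in paths:
        # Count every prefix so inner rings carry the sum of their children.
        for depth in range(1, len(path) + 1):
            counts[tuple(path[:depth])] += 1

    ids, labels, parents, values = [], [], [], []
    for node in sorted(counts):
        ids.append("/".join(node))           # full path keeps ids unique
        labels.append(node[-1])              # only the last rank is displayed
        parents.append("/".join(node[:-1]))  # "" marks a root node
        values.append(counts[node])
    return ids, labels, parents, values

# Illustrative sample: (class, order, family) per occurrence.
sample_paths = [
    ("Cephalopoda", "Octopoda", "Octopodidae"),
    ("Cephalopoda", "Octopoda", "Octopodidae"),
    ("Cephalopoda", "Vampyromorphida", "Vampyroteuthidae"),
]
ids, labels, parents, values = sunburst_table(sample_paths)
```

With plotly installed, these lists could be passed to `go.Sunburst(ids=ids, labels=labels, parents=parents, values=values, branchvalues="total")`. `plotly.express.sunburst(df, path=[...])` performs an equivalent roll-up directly from a DataFrame.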

But the OBIS data has some hierarchies that don't fit well with all species: a single path that suits every record cannot be found easily. For example, I found that some families have no order, only a class and subclass. I don't know how to fit those into the plot.

ayushanand18 commented 2 years ago

See this notebook for the error I get at the end: https://github.com/ayushanand18/obis-research/blob/efe52ace6571bf2052350efb0fe4a72ab8daf6b3/notebooks/florida-keys-taxonomic-dist.ipynb

ayushanand18 commented 2 years ago

I made a temporary fix by replacing NaN values with str(None), but that might not be the correct way to do it. With that fix, I get this result for the Florida Keys between 1997 and 2012:

image
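A minimal sketch of the placeholder approach described above, assuming each occurrence is a plain dict keyed by rank name. The rank list, the helper names `is_missing` and `rank_path`, and the sample record are all hypothetical.

```python
import math

RANKS = ("class", "subclass", "order", "family")  # illustrative subset of ranks

def is_missing(value):
    """True for None, empty strings, and float NaN (how pandas marks gaps)."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)

def rank_path(record, ranks=RANKS, placeholder="None"):
    """Return one label per rank, substituting a placeholder for gaps."""
    return tuple(
        placeholder if is_missing(record.get(rank)) else str(record[rank])
        for rank in ranks
    )

# Made-up record: a family with a class and subclass but no order.
record = {
    "class": "ClassA",
    "subclass": "SubclassB",
    "order": float("nan"),
    "family": "FamilyC",
}
path = rank_path(record)
```

Every record then yields a path of the same length, so one fixed list of ranks can drive the plot. On a DataFrame, the equivalent would be along the lines of `df[list(RANKS)].fillna("None")`.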

albenson-usgs commented 2 years ago

You might consider following the GBIF example for how to build these https://www.gbif.org/dataset/9a71aa12-7636-4381-9986-d8f03240b277/metrics "Taxonomic Distribution of Occurrences"

7yl4r commented 2 years ago

> You might consider following the GBIF example for how to build these https://www.gbif.org/dataset/9a71aa12-7636-4381-9986-d8f03240b277/metrics "Taxonomic Distribution of Occurrences"

Oooooh. That interactivity is nice. I see they are using highcharts.js to generate that. python-highcharts might be able to help if we wanted to recreate it exactly, but it doesn't look to be a well-maintained repo. Based on the plotly.sunburst docs, it looks like those are interactive in the same way. What other inspiration can we take from GBIF's example?

It looks to me like the NaN-to-str(None) approach is working. Let's put this functionality into a method so I can do something like:

dwcviz.taxa_sunburst(pyobis.OccSearch().search())
ayushanand18 commented 2 years ago

Coupling this package with pyobis will be really interesting, and I would really like to do it.

bbest commented 2 years ago

This is all good stuff! The GBIF sunburst plot from @albenson-usgs looks the best!

I love the ability to explore it in all the ways:

  1. On left, hierarchically in outline form.
  2. On right, hovering over sunburst to get full name and occurrence count.
  3. On right, click to dive deeper.
  4. In upper right menu navbar, click to get treemap version.
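The hover count and click-to-dive behaviour in points 2 and 3 boil down to "count the immediate children of the selected node". A minimal sketch of that lookup, assuming occurrences are already reduced to root-first rank paths; `children_of` and the sample rows are hypothetical names and data.

```python
from collections import Counter

def children_of(paths, prefix=()):
    """Count occurrences per immediate child of the node at `prefix`."""
    prefix = tuple(prefix)
    depth = len(prefix)
    counts = Counter()
    for path in paths:
        # Keep only paths that pass through the selected node and go deeper.
        if len(path) > depth and tuple(path[:depth]) == prefix:
            counts[path[depth]] += 1
    return counts

# Illustrative sample: (class, order, family) per occurrence.
sample_paths = [
    ("Cephalopoda", "Octopoda", "Octopodidae"),
    ("Cephalopoda", "Octopoda", "Octopodidae"),
    ("Cephalopoda", "Vampyromorphida", "Vampyroteuthidae"),
]
top_level = children_of(sample_paths)
after_click = children_of(sample_paths, ("Cephalopoda",))
```

Here `top_level` is what the initial ring would show, and `after_click` is the ring shown after diving into Cephalopoda.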

1 & 2. initial sunburst & outline view

image

3. dive deeper into sunburst

image

4. switch from sunburst to treemap view

image

references

highcharts & gbif

Here's the documentation for:

It'd be worth digging into GBIF repos and asking them about it like here:

... which leads me to:

And I see sunburst peppered throughout their JS and templating code with issues there.

d3

Highcharts is proprietary, but the open-source d3 also has sunburst, treemap, and "icicle" visualizations:

older / other

image

bbest commented 2 years ago

Well done @ayushanand18 with the plotly sunburst of taxonomic coverage in pr #4! 🎉

I just want to document here the visual output of your contribution:

sunfish (Mola mola)

from pydwcviz import taxon
from pyobis.occurrences import OccQuery
occ = OccQuery()

# get the data and the figure object
fig = taxon.plot_dist(occ.search(scientificname = "Mola mola"))

# show the figure
fig.show()

newplot (2)

octopuses (Octopodiformes)

Since the Mola mola example is only for a single species, let's choose a higher-level taxon that shows the sunburst breakdown by taxonomic groupings.

For example, Octopodiformes is a superorder of the subclass Coleoidea, comprising the octopuses and the vampire squid.

from pydwcviz import taxon
from pyobis.occurrences import OccQuery
occ = OccQuery()

# get the data and the figure object
o = occ.search(scientificname = "Octopodiformes")
fig = taxon.plot_dist(o)

# show the figure
fig.show()

newplot (4)

octopuses (Octopodiformes), first 1000 occurrences (for fast example)

Since Octopodiformes has 52,141 occurrences, let's limit to the first 1,000 occurrences so we can quickly render an example plot.

from pydwcviz import taxon
from pyobis.occurrences import OccQuery
occ = OccQuery()

# get the first 1,000 occurrences for the taxon
o = occ.search(scientificname = "Octopodiformes", offset = 0, size = 1000)

# generate the taxonomic distribution plot
fig = taxon.plot_dist(o)

# show the figure
fig.show()

newplot (5)
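If the full set were ever needed in chunks, the `offset` and `size` arguments used above suggest a simple windowing scheme. This is a sketch only: `page_windows` is a hypothetical helper, and it makes no assumption about whether pyobis already pages internally.

```python
def page_windows(total, size):
    """Yield (offset, size) pairs that together cover `total` records."""
    for offset in range(0, total, size):
        # The last window is shortened so it does not run past the end.
        yield offset, min(size, total - offset)

# 52,141 is the Octopodiformes occurrence count quoted above.
windows = list(page_windows(52_141, 1000))
```

Each pair could then be passed along the lines of `occ.search(scientificname = "Octopodiformes", offset = offset, size = size)`, following the call shown above.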

ayushanand18 commented 2 years ago

> Well done @ayushanand18 with the plotly sunburst of taxonomic coverage in pr #4! 🎉
>
> I just want to document here the visual output of your contribution:

Thank you so much @bbest. These illustrations look pretty nice!

I was trying to figure out a way to visualize the data as a taxon tree without adding many dependencies, while keeping it simple and fast. I feel plotly treemaps don't offer more insight than sunburst plots, which are already easy to comprehend, so adding both would be redundant. Plotly's documentation uses python-igraph for creating trees, but I would like to use a package that is well-documented, stable and fast. Would you suggest any?
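For comparison, a dependency-free baseline is possible: a rough sketch that builds a nested dict from root-first rank paths and renders it as a plain-text tree. It is not a substitute for a proper graph layout, and `build_tree`, `render` and the sample rows are hypothetical.

```python
def build_tree(paths):
    """Nest root-first rank paths into a dict of dicts."""
    tree = {}
    for path in paths:
        node = tree
        for name in path:
            # Reuse the existing child if present, otherwise create it.
            node = node.setdefault(name, {})
    return tree

def render(tree, prefix=""):
    """Return the tree as a list of text lines with ASCII connectors."""
    lines = []
    names = sorted(tree)
    for i, name in enumerate(names):
        last = i == len(names) - 1
        connector = "`-- " if last else "|-- "
        lines.append(prefix + connector + name)
        # Children are indented; a bar is kept while siblings remain.
        lines.extend(render(tree[name], prefix + ("    " if last else "|   ")))
    return lines

# Illustrative sample: (class, order, family) per occurrence.
sample_paths = [
    ("Cephalopoda", "Octopoda", "Octopodidae"),
    ("Cephalopoda", "Vampyromorphida", "Vampyroteuthidae"),
]
tree_lines = render(build_tree(sample_paths))
```

Joining `tree_lines` with newlines gives a printable tree using only the standard library.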

7yl4r commented 2 years ago

great work! :1st_place_medal: