salish-sea / acartia

Open source web3 code underlying the Acartia data cooperative for sharing animal location data in real time
https://acartia.io
MIT License

Integration with iNaturalist #19

scottveirs opened this issue 1 year ago

scottveirs commented 1 year ago

iNaturalist has a significant amount of presence data for many marine species within Acartia's geographic domain (SRKW range and the Salish Sea). For example these are killer whale locations across the NE Pacific:

[Screenshot, 2023-07-25: iNaturalist map of killer whale locations across the NE Pacific]

Would it be possible to ingest these data into Acartia for resiliency and easier programmatic access, including regional dashboards and analyses?

Would iNaturalist like to ingest Acartia data?

What collaborations are possible with iNaturalist projects like these?

  1. https://www.inaturalist.org/projects/salish-sea-biodiversity
  2. https://www.inaturalist.org/projects/marine-life-of-the-salish-sea (started 2015)
  3. https://www.inaturalist.org/projects/the-great-salish-sea-bioblitz (July 2020)
  4. https://www.inaturalist.org/projects/marine-biodiversity-of-the-pacific-northwest
rainhead commented 1 year ago

I don't believe iNaturalist wants to ingest Acartia data in bulk, but I can certainly imagine Acartia users wanting to post specific observations to iNaturalist for confirmation or posterity.

As far as pulling data from iNaturalist, that shouldn't be a problem. If you're interested in just a few species in the Pacific Northwest, we can probably pull that quite frequently. Data are also exported to GBIF every 1-2 weeks, which would be a good source for backfill data. Either way, observations are often licensed for use with attribution, sometimes for only non-commercial use.
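As a minimal sketch of what such a pull could look like, here is a stdlib-only example against iNaturalist's public observations API (`GET /v1/observations`). The bounding box, the use of `taxon_name` rather than a numeric taxon ID, and the Acartia-side field names in `to_sighting()` are illustrative assumptions, not the project's actual schema:

```python
"""Sketch: pull recent iNaturalist observations for one species in a box."""
import json
import urllib.parse
import urllib.request

API = "https://api.inaturalist.org/v1/observations"

# Rough Salish Sea / outer-coast bounding box (assumed; adjust to Acartia's domain).
PARAMS = {
    "taxon_name": "Orcinus orca",
    "swlat": 46.8, "swlng": -125.5,
    "nelat": 50.5, "nelng": -122.0,
    "per_page": 200,
    "order_by": "observed_on",
}

def to_sighting(obs):
    """Map one iNaturalist observation to a minimal Acartia-style record."""
    # iNaturalist returns coordinates as a single "lat,lng" string.
    lat, _, lng = (obs.get("location") or ",").partition(",")
    return {
        "source": "iNaturalist",
        "source_id": obs["id"],
        "species": obs.get("taxon", {}).get("name"),
        "latitude": float(lat) if lat else None,
        "longitude": float(lng) if lng else None,
        "observed_at": obs.get("time_observed_at") or obs.get("observed_on"),
        "license": obs.get("license_code"),   # keep for attribution/filtering later
        "attribution": obs.get("user", {}).get("login"),
    }

def fetch_page(page=1):
    """Fetch one page of observations and normalize each record."""
    url = API + "?" + urllib.parse.urlencode({**PARAMS, "page": page})
    with urllib.request.urlopen(url, timeout=30) as resp:
        return [to_sighting(o) for o in json.load(resp)["results"]]
```

Polling this with an `updated_since` parameter would keep the pull incremental rather than re-fetching everything.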

For some sense of volume:

Hopefully that gives a broad sense of what's possible. To dig in, it would be good to have specific ideas about motivations for the integration. Perhaps:

What are your goals?

scottveirs commented 1 year ago

Thanks for the volumetric estimates, @rainhead! As of today, Acartia holds about 33k records, so a backfill with iNaturalist data would be a significant increase (roughly a tripling) but not an unacceptable one.

The long-term, over-arching goal is to provide open programmatic access to marine animal location data with clear governance and provenance, despite the exploding number of data sources (e.g. Facebook/WhatsApp/Discord/etc groups, plus more and more nature observation apps). Acartia partners have started with real time applications in mind (e.g. SRKW risk mitigation schemes) and visualizing/modeling movement of cetaceans.

But cooperatives like the Puget Sound Ecosystem Monitoring Program have troves of historic data that are scattered across disparate databases and disk drives... The best example I know of is [Joe Evenson's WDFW aerial survey data](https://wdfw.wa.gov/species-habitats/at-risk/species-recovery/seabirds/surveys-winter-aerial), which I think also gets submitted to GBIF.

So yes, backfilling as a way to initiate the process of aggregating historic data is a short-term goal for iNaturalist (or just GBIF?) data exchange(s). Once we have enough historic data, folks like @liu-zoe have already started building tools for looking at spatio-temporal trends.

In fact, @liu-zoe , Peter's helped me realize that a bonus of the orca-salmon visualization could be to add adult Chinook data points to Acartia as you scrape them from the Albion and Bonneville web tables!

Peter, do you think it makes the most sense to backfill from GBIF only the points whose licensing allows incorporation into Acartia's CC-BY-SA records? I wonder how much your estimates from iNaturalist would increase if the same bounding box were used with GBIF marine data (or OBIS-SEAMAP?)...

rainhead commented 1 year ago

I couldn't search GBIF for all Whippomorpha, but the occurrence data for Delphinidae in approximately that bounding box are contributed by these datasets:

Any GBIF data should be OK to use in Acartia, with attribution as required. Attribution might require changes to the observation popover, and perhaps to other screens.
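As a sketch of a license-aware backfill: GBIF's occurrence search API accepts `decimalLatitude`/`decimalLongitude` ranges, and each occurrence carries a `license` field that can be filtered before merging into Acartia's CC-BY-SA records. The `taxonKey` below is a placeholder (resolve the real Delphinidae key via GBIF's species-match endpoint), and the license check is a deliberately conservative assumption, not official guidance:

```python
"""Sketch: GBIF occurrence backfill, filtered to redistributable licenses."""
import json
import urllib.parse
import urllib.request

SEARCH = "https://api.gbif.org/v1/occurrence/search"

# GBIF accepts latitude/longitude *ranges* as "min,max" strings.
QUERY = {
    "taxonKey": 0,                    # placeholder: look up Delphinidae via /v1/species/match
    "decimalLatitude": "46.8,50.5",
    "decimalLongitude": "-125.5,-122.0",
    "hasCoordinate": "true",
    "limit": 300,
}

def license_ok(lic):
    """Conservative check: accept CC0 / CC BY / CC BY-SA, reject NC/ND or unknown.

    Handles both enum-style values ("CC_BY_4_0") and license URLs.
    """
    lic = (lic or "").upper().replace("-", "_").replace("/", "_")
    if "NC" in lic or "ND" in lic:
        return False
    return any(tag in lic for tag in ("CC0", "ZERO", "BY"))

def fetch(offset=0):
    """Fetch one page of occurrences, keeping only license-compatible records."""
    url = SEARCH + "?" + urllib.parse.urlencode({**QUERY, "offset": offset})
    with urllib.request.urlopen(url, timeout=30) as resp:
        page = json.load(resp)
    return [r for r in page["results"] if license_ok(r.get("license"))]
```

Whatever survives the filter would still need attribution carried through to the UI, as noted above.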

iNat, Happywhale, and observation.org data probably make sense to just merge with Acartia data. Other historical datasets, with observations just in particular time frames or regions, might be confusing to the user without changing the UI to orient around the datasets themselves. That doesn't seem worthwhile at this point, unless you think they tell some story that would be particularly valuable to Acartia's users.

liu-zoe commented 1 year ago

@scottveirs We can certainly fold the Chinook data we scrape from Bonneville and Albion into Acartia! I don't really know who maintains Acartia these days, but I'm happy to give pointers on how to get the scraped data, or to adapt my scraping scripts for whoever picks it up next :)
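As a sketch of how scraped counts could be folded in, assuming the Albion/Bonneville tables reduce to per-day adult Chinook counts: the station coordinates, record fields, and the `pandas.read_html` step are all placeholders here, not Zoe's actual scripts.

```python
"""Sketch: convert scraped daily Chinook counts into Acartia-style records."""

# Approximate station locations (assumed; refine from official sources).
STATIONS = {
    "albion": (49.21, -122.62),      # Albion test fishery, lower Fraser River
    "bonneville": (45.64, -121.94),  # Bonneville Dam, Columbia River
}

def counts_to_records(station, rows):
    """rows: iterable of (ISO date string, adult Chinook count) pairs."""
    lat, lon = STATIONS[station]
    return [
        {
            "source": f"{station}-chinook-scrape",
            "species": "Oncorhynchus tshawytscha",
            "latitude": lat,
            "longitude": lon,
            "observed_at": day,
            "count": int(n),
        }
        for day, n in rows
        if int(n) > 0  # skip zero-count days
    ]

if __name__ == "__main__":
    # The scrape itself might use pandas.read_html on the count pages
    # (third-party dependency, only needed for the live scrape):
    # import pandas as pd
    # tables = pd.read_html("<count page URL>")  # URL omitted: site-specific
    pass
```

Fixing each count to a single station point is a simplification: counts are aggregates, not individual sightings, so Acartia might want a distinct record type for them.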

scottveirs commented 1 year ago

@liu-zoe You've motivated a valuable new question for me: where should code to scrape & aggregate data for a local ecoregion live?

In the long run, it seems like it should live close to the data cooperative (e.g. in the acartia repository or salish-sea organization), rather than in an application like the orca-salmon dashboard you've pioneered. That way, not only could your initial application access the salmon time series, but so could any future application (like the game Val @veirs imagined where humans and algorithms try to predict where SRKWs will go tomorrow).

Perhaps when you migrate your repo to the salish-sea organization we could discuss such architectural strategies, along with how new ecological visualization apps should be deployed? @aalaydrus is currently working on how/when to migrate the Acartia deployment from Digital Ocean to Orcasound's AWS account. I wonder if a similar devops approach could work for your long-term deployment of your app and/or scraping scripts, and any integration code @rainhead might consider for iNaturalist, GBIF, or other overarching systems... or if there is a more distributed approach that would be smarter (i.e. the Acartia app deployed as a peer manages scraping from centralized sites and then distributed storage via OrbitDB).

scottveirs commented 1 year ago

@rainhead Thanks for the Delphinidae data size computations! It's both exciting and daunting to see totally novel data sources for a group I know well. (I didn't know that killer whale observations were coming in through Happywhale! Or that the Burke had so many marine mammal specimens, including a Steller sea cow!)

My gut says it's worth pulling historic data from the GBIF sources you listed into Acartia and building exploratory, analytic, visualization, and story-telling tools around the aggregated local species occurrence, track, and/or count data sets, with ecological context coming in from other feeds (e.g. oceanographic conditions, stream flow gauges) -- at least historic, but eventually also real time and predictions. But a revision of the Acartia data schema is almost certainly in order before doing so... (You've gotten me reading more about the Darwin Core standard (Wieczorek et al. 2012) and about using World Register of Marine Species (WoRMS) data for local classifications.)
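As a sketch of what a Darwin Core-aligned export could look like, using standard term names (`occurrenceID`, `basisOfRecord`, `eventDate`, etc.); the Acartia-side field names are assumptions about the current schema, not its actual definition:

```python
"""Sketch: map an Acartia-style record onto Darwin Core occurrence terms."""

def to_darwin_core(rec):
    """Translate one Acartia-style record into a Darwin Core occurrence.

    Keys on the right follow the Darwin Core term vocabulary; keys read
    from `rec` are assumed names for Acartia's current fields.
    """
    return {
        # A stable, globally unique ID lets downstream users deduplicate.
        "occurrenceID": f"acartia:{rec['source']}:{rec['source_id']}",
        "basisOfRecord": "HumanObservation",
        "scientificName": rec["species"],
        "decimalLatitude": rec["latitude"],
        "decimalLongitude": rec["longitude"],
        "geodeticDatum": "WGS84",
        "eventDate": rec["observed_at"],        # ISO 8601 expected
        "individualCount": rec.get("count", 1),
        "license": rec.get("license"),
        "rightsHolder": rec.get("attribution"),
    }
```

Starting from Darwin Core terms would also make a future Acartia-to-GBIF contribution nearly free.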

An assumption motivating such an effort is that it would enable an almanac-like perspective on newsworthy events in the Salish Sea ecosystem. While I'm still exploring the existing data summary and mapping tools within the iNaturalist and GBIF web sites...

Image

...I'm also noticing that historical context feels valuable and meaningful when reporting on or interpreting a real-time event. A good example is an After the Breach podcast episode from June 2023 in which they discuss historic sightings of a pair of male Bigg's killer whales to contextualize a recent observation of unusual behavior. I have similar questions about the T65A matriline that has been scouring Puget Sound this last week (most of the southernmost locations in the current view of Acartia data: first in south Puget Sound a week ago, yesterday into Hood Canal, and still there this morning):

Image

What if the goal of a new open source analytic tool -- the Salish Sea almanac? -- was to monitor and model the present ecosystem, detect emerging or upcoming events (real time or seasonal/historic), and generate open data products like stories that are programmatically accessible to other applications? It could have a dashboard with a distillation of overall conditions and daily/seasonal events (a bit like the Old Farmer's Almanac or The Writer's Almanac), but also an API to access data, plots, media...

As a practical example, I was trying to amend my morning news ritual with iNaturalist's iOS mobile app last week. I got close to being able to monitor my local watershed (Ravenna Ravine/Creek) to decide whether to take a walk there with the dog. I found a local project...

Image

...saw that one observer had seen a bunch of species there that I don't yet know how to ID...

Image

...noticed that there was a News tab for the local group...

Image

...and was excited to see a new post just 19 hours old, but then was disappointed to find that it was just a site-wide post, not relevant to my local/walkable environment, and therefore not helpful in guiding my day's activities or providing insight and meaning about my home ecology.

Image

What I was hoping for was for that local expert to have shared some exciting emerging event they had witnessed, or an upcoming event they were excited to see soon.

What do you (all) think?

rainhead commented 1 year ago

> What I was hoping for was for that local expert to have shared some exciting emerging event they had witnessed, or an upcoming event they were excited to see soon.

I don't engage with projects on the app, or much in general. They are mostly ways of aggregating observations, e.g. to collect data as part of surveys run by organizations. Projects will very occasionally keep their own journal. The News tab in the iOS app appears to be an error, perhaps vestigial; see https://github.com/inaturalist/INaturalistIOS/issues/654. That app is in the process of being replaced by https://github.com/inaturalist/iNaturalistReactNative and receives minimal attention.

iNaturalist users can also post to a personal journal. Most people don't, but when experts do, they can be quite insightful. The posts are not particularly discoverable or integrated into the site, not even into site search. Knowledge is mostly shared between people and communities by way of comments on observations and identifications, and through private messages. This severely limits the spread of knowledge, and risks creating games of "telephone", as "tips and tricks" can't easily be traced back to subject authorities. I would love to create better opportunities for community members to learn about taxa from one another (read: better opportunities to create community), but the best ways to do so would be integrated into iNaturalist itself.

> What if the goal of a new open source analytic tool -- the Salish Sea almanac? -- was to monitor and model the present ecosystem, detect emerging or upcoming events (real time or seasonal/historic), and generate open data products like stories that are programmatically accessible to other applications?

A cheap, flexible way to prototype something like this would be to set up some Jupyter notebooks on AWS and point them at the GBIF data already in S3. I've done this kind of thing only a little. Do you have experience with Jupyter, Pandas, Spark, or other relevant tools?
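A minimal notebook-style sketch along those lines, assuming GBIF's monthly parquet snapshots on the AWS Open Data registry (the snapshot path and lowercase column names should be verified against the current snapshot before relying on them; the live read needs `s3fs`/`pyarrow` in addition to pandas):

```python
"""Sketch: filter a GBIF occurrence snapshot to the Salish Sea in pandas."""
import pandas as pd

# Rough Salish Sea bounding box (assumed; adjust to Acartia's domain).
SALISH = dict(swlat=46.8, swlng=-125.5, nelat=50.5, nelng=-122.0)

def in_salish_sea(df, box=SALISH):
    """Keep rows whose coordinates fall inside the bounding box."""
    return df[
        df["decimallatitude"].between(box["swlat"], box["nelat"])
        & df["decimallongitude"].between(box["swlng"], box["nelng"])
    ]

# Live read from a notebook with S3 access (snapshot date is illustrative):
# df = pd.read_parquet(
#     "s3://gbif-open-data-us-east-1/occurrence/2023-08-01/occurrence.parquet/",
#     columns=["family", "species", "decimallatitude",
#              "decimallongitude", "eventdate"],
# )
# dolphins = in_salish_sea(df[df["family"] == "Delphinidae"])
```

Pushing the `family` and bounding-box predicates down into `read_parquet` filters would avoid loading the full snapshot into memory.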

rainhead commented 1 year ago

Going back to your questions:

> When were they previously sighted within Puget Sound or sub-regions of it? How long did they stay in the region previously compared to this time? What were their movement patterns the last time(s) they were here? How do those patterns relate to known seal and sea lion haulout sites and/or historic occurrence data for their other favorite prey item: harbor porpoises?

Where are the data to answer these questions right now? Is there anything besides Acartia with the necessary granularity of data? How would answers to these questions fit into a larger information landscape, and especially, who besides you has these questions? Perhaps meeting an existing audience where they are means answering fewer, more basic questions, with mostly static answers?

Birds are the only other organisms I can think of with similar requirements for study: they are tracked with radar, people observe them with at least day-to-day time granularity, and they're charismatic enough that serious investment goes into data tracking and into sites and apps for birding. The Cornell Lab of Ornithology provides a good starting place for inspiration.