orcasound / aifororcas-livesystem

Real-time AI-assisted killer whale notification system (model and moderator portal)
http://orcahello.ai4orcas.net/
MIT License

Haro Strait vs Straight #145

Closed dthaler closed 2 months ago

dthaler commented 5 months ago

https://aifororcasdetections.azurewebsites.net/swagger/index.html and various other files show "Straight" whereas the actual name is "Strait". (And neither of them matches the live.orcasound.net name for it, "Orcasound Lab".)

scottveirs commented 4 months ago

Since there are already two hydrophone monitoring sites in Haro Strait, another under development on the Canadian side, and Orcasound plans for another 1-2, I'd second Dave's suggestion to standardize by switching to the "Orcasound Lab" label.

For future-proofing, it would be ideal for the Moderator Portal and other apps using the Swagger API to all use the Orcasound API for assigning labels when a new node is integrated into the OrcaHello real-time inference system.

micya commented 4 months ago

I feel like the proper fix is to use some kind of unique ID (e.g., a GUID) for each site and have a single mapping from ID to location string. That would make updates easier if we decide to rename the location (e.g., Haro Strait 1/2/3 or some other more descriptive naming).
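
A minimal sketch of the single ID-to-label mapping described above. The GUIDs, the `SITE_NAMES` table, and both helper functions are hypothetical and are not taken from the OrcaHello code; only the label strings come from this thread.

```python
# Hypothetical site IDs. Real IDs would come from whichever system
# ends up being the source of truth for hydrophone sites.
SITE_NAMES = {
    "00000000-0000-0000-0000-000000000001": "Haro Strait",
    "00000000-0000-0000-0000-000000000002": "Haro Strait 2",
}


def display_name(site_id: str) -> str:
    """Return the display label for a site ID, falling back to the ID itself."""
    return SITE_NAMES.get(site_id, site_id)


def rename_site(site_id: str, new_name: str) -> None:
    """Change a site's label in one place; the ID stays the same."""
    if site_id not in SITE_NAMES:
        raise KeyError(f"unknown site ID: {site_id}")
    SITE_NAMES[site_id] = new_name
```

With this shape, a rename such as `rename_site("00000000-0000-0000-0000-000000000001", "Orcasound Lab")` touches only the mapping, and anything that stores the ID keeps working.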

dthaler commented 4 months ago

I agree that using a unique ID internally is important, but it's also important to use a consistent display name across sites. Currently https://live.orcasound.net/api/json/feeds provides multiple identifiers:

"id" is, I believe, the unique ID for the hydrophone.

"name" is a display name and could be renamed.

"node_name" is a path component used in constructing the S3 path. In theory it could be changed but then one would either have broken URIs or one would have to enable redirects.

"slug" is a path component used in constructing the orcasound.net path. In theory it could be changed but then one would either have broken URIs or one would have to enable redirects.

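To make the roles of the four fields concrete, here is a rough sketch that models one feed record and indexes records by "id". The sample payload is hypothetical and flattened for illustration; the actual response from the feeds endpoint may nest these fields differently, and none of the values below are real.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Feed:
    id: str         # unique ID for the hydrophone feed
    name: str       # display name; could be renamed
    node_name: str  # path component used in the S3 path
    slug: str       # path component used in the orcasound.net path


# Hypothetical sample data, NOT a copy of the real API response.
SAMPLE = """
[
  {"id": "feed_example_1", "name": "Orcasound Lab",
   "node_name": "example_node_1", "slug": "example-slug-1"}
]
"""


def index_feeds(payload: str) -> dict:
    """Parse a JSON list of feed records and key them by their stable ID."""
    records = json.loads(payload)
    return {
        r["id"]: Feed(r["id"], r["name"], r["node_name"], r["slug"])
        for r in records
    }


feeds = index_feeds(SAMPLE)
```

Keying on "id" means a change to "name" affects only what is displayed, while "node_name" and "slug" stay tied to the paths they appear in.
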
dthaler commented 4 months ago

There's also a unique ID on dataplicity that is a GUID: "serial": "84bb6fee-b38c-488e-adac-1a36bbb9a4da",

I am not sure why orcasound.net doesn't expose it and instead exposes some other "id".

skanderm commented 4 months ago

That id column (feed_) is unique and is backed by a UUIDv7 (in the orcasite db), which I'd be happy to surface. I could also add a field to keep track of the dataplicity ID, but I assume that represents the hardware node in their system.

Let me know if you'd like either of those changes!

dthaler commented 4 months ago

I think exposing the dataplicity "serial" value would be helpful. That would allow the "slug" value to be renamed in dataplicity without breaking the linkage between orcasite and dataplicity.

paulcretu commented 4 months ago

Great question. So far, projects have grown organically and we haven't really formalized a source of truth / data schema for hydrophone node/feed/location metadata. It's something to figure out as more projects spring up.

For the time being, I agree that orcasite (https://live.orcasound.net) is the de facto source of truth. As a primary ID, I would rather rely on our internal ID (feed_) than on dataplicity's, as our ID should never change (and we generated it). Also, we shouldn't couple too tightly to dataplicity in case we ever decide to use a different service. Including the dataplicity ID in the DB seems like a great idea though, good call @dthaler and thanks for adding @skanderm.

The other identifiers seem less stable:

slug should be pretty reliable (@dthaler as you mention, it would break URLs) and seems like a good choice for something human readable. I can't imagine why it would ever change without also changing the UUID (feed_) but who knows?

node_name is indeed used for the S3 path, but more than that, it's also the name the hardware reports, and it uniquely identifies a device. Whenever we get around to improving orcanode code, it may be worth rethinking whether the device ID should even be used for the S3 bucket name. In general, I think the idea of a feed/location should transcend the hardware, but of course, decoupling things always comes at the cost of more stuff to track.
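
One way to picture a feed that transcends its hardware is to keep the internal feed ID as the primary key and treat the device details as a replaceable attribute. This is only a sketch under that assumption: the `Feed` and `Hardware` classes, the `swap_hardware` helper, and every value below are hypothetical and do not reflect the orcasite schema.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Hardware:
    node_name: str                            # name the device reports
    dataplicity_serial: Optional[str] = None  # optional link to dataplicity


@dataclass(frozen=True)
class Feed:
    feed_id: str        # internal ID; primary key, should never change
    slug: str           # human-readable path component
    hardware: Hardware  # current device; can be swapped out


def swap_hardware(feed: Feed, new_hardware: Hardware) -> Feed:
    """Return a copy of the feed with new hardware; ID and slug are untouched."""
    return replace(feed, hardware=new_hardware)


# Hypothetical values throughout.
feed = Feed(
    "feed_example_1",
    "example-slug",
    Hardware("example_node_a", "00000000-0000-0000-0000-0000000000aa"),
)
moved = swap_hardware(feed, Hardware("example_node_b"))
```

The extra `Hardware` record is the "more stuff to track" cost mentioned above: the feed stays stable, but something now has to maintain the link between a feed and its current device.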