orcasound / aifororcas-livesystem

Real-time AI-assisted killer whale notification system (model and moderator portal) :star:
http://orcahello.ai4orcas.net/
MIT License

Add new hydrophones to ML pipeline & document process #128

Open scottveirs opened 1 year ago

scottveirs commented 1 year ago

In 2024, we are excited to add the North San Juan Channel hydrophone, which was just repaired and resumed streaming last week!

In 2023, the number of active nodes in the network increased from 3 to 7, with these locations ready for production:

[Screenshot, 2023-08-01: the 7 active hydrophone locations ready for production]

The current nodes and some of their metadata should be programmatically accessible via a new Orcasound API by the time of the 2023 Microsoft hackathon.

micya commented 1 year ago

Steps involved per location:

  1. Add configuration file: see Port Townsend config for reference. Place new file in same directory.
  2. Modify last line of Dockerfile to point to new config (NOTE: we should move away from having to bake the config file into the docker image so that we can build one image and specify the relevant configs externally; see the sketch after this list).
  3. Build docker container: https://github.com/orcasound/aifororcas-livesystem/tree/main/InferenceSystem#building-the-docker-container-for-production
  4. Push docker image to Azure Container Registry: https://github.com/orcasound/aifororcas-livesystem/tree/main/InferenceSystem#pushing-your-image-to-azure-container-registry
  5. Deploy to Azure Kubernetes Service: https://github.com/orcasound/aifororcas-livesystem/tree/main/InferenceSystem#deploying-an-updated-docker-build-to-azure-kubernetes-service (create namespace, secret, deployment)
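
On the NOTE in step 2, here is a minimal sketch of what an externalized config could look like; the --config flag and INFERENCE_CONFIG environment variable are hypothetical names, not anything the current code defines:

```python
# Hypothetical sketch only: let one docker image serve every hydrophone by
# passing the per-location config at runtime instead of hard-coding it in
# the Dockerfile. INFERENCE_CONFIG and --config are illustrative names.
import argparse
import os
import sys


def resolve_config_path() -> str:
    parser = argparse.ArgumentParser(description="Live inference entrypoint")
    parser.add_argument(
        "--config",
        default=os.environ.get("INFERENCE_CONFIG"),
        help="Path to the per-hydrophone config file",
    )
    args = parser.parse_args()
    if not args.config or not os.path.isfile(args.config):
        sys.exit("Provide a config via --config or the INFERENCE_CONFIG env var")
    return args.config


if __name__ == "__main__":
    print(f"Would load inference config from: {resolve_config_path()}")
```

With something along these lines, we would build one image and set the config per hydrophone in the Kubernetes deployment (env var or mounted file) instead of rebuilding the image for every location.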
micya commented 1 year ago

Need to check with @micowan on whether anything needs to be done for moderator portal.

micowan commented 1 year ago

Will have to look at the code again. I know we were looking at putting the hydrophone locations into a config, but I don't know if that ever happened. Are the feeds turned on for the new ones, and are they creating records in the Cosmos DB?

micya commented 1 year ago

From the description, I don't believe the inference system has been brought up yet. So no records in Cosmos DB yet.

No additional handling needs to be done for the inference system -> Cosmos DB path, since Cosmos DB is really just storing a blob of JSON and accepts arbitrary fields.

catskids3 commented 1 year ago

Checked the code. We did in fact turn the locations into a config setting last go-round, so adding them from the UI perspective should be as simple as updating that config with the new locations. I know Scott showed a spreadsheet or API or something during the discussion last week that listed the locations. If they are updating that themselves and we can pull from it, we could make the list "live" instead of a config setting. But that is just a thought.

scottveirs commented 1 year ago

Hey @micowan et al! I see two possible routes to updating the config file, or more dynamically managing the ML pipeline:

  1. The orcasite wiki lists a recent dump of the feeds table, and I could update it this weekend for the hackathon.
  2. Recent Orcasound backend improvements make it possible to access the feeds table itself programmatically, e.g. here -- https://beta.orcasound.net/graphiql via queries like:
{
  feeds {
    nodeName
  }
}
scottveirs commented 1 year ago

Also @micowan, I mentioned to @skanderm that your existing config file held JSON, so he said he could work on a new API endpoint that could provide JSON to you...

skanderm commented 1 year ago

You should be able to get an updated list here: https://beta.orcasound.net/api/json/feeds

You may need to set these headers as well: curl -s -H "Content-Type: application/vnd.api+json" -H "Accept: application/vnd.api+json" https://beta.orcasound.net/api/json/feeds
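
In Python, fetching and listing the feeds is roughly the following (requests is just one option, and the attribute names should be double-checked against the live response):

```python
# Sketch: pull the hydrophone feed list from the Orcasound JSON:API endpoint.
# Attribute names below are assumptions; check them against the actual response.
import requests

FEEDS_URL = "https://beta.orcasound.net/api/json/feeds"
HEADERS = {
    "Content-Type": "application/vnd.api+json",
    "Accept": "application/vnd.api+json",
}

response = requests.get(FEEDS_URL, headers=HEADERS, timeout=30)
response.raise_for_status()

# JSON:API wraps records in a top-level "data" list, each with an "attributes" object.
for feed in response.json().get("data", []):
    attributes = feed.get("attributes", {})
    print(attributes.get("node_name"), "-", attributes.get("name"))
```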

catskids3 commented 1 year ago

@scottveirs and @skanderm, the url: https://beta.orcasound.net/api/json/feeds was absolutely perfect!

I have already added this to a new hydrophones endpoint in the API so that we can access it from the UI. I also brought in the url and html in case it makes sense to add them to the UI somewhere.

Thanks!!!

skanderm commented 1 year ago

Glad you found it useful! Will the config be modifiable? We’re planning to deploy the changes to https://live.orcasound.net at some point.

catskids3 commented 1 year ago

If I am understanding the question correctly, yes. We will be able to change the URL we are pointing to on the fly by updating the configuration setting in Azure.

catskids3 commented 1 year ago

@scottveirs and @skanderm, a quick question: there is a hydrophone location you call Orcasound Lab. Can you confirm that this is the Haro Strait hydrophone that we reference in the Cosmos DB? And if so, which is the correct name/label? We may need coding/configuration changes on our end if it is "Orcasound Lab".

scottveirs commented 1 year ago

Good question!

Yes, they are one and the same. But the official name for that feed is indeed "Orcasound Lab." Since there will eventually be more than one hydrophone node in Haro Strait, switching to "Orcasound Lab" would be a prudent long-term strategy.

Scott

micowan commented 1 year ago

OK, great. Since we are changing the partition strategy, which requires a rebuild of the data set, I can take care of that one-off during the migration. We will need to speak with @micya or @pastorep about how it is marked coming out of the ML pipeline. Thanks for the feedback and quick turnaround.

micya commented 1 year ago

I found that location information is hardcoded in the inference system script: https://github.com/orcasound/aifororcas-livesystem/blob/2ed0955d062965b34d0db91b7d93ea1c2fef4e47/InferenceSystem/src/LiveInferenceOrchestrator.py#L36-L40.

We should probably pull that out and configure it via an environment variable.
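
A rough sketch of that change (the variable names are placeholders, not something the script defines today):

```python
# Hypothetical sketch: read per-hydrophone location metadata from environment
# variables instead of the values hardcoded in LiveInferenceOrchestrator.py.
# The variable names and fields shown here are illustrative only.
import os


def load_location_from_env() -> dict:
    """Location metadata to attach to detections for the configured hydrophone."""
    return {
        "name": os.environ["HYDROPHONE_NAME"],  # e.g. "Orcasound Lab"
        "longitude": float(os.environ["HYDROPHONE_LONGITUDE"]),
        "latitude": float(os.environ["HYDROPHONE_LATITUDE"]),
    }
```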

micowan commented 1 year ago

Michelle, thanks for finding that. Also, if you are going to be changing the data port, we will want to incorporate the changes I requested earlier: remove the reviewed and SRKWFound properties (I may have these spelled wrong) and replace them with a new property called "state", populated with the term "Unreviewed"; "state" is also the new partition key. We also need a new property called "locationName" at the top level of the JSON that duplicates the name in the Location portion of the JSON. Thanks.
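
For clarity, roughly this document shape; only state and locationName are the changes I'm asking for, and the other fields are placeholders rather than the exact existing schema:

```python
# Illustrative shape only: fields other than "state" and "locationName" are
# placeholders, not the exact schema of the existing Cosmos DB documents.
candidate_document = {
    "id": "example-candidate-id",
    "state": "Unreviewed",            # replaces reviewed / SRKWFound; also the new partition key
    "locationName": "Orcasound Lab",  # duplicates the name from the location object below
    "location": {
        "name": "Orcasound Lab",
        "longitude": -123.17,         # placeholder coordinates
        "latitude": 48.56,
    },
    # ...remaining fields (timestamps, predictions, audio URI, etc.) unchanged
}
```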

scottveirs commented 1 year ago

@micowan and @micya --

@salsal97 is teaching David and me here in Redmond how to add the new Sunset Bay location to the ML pipeline.

If the new model deployment creates a candidate, will it show up in the Moderator portal auto-magically now? Or is there some hardcoding of the new location within the UI portal code (i.e. the Sunset Bay metadata that's now available via the API provided by Skander)?

It looks like your recent pull request (https://github.com/orcasound/aifororcas-livesystem/pull/131), Mike, might be the answer to my question?

Maybe Tara or someone else who knows C# could review the PR?

catskids3 commented 1 year ago

Once the new API is deployed (which won't be until after the Moderator UI is updated), as long as the location appears in the endpoint Skander provided me, it will show up in the Moderator portal. I cannot speak to the ML pipeline, except to say Michelle indicated they had it hardcoded in one of the Python scripts. If they could call the new API as well, then it could be removed from there and we would all be pulling from the same place. But I don't know if that is a possibility from the ML side.

salsal97 commented 1 year ago

This PR should be a step toward getting this issue squared away: https://github.com/orcasound/aifororcas-livesystem/pull/136

skanderm commented 1 year ago

Hi everyone! We've updated the live site. As referenced here: https://github.com/orcasound/aifororcas-livesystem/issues/128#issuecomment-1712531406, please update the endpoint to https://live.orcasound.net/api/json/feeds. Thank you!

micowan commented 1 year ago

@skanderm, thanks for this. I have replaced the beta URL with this new one in the codebase I am working on.

tanviraja24 commented 1 month ago

Based off https://live.orcasound.net/listen, are there any new hydrophones available to add?

micowan commented 1 month ago

Scott gave me a URL last year, https://live.orcasound.net/api/json/feeds, which has 7 hydrophones listed (including Haro Strait as Orcasound Lab). I have changed the API to pull this list for all Moderator features (picklists, etc.).

scottveirs commented 1 month ago

Before taking the steps that Michelle outlined, we need to account for a change that was recently made to the Amazon S3 buckets where the live audio data are stored. In the process of moving the data streams and archive to Amazon-sponsored buckets (and dramatically reducing our storage and egress costs), we had to rename the streaming data bucket.

My understanding is that the S3 bucket URI is hard-coded into the Docker images for each location. Ideally, we'd move the audio data source URI/URL outside of the image and into a configuration file.

The other place I see the S3 bucket name hard-coded is in LiveInferenceOrchestrator.py -- https://github.com/orcasound/aifororcas-livesystem/blob/736c864e17dddf2da548ca04fcb16384968c9419/InferenceSystem/src/LiveInferenceOrchestrator.py#L145
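
Along the lines of the earlier config note, here is a minimal sketch of pulling the bucket name out of the code; the environment variable name and fallback value are placeholders, not the real bucket:

```python
# Sketch only: read the streaming bucket from the environment so a bucket
# rename does not require rebuilding every per-location image.
# S3_STREAM_BUCKET and the URL layout below are assumptions.
import os

S3_STREAM_BUCKET = os.environ.get("S3_STREAM_BUCKET", "example-streaming-bucket")


def stream_base_url(node_name: str) -> str:
    """Base URL for a hydrophone's live audio data in the configured bucket."""
    return f"https://{S3_STREAM_BUCKET}.s3.amazonaws.com/{node_name}/"
```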

Steps involved per location:

  1. Add configuration file: see Port Townsend config for reference. Place new file in same directory.
  2. Modify last line of Dockerfile to point to new config (NOTE: we should move away from having to bake the config file into the docker image so that we can build one image and specify the relevant configs externally).
  3. Build docker container: https://github.com/orcasound/aifororcas-livesystem/tree/main/InferenceSystem#building-the-docker-container-for-production
  4. Push docker image to Azure Container Registry: https://github.com/orcasound/aifororcas-livesystem/tree/main/InferenceSystem#pushing-your-image-to-azure-container-registry
  5. Deploy to Azure Kubernetes Service: https://github.com/orcasound/aifororcas-livesystem/tree/main/InferenceSystem#deploying-an-updated-docker-build-to-azure-kubernetes-service (create namespace, secret, deployment)