opencb / opencga

An Open Computational Genomics Analysis platform for big data genomics analysis. OpenCGA is maintained and developed by its parent company Zetta Genomics. Please contact support@zettagenomics.com for bug reports and feature requests.
Apache License 2.0

Use Azure Monitoring for basic log and metrics collection #1016

Open lawrencegripper opened 5 years ago

lawrencegripper commented 5 years ago

As we start performance testing the solution, insight into CPU, memory and network usage will be needed to tune and tweak it.

The easiest option looks to be enabling the container monitoring solution in Azure Monitoring; however, we'll need to do some testing to make sure this fits the bill. We will also likely want to get stats from the HBase nodes, Solr and Mongo.

Links:

lawrencegripper commented 5 years ago

I've updated this to include some more data. Looking at this work item now, I'm starting to feel like this is a rather large bit of work.

lawrencegripper commented 5 years ago

Also, does the following limitation present an issue for using Log Analytics with the solution, @marrobi?

A Log Analytics workspace is currently supported in the following regions:

    West Central US
    East US
    West Europe
    Southeast Asia

marrobi commented 5 years ago

Portal disagrees.

image

lawrencegripper commented 5 years ago

https://docs.microsoft.com/en-us/azure/azure-monitor/insights/vminsights-onboard#log-analytics

Worth raising on the doc?

martinpeck commented 5 years ago

Currently dealing with timing issues trying to install agents and other stuff. @lawrencegripper has suggested an approach that might get around this.

lawrencegripper commented 5 years ago

One approach would be to run the Log Agent from within a docker container. Here is an example of doing this from Kubernetes, mounting the log files into the container so it can access them: https://github.com/lawrencegripper/azure-aks-terraform/blob/master/oms/oms.tf

marrobi commented 5 years ago

So using:

wget https://raw.githubusercontent.com/Microsoft/OMS-Agent-for-Linux/master/installer/scripts/onboard_agent.sh && sh onboard_agent.sh -w <YOUR OMS WORKSPACE ID> -s <YOUR OMS WORKSPACE PRIMARY KEY>

In cloud-init logs I see:

 82650K .......... .......... .......... .......... .......... 75%  119M 5s
 82700K .......... .......... .......... .......... .......... 75%  214M 5s
 82750K .......... .......... .......... .......... .......... 75%  157M 5s
 82800K .......... .......... .......... .......... .......... 75%  167M 5s
 82850K .......... .......... .......... .......... .......... 75%  189M 5s
 82900K .......... .......... .......... .......... .......... 75% 25.8M 5s
 82950K .......... .......... .......... .......... .......... 75% 50.6M 5s
 83000K .......... .......... .......... .......... .......... 75% 3.13M 5s
 83050K .......... .......... .......... .......... .......... 75% 27.4M 5s
 83100K .......... .......... .......... .......... .......... 75% 29.5M 5s
 83150K .......... .......... .......... .......... .......... 75% 40.4M 5s
 83200K .......... .......... .......... .......... ...        75% 43.2M=14s

2019-01-22 23:01:38 (5.61 MB/s) - Read error at byte 85241040/112281798 (Connection reset by peer). Retrying.

--2019-01-22 23:01:39--  (try: 2)  https://github-production-release-asset-2e65be.s3.amazonaws.com/43709699/4009ab00-e10d-11e8-9798-991dfd11b98b?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20190122%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20190122T225622Z&X-Amz-Expires=300&X-Amz-Signature=2b4a640dabb7132620c23ab1ecb0732c8a8e40f6199705a50870369eb79ae6d7&X-Amz-SignedHeaders=host&actor_id=0&response-content-disposition=attachment%3B%20filename%3Domsagent-1.8.1-256.universal.x64.sh&response-content-type=application%2Foctet-stream
Connecting to github-production-release-asset-2e65be.s3.amazonaws.com (github-production-release-asset-2e65be.s3.amazonaws.com)|52.216.165.139|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2019-01-22 23:01:39 ERROR 403: Forbidden.

When run interactively, it downloaded and installed without issue. Will troubleshoot.
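
One possible reading of the log above: the redirect URL is signed with X-Amz-Date=20190122T225622Z and X-Amz-Expires=300, so it would stop being valid at about 23:01:22Z, and the second try at 23:01:39 reuses that same signed URL. If the log timestamps are UTC (an assumption), that would explain the 403. A minimal sketch of a workaround is to retry the whole command rather than let wget resume against the old redirect, so each attempt requests a fresh one. The `retry` function and its arguments are hypothetical names for illustration, and whether onboard_agent.sh is safe to re-run after a partial failure has not been checked here.

```shell
#!/bin/sh
# retry: run a command up to MAX_ATTEMPTS times, sleeping DELAY_SECONDS
# between failed attempts. Hypothetical helper, not part of the OMS agent.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry() {
    retry_max=$1
    retry_delay=$2
    shift 2
    retry_attempt=1
    while :; do
        # Run the command exactly as given; success ends the loop.
        if "$@"; then
            return 0
        fi
        # Stop once the attempt budget is used up.
        if [ "$retry_attempt" -ge "$retry_max" ]; then
            echo "retry: giving up after $retry_attempt attempts: $*" >&2
            return 1
        fi
        echo "retry: attempt $retry_attempt failed, sleeping ${retry_delay}s" >&2
        sleep "$retry_delay"
        retry_attempt=$((retry_attempt + 1))
    done
}
```

In cloud-init this could be used as `retry 5 30 sh onboard_agent.sh -w <YOUR OMS WORKSPACE ID> -s <YOUR OMS WORKSPACE PRIMARY KEY>`, where 5 attempts and a 30 second delay are arbitrary starting values.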

marrobi commented 5 years ago

curl works, or running the agent in a container might be better:

sudo docker run --privileged -d -v /var/run/docker.sock:/var/run/docker.sock -v /var/lib/docker/containers:/var/lib/docker/containers -e WSID="your workspace id" -e KEY="your key" -h=`hostname` -p 127.0.0.1:25225:25225 --name="omsagent" --restart=always microsoft/oms

marrobi commented 5 years ago

Need to verify correct functionality including:

marrobi commented 5 years ago

Have emailed people to ask about HDInsight. Ambari works, but it doesn't look like performance data is coming through to Azure.

marrobi commented 5 years ago

Lack of HDInsight metrics confirmed as a known issue. Awaiting update from engineering.

martinpeck commented 5 years ago

blocked until we get a fix/feedback from product group

marrobi commented 5 years ago

This is now fixed as far as data showing goes; witnessed it working today. A couple of performance counters are still missing, but I believe I should be able to fix that by adding the missing values to the Log Analytics data sources in the ARM template.

image

lawrencegripper commented 5 years ago

Nice, that's going to be useful during testing! Are we good to close this one out now?

marrobi commented 5 years ago

Let me try to fix the missing counters in the image above. Will try to do that today.

marrobi commented 5 years ago

Still missing data:

image

Awaiting response from product team.