monarch-initiative / monarch-app

Monarch Initiative website and API
https://monarchinitiative.org/
BSD 3-Clause "New" or "Revised" License

Extract API access logs from cloud logging #604

Open kevinschaper opened 9 months ago

kevinschaper commented 9 months ago

The v3 stack currently exports access logs to Cloud Logging, but we aren't yet capturing and saving them. We need a job that extracts these log entries, perhaps keeping the original JSON and additionally producing a simple TSV.

Here is an example of a request log entry from nginx:

{
  "insertId": "5tw0k2f1dv2ax",
  "jsonPayload": {
    "message": "10.0.0.4 - - [21/Feb/2024:23:25:39 +0000] \"GET /HGNC:4388 HTTP/1.1\" 200 3715 \"-\" \"Mozilla/5.0 (compatible; SemrushBot/7~bl; +http://www.semrush.com/bot.html)\" \"185.191.171.15, 35.208.191.193\"",
    "instance": {
      "id": "3080415805568172463",
      "zone": "us-central1-a",
      "name": "monarch-v3-2024-02-13-api"
    },
    "container": {
      "name": "/monarch-v3_nginx.1.ckimzxg8x2opcd1akmq6ri7tx",
      "imageName": "us-central1-docker.pkg.dev/monarch-initiative/monarch-api/monarch-ui:latest@sha256:f7999b96e0032cedaff00558b659c3af49e3eb7c32f8835a42529d3e301f1a3c",
      "imageId": "sha256:4c3452dd8588457aaf47d79cf3dd43b38c33ec0188e1509cccfbaf282595669a",
      "created": "2024-02-16T21:55:59.354429946Z",
      "id": "12b78464e839bed34ac883f2e308d2dd8fed9a8511a269aa0dbc61d1d2c4e061"
    }
  },
  "resource": {
    "type": "gce_instance",
    "labels": {
      "instance_id": "3080415805568172463",
      "project_id": "monarch-initiative",
      "zone": "us-central1-a"
    }
  },
  "timestamp": "2024-02-21T23:25:39.559712928Z",
  "logName": "projects/monarch-initiative/logs/gcplogs-docker-driver",
  "receiveTimestamp": "2024-02-21T23:25:40.573835547Z"
}
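To make a TSV row out of an entry like this, the `jsonPayload.message` string has to be split into fields. Below is a minimal sketch, assuming the nginx line follows the layout in the example above (a combined-style log with one extra quoted field at the end, which looks like a forwarded-for list). The function name `parse_nginx_message` and the field names are hypothetical, not part of the existing codebase.

```python
import re
from typing import Optional

# Assumed layout, inferred from the single example entry above:
# addr - - [time] "METHOD path protocol" status bytes "referer" "agent" "forwarded"
NGINX_LINE = re.compile(
    r'(?P<remote_addr>\S+) \S+ \S+ '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)" '
    r'"(?P<forwarded_for>[^"]*)"'
)


def parse_nginx_message(message: str) -> Optional[dict]:
    """Split one nginx access-log line into named fields.

    Returns None when the line does not match the assumed layout,
    so callers can skip or separately record unparseable lines.
    """
    match = NGINX_LINE.match(message)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the numeric columns so they sort and aggregate correctly.
    fields["status"] = int(fields["status"])
    fields["bytes_sent"] = int(fields["bytes_sent"])
    return fields
```

Since `remote_addr` here is an internal address (`10.0.0.4`), the last quoted field is probably the more useful one for identifying the real client, but that is an inference from this one example.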

Here's another from the API:

{
  "insertId": "15wc0gsf8cuct6",
  "jsonPayload": {
    "message": "2024-02-21 23:33:59.001 | INFO     | monarch_py.api.middleware.logging_middleware:dispatch:26 - Request URL: http://10.128.0.3/v3/api/entity/MONDO:0007038 | Method: GET",
    "container": {
      "name": "/monarch-v3_api.1.6tuzpht3x3sz7oeli3920ot9g",
      "created": "2024-02-16T18:06:33.679679859Z",
      "imageId": "sha256:b3a3064d2c728edef61b3ec190dd749b206c1af093865fff8e9309010fe67144",
      "imageName": "us-central1-docker.pkg.dev/monarch-initiative/monarch-api/monarch-api:fd3abacf4d9a441b8ca297360904e92d8bf2f5f0@sha256:3a45319cfa37113d7994d6e4fd7944a1772e4e39d57fa8600205522ba647f17b",
      "id": "4b81a33101fbde4e13b217e83e44028fac7103ff856fea1182e0988e11d6b546"
    },
    "instance": {
      "zone": "us-central1-a",
      "name": "monarch-v3-dev-api",
      "id": "2831133215052659990"
    }
  },
  "resource": {
    "type": "gce_instance",
    "labels": {
      "zone": "us-central1-a",
      "project_id": "monarch-initiative",
      "instance_id": "2831133215052659990"
    }
  },
  "timestamp": "2024-02-21T23:33:59.002438100Z",
  "logName": "projects/monarch-initiative/logs/gcplogs-docker-driver",
  "receiveTimestamp": "2024-02-21T23:34:00.019243752Z"
}
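The API message has a different shape from the nginx one, so it would need its own parser. A rough sketch follows, assuming every request line looks like the example above (timestamp, level, source location, then `Request URL: ... | Method: ...`). The name `parse_api_message` and the output keys are made up for illustration.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumed layout, inferred from the single example entry above:
# "<date> <time> | <LEVEL> | <module>:<func>:<line> - Request URL: <url> | Method: <verb>"
API_LINE = re.compile(
    r'(?P<logged_at>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \| '
    r'(?P<level>\w+)\s*\| '
    r'(?P<source>\S+) - '
    r'Request URL: (?P<url>\S+) \| Method: (?P<method>\w+)'
)


def parse_api_message(message: str) -> Optional[dict]:
    """Extract the request URL and method from one API log line.

    Returns None for lines that are not request logs (for example,
    other INFO messages emitted by the same container).
    """
    match = API_LINE.match(message)
    if match is None:
        return None
    fields = match.groupdict()
    # The host part is an internal address, so keep path and query separately.
    parts = urlsplit(fields["url"])
    fields["path"] = parts.path
    fields["query"] = parts.query
    return fields
```

Note that this message carries no status code, user agent, or client address, so the API TSV would have fewer columns than the nginx one unless the middleware logs more.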

The rotating hostnames might make this challenging. I'm not worried about differentiating "beta" access from "production" access, but we will need to sort out pulling logs from any VM whose name matches a pattern.
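One way to handle the rotating names is to match on `jsonPayload.instance.name` with a regular expression, which the Cloud Logging query language supports through the `=~` operator. The sketch below only builds the filter string and mirrors the same check locally. The pattern `^monarch-v3-.+-api$` is an assumption based on the two instance names in the examples (`monarch-v3-2024-02-13-api` and `monarch-v3-dev-api`), and the helper names are hypothetical.

```python
import re

# Assumed naming pattern, inferred from the two example instance names.
INSTANCE_PATTERN = "^monarch-v3-.+-api$"


def build_filter(project: str, name_pattern: str, start: str, end: str) -> str:
    """Build a Cloud Logging filter for one time window.

    `start` and `end` are RFC 3339 timestamps; the window is half-open
    so consecutive runs do not fetch the same entry twice.
    """
    return " AND ".join([
        f'logName="projects/{project}/logs/gcplogs-docker-driver"',
        f'jsonPayload.instance.name=~"{name_pattern}"',
        f'timestamp>="{start}"',
        f'timestamp<"{end}"',
    ])


def instance_matches(entry: dict, name_pattern: str = INSTANCE_PATTERN) -> bool:
    """Apply the same name check locally to an already-downloaded entry."""
    name = entry.get("jsonPayload", {}).get("instance", {}).get("name", "")
    return re.search(name_pattern, name) is not None
```

The resulting string could be passed to `gcloud logging read` or to a client library call that accepts a filter. If the pattern ever needs backslashes, they would have to be escaped inside the quoted filter value.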

Since we have request logs for both the UI and the API, it makes sense to capture both, but in separate files.
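Both kinds of entry share the same `logName`, so the split would have to key off something else. In the two examples the container name differs (`/monarch-v3_nginx...` versus `/monarch-v3_api...`), so a sketch could route on that. Everything here is an assumption for illustration: the `_nginx` and `_api` suffixes, the function names, and the output file names.

```python
import csv
import json
from pathlib import Path
from typing import Iterable, Optional


def service_of(entry: dict) -> Optional[str]:
    """Classify an entry as "ui" or "api" from its container name."""
    name = entry.get("jsonPayload", {}).get("container", {}).get("name", "")
    # "/monarch-v3_nginx.1.<task id>" -> "monarch-v3_nginx"
    service = name.lstrip("/").split(".", 1)[0]
    if service.endswith("_nginx"):
        return "ui"
    if service.endswith("_api"):
        return "api"
    return None


def write_split(entries: Iterable[dict], out_dir: str) -> dict:
    """Append each entry to per-service JSONL and TSV files.

    Returns how many entries were written per service. Files are
    reopened for every entry to keep the sketch short; a real job
    would keep the handles open.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    counts = {"ui": 0, "api": 0}
    for entry in entries:
        service = service_of(entry)
        if service is None:
            continue  # not a request log from a known container
        payload = entry.get("jsonPayload", {})
        # Keep the original JSON, one entry per line.
        with open(out / f"{service}_requests.jsonl", "a", encoding="utf-8") as raw:
            raw.write(json.dumps(entry) + "\n")
        # Simple TSV: timestamp, instance name, raw message.
        with open(out / f"{service}_requests.tsv", "a", encoding="utf-8", newline="") as tsv:
            csv.writer(tsv, delimiter="\t").writerow([
                entry.get("timestamp", ""),
                payload.get("instance", {}).get("name", ""),
                payload.get("message", ""),
            ])
        counts[service] += 1
    return counts
```

The TSV keeps the raw message in one column. Because nginx messages contain double quotes, the `csv` writer will quote that field, so readers should parse the file with a CSV-aware reader rather than a plain split on tabs.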

monicacecilia commented 6 months ago

Dear @amc-corey-cox - because we were thinking about this earlier today, I'd like to ping here and remind you of this ticket. Do you think we can make this a reality for this release cycle? Thanks! 🌷

amc-corey-cox commented 6 months ago

Some tools to look at:

Tableau, Domo, Hotjar