terascope / file-assets

Teraslice processors for working with data stored in files on disk, S3 or HDFS.
MIT License

Add prom metrics to s3 operations #996

Closed sotojn closed 6 months ago

sotojn commented 6 months ago

This PR makes the following changes:

sotojn commented 6 months ago

Here are the results after running two jobs that utilize the s3_exporter and the s3_reader. I ran each with two workers. This is the job file that uses the s3_exporter:

{
    "name": "data-to-s3",
    "lifecycle": "persistent",
    "workers": 2,
    "assets": [
        "standard",
        "file"
    ],
    "operations": [
        {
            "_op": "data_generator",
            "size": 10000
        },
        {
            "_op": "s3_exporter",
            "path": "data-folder-1",
            "format": "ldjson"
        }
    ]
}

Here are the metrics that this job produced:

Worker 1:

# HELP teraslice_worker_info Information about Teraslice worker
# TYPE teraslice_worker_info gauge
teraslice_worker_info{arch="arm64",clustering_type="kubernetes",name="teraslice",node_version="v18.19.1",platform="linux",teraslice_version="1.6.1",assignment="worker",ex_id="1fb6b686-b1cd-4f01-b7eb-21343099d0aa",job_id="4391d12a-37db-4178-86f0-01a0e3cbb09c",job_name="data-to-s3",pod_name="ts-wkr-data-to-s3-4391d12a-37db-78fb9f4ff6-hzxrz"} 1

# HELP teraslice_worker_slices_processed Number of slices the worker has processed
# TYPE teraslice_worker_slices_processed gauge
teraslice_worker_slices_processed{name="teraslice",assignment="worker",ex_id="1fb6b686-b1cd-4f01-b7eb-21343099d0aa",job_id="4391d12a-37db-4178-86f0-01a0e3cbb09c",job_name="data-to-s3",pod_name="ts-wkr-data-to-s3-4391d12a-37db-78fb9f4ff6-hzxrz"} 43

# HELP teraslice_worker_records_processed_from_s3 Number of records written into s3
# TYPE teraslice_worker_records_processed_from_s3 gauge
teraslice_worker_records_processed_from_s3{class="S3Batcher",name="teraslice",assignment="worker",ex_id="1fb6b686-b1cd-4f01-b7eb-21343099d0aa",job_id="4391d12a-37db-4178-86f0-01a0e3cbb09c",job_name="data-to-s3",pod_name="ts-wkr-data-to-s3-4391d12a-37db-78fb9f4ff6-hzxrz"} 215000

Worker 2:

# HELP teraslice_worker_info Information about Teraslice worker
# TYPE teraslice_worker_info gauge
teraslice_worker_info{arch="arm64",clustering_type="kubernetes",name="teraslice",node_version="v18.19.1",platform="linux",teraslice_version="1.6.1",assignment="worker",ex_id="1fb6b686-b1cd-4f01-b7eb-21343099d0aa",job_id="4391d12a-37db-4178-86f0-01a0e3cbb09c",job_name="data-to-s3",pod_name="ts-wkr-data-to-s3-4391d12a-37db-78fb9f4ff6-knw5l"} 1

# HELP teraslice_worker_slices_processed Number of slices the worker has processed
# TYPE teraslice_worker_slices_processed gauge
teraslice_worker_slices_processed{name="teraslice",assignment="worker",ex_id="1fb6b686-b1cd-4f01-b7eb-21343099d0aa",job_id="4391d12a-37db-4178-86f0-01a0e3cbb09c",job_name="data-to-s3",pod_name="ts-wkr-data-to-s3-4391d12a-37db-78fb9f4ff6-knw5l"} 43

# HELP teraslice_worker_records_processed_from_s3 Number of records written into s3
# TYPE teraslice_worker_records_processed_from_s3 gauge
teraslice_worker_records_processed_from_s3{class="S3Batcher",name="teraslice",assignment="worker",ex_id="1fb6b686-b1cd-4f01-b7eb-21343099d0aa",job_id="4391d12a-37db-4178-86f0-01a0e3cbb09c",job_name="data-to-s3",pod_name="ts-wkr-data-to-s3-4391d12a-37db-78fb9f4ff6-knw5l"} 215000

This wrote 430,000 records into s3.
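Each worker's `teraslice_worker_records_processed_from_s3` gauge is simply a running total of records, labeled with the operation class and worker identity, then rendered in Prometheus exposition format. As a rough illustration of what these lines represent (Teraslice itself uses a Node.js Prometheus client; the `Gauge` class and the 43 × 5,000 slice breakdown below are assumptions for the sketch, chosen to match one worker's reported totals):

```python
# Minimal sketch of how a labeled gauge such as
# teraslice_worker_records_processed_from_s3 might be accumulated and
# rendered in Prometheus exposition format. This is NOT the actual
# Teraslice implementation, just an illustration of the format above.

class Gauge:
    def __init__(self, name, help_text, labels):
        self.name = name
        self.help_text = help_text
        self.labels = labels  # fixed label set, e.g. class/name/assignment
        self.value = 0

    def inc(self, amount):
        # Each processed slice adds its record count to the running total.
        self.value += amount

    def expose(self):
        # Render the HELP/TYPE comments and the sample line with labels.
        label_str = ",".join(f'{k}="{v}"' for k, v in self.labels.items())
        return (
            f"# HELP {self.name} {self.help_text}\n"
            f"# TYPE {self.name} gauge\n"
            f"{self.name}{{{label_str}}} {self.value}"
        )

g = Gauge(
    "teraslice_worker_records_processed_from_s3",
    "Number of records written into s3",
    {"class": "S3Batcher", "name": "teraslice"},
)
for _ in range(43):  # 43 slices per worker (as reported above)...
    g.inc(5000)      # ...at a hypothetical 5,000 records each
print(g.expose())
```

With those assumed slice sizes the rendered sample line ends in 215000, matching each exporter worker's reported total.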

Job file that uses s3_reader:

{
    "name": "s3-to-es",
    "lifecycle": "persistent",
    "workers": 2,
    "assets": [
        "elasticsearch",
        "file"
    ],
    "operations": [
        {
            "_op": "s3_reader",
            "path": "data-folder-1",
            "size": 10000,
            "format": "ldjson"
        },
        {
            "_op": "elasticsearch_bulk",
            "size": 10000,
            "index": "data-folder-1"
        }
    ]
}

Worker Metrics 1:

# HELP teraslice_worker_info Information about Teraslice worker
# TYPE teraslice_worker_info gauge
teraslice_worker_info{arch="arm64",clustering_type="kubernetes",name="teraslice",node_version="v18.19.1",platform="linux",teraslice_version="1.6.1",assignment="worker",ex_id="19d6660a-79f2-49c9-84a0-fa3b75b5eada",job_id="b1656941-9bd8-4e9d-b551-05c3473e4346",job_name="s3-to-es",pod_name="ts-wkr-s3-to-es-b1656941-9bd8-6b4967c9b8-45tqq"} 1

# HELP teraslice_worker_slices_processed Number of slices the worker has processed
# TYPE teraslice_worker_slices_processed gauge
teraslice_worker_slices_processed{name="teraslice",assignment="worker",ex_id="19d6660a-79f2-49c9-84a0-fa3b75b5eada",job_id="b1656941-9bd8-4e9d-b551-05c3473e4346",job_name="s3-to-es",pod_name="ts-wkr-s3-to-es-b1656941-9bd8-6b4967c9b8-45tqq"} 7918

# HELP teraslice_worker_records_read_from_s3 Number of records read from s3
# TYPE teraslice_worker_records_read_from_s3 gauge
teraslice_worker_records_read_from_s3{class="S3Fetcher",name="teraslice",assignment="worker",ex_id="19d6660a-79f2-49c9-84a0-fa3b75b5eada",job_id="b1656941-9bd8-4e9d-b551-05c3473e4346",job_name="s3-to-es",pod_name="ts-wkr-s3-to-es-b1656941-9bd8-6b4967c9b8-45tqq"} 217367

Worker Metrics 2:

# HELP teraslice_worker_info Information about Teraslice worker
# TYPE teraslice_worker_info gauge
teraslice_worker_info{arch="arm64",clustering_type="kubernetes",name="teraslice",node_version="v18.19.1",platform="linux",teraslice_version="1.6.1",assignment="worker",ex_id="19d6660a-79f2-49c9-84a0-fa3b75b5eada",job_id="b1656941-9bd8-4e9d-b551-05c3473e4346",job_name="s3-to-es",pod_name="ts-wkr-s3-to-es-b1656941-9bd8-6b4967c9b8-5q9lx"} 1

# HELP teraslice_worker_slices_processed Number of slices the worker has processed
# TYPE teraslice_worker_slices_processed gauge
teraslice_worker_slices_processed{name="teraslice",assignment="worker",ex_id="19d6660a-79f2-49c9-84a0-fa3b75b5eada",job_id="b1656941-9bd8-4e9d-b551-05c3473e4346",job_name="s3-to-es",pod_name="ts-wkr-s3-to-es-b1656941-9bd8-6b4967c9b8-5q9lx"} 7737

# HELP teraslice_worker_records_read_from_s3 Number of records read from s3
# TYPE teraslice_worker_records_read_from_s3 gauge
teraslice_worker_records_read_from_s3{class="S3Fetcher",name="teraslice",assignment="worker",ex_id="19d6660a-79f2-49c9-84a0-fa3b75b5eada",job_id="b1656941-9bd8-4e9d-b551-05c3473e4346",job_name="s3-to-es",pod_name="ts-wkr-s3-to-es-b1656941-9bd8-6b4967c9b8-5q9lx"} 212633

This resulted in a new index with a total of 430,000 records when curling the Elasticsearch indices:

health status index                      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   data-folder-1              Z-j6MiSVSbqwpRSE2-obMw   1   1     430000            0    289.3mb        289.3mb
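The numbers above are internally consistent: the two exporter workers' write totals, the two reader workers' read totals, and the final Elasticsearch `docs.count` all agree. A quick cross-check of the reported figures:

```python
# Cross-check of the metric totals reported in this PR comment.
exported = [215_000, 215_000]  # teraslice_worker_records_processed_from_s3, per worker
read = [217_367, 212_633]      # teraslice_worker_records_read_from_s3, per worker
docs_count = 430_000           # docs.count from the _cat/indices output

assert sum(exported) == sum(read) == docs_count
print(sum(read))  # 430000
```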
godber commented 6 months ago

We need to bump the asset version a minor version. Then I can merge this.

sotojn commented 6 months ago

I have bumped the asset a minor version.