mozilla / ActiveData-ETL

The ETL process responsible for filling ActiveData
Mozilla Public License 2.0

Missing latest state of treeherder job #81

klahnakoski closed 4 years ago

klahnakoski commented 4 years ago

This task is marked intermittent in Treeherder, but not in ActiveData:

https://firefox-ci-tc.services.mozilla.com/tasks/UAkCFPWKQECNYDCGocpBKQ

marco-c commented 4 years ago

YaS-vqpPSYCOsJUxUTl4eg, dST2VvJASmS2c4VVCsJB2A and NLbO9LNwSWeTYUmv6UoZxg too.

klahnakoski commented 4 years ago

This is caused by holes in the ingestion:

{
    "from":"treeherder",
    "groupby":"etl.source.id",
    "where":{"eq":{"etl.source.source.id":1966}},
    "sort":"etl.source.id",
    "limit":1000
}

klahnakoski commented 4 years ago

the task was modified Wed, May 20, 18:29:45 (EDT) = Wed, May 20, 22:29:45 GMT

as per

I cannot find the task in that range:

{
    "from":"treeherder",
    "select":[
        {"aggregate":"count"},
        {"name":"max","value":"last_modified","aggregate":"max"},
        {"name":"min","value":"last_modified","aggregate":"min"}
    ],
    "groupby":"etl.source.id",
    "where":{"eq":{"etl.source.source.id":1966}},
    "sort":"etl.source.id",
    "limit":1000
}
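The EDT to GMT conversion quoted above can be double-checked with a few lines of Python. This is only a sketch: the year is not stated anywhere in the thread, so 2020 is an assumption used to make the example run, and EDT is modelled as a fixed UTC-4 offset.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the year (2020) is not given in the thread; it is filled in
# only so that the example is runnable. EDT is treated as a fixed UTC-4 offset.
EDT = timezone(timedelta(hours=-4), name="EDT")


def edt_to_gmt(local: datetime) -> datetime:
    """Attach the EDT offset to a naive timestamp and convert it to UTC."""
    return local.replace(tzinfo=EDT).astimezone(timezone.utc)


modified_gmt = edt_to_gmt(datetime(2020, 5, 20, 18, 29, 45))
print(modified_gmt.isoformat())
```

If the displayed time is already GMT rather than EDT, no shift applies and the task falls into an earlier block than this conversion suggests.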

I would expect the task to be found in 1966.263. I checked 263 and 264, but UAkCFPWKQECNYDCGocpBKQ is not there.

https://active-data-treeherder-jobs.s3-us-west-2.amazonaws.com/1966.263.json.gz
https://active-data-treeherder-jobs.s3-us-west-2.amazonaws.com/1966.264.json.gz
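Checking a block for a task id by hand can be sketched roughly as follows. The helper `block_contains_task` is hypothetical, and the record layout inside the `.json.gz` files is assumed (one JSON document per line); the thread does not show the real format, so the sample data here is invented.

```python
import gzip
import json


def block_contains_task(gz_bytes: bytes, task_id: str) -> bool:
    """Return True if any line of a gzipped block mentions task_id.

    Assumption: the block is gzip-compressed UTF-8 text, one record per line.
    A plain substring match avoids depending on the exact record schema.
    """
    text = gzip.decompress(gz_bytes).decode("utf8")
    return any(task_id in line for line in text.splitlines())


# Tiny in-memory stand-in for a downloaded block; the records are invented.
sample = gzip.compress(
    "\n".join(json.dumps({"task": {"id": t}}) for t in ["AAA", "BBB"]).encode("utf8")
)
print(block_contains_task(sample, "BBB"))                     # True
print(block_contains_task(sample, "UAkCFPWKQECNYDCGocpBKQ"))  # False
```

For a real check, the bytes of a downloaded block such as 1966.263.json.gz would be passed in place of `sample`.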

klahnakoski commented 4 years ago

https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=862f6454d127fa4d10c6e96995470de0e1e20512&selectedTaskRun=UAkCFPWKQECNYDCGocpBKQ-0

shows Wed, May 20, 18:29:45, which is the GMT date, and I found UAkCFPWKQECNYDCGocpBKQ in 1966.208, which was loaded.

Also, I re-ingested the whole day 1966, and there are still holes.

klahnakoski commented 4 years ago

action.start_time == 0, which is the Unix epoch (year 1970) and is too old to index.
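A small sketch of why a zero start time gets rejected, assuming `action.start_time` is in seconds since the Unix epoch. The cutoff `OLDEST_INDEXABLE` and the helper `is_too_old` are hypothetical; the thread only says that 1970 is too old, not where the real threshold lies.

```python
from datetime import datetime, timezone

# Hypothetical cutoff for illustration only; the actual threshold used by
# the indexer is not stated in the thread.
OLDEST_INDEXABLE = datetime(2015, 1, 1, tzinfo=timezone.utc)


def is_too_old(start_time: float) -> bool:
    """Return True if a Unix timestamp (seconds) falls before the cutoff."""
    return datetime.fromtimestamp(start_time, tz=timezone.utc) < OLDEST_INDEXABLE


epoch = datetime.fromtimestamp(0, tz=timezone.utc)
print(epoch.year)     # 1970
print(is_too_old(0))  # True
```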

klahnakoski commented 4 years ago

push-to-es updated with fixes https://github.com/mozilla/ActiveData-ETL/commit/443a5bac9144417356e2fe11f4f76df0dffd818

SQS filled with requests to reindex treeherder https://github.com/mozilla/ActiveData-ETL/commit/a77297525372ff3374b30978ec72a68458b9dc4c

klahnakoski commented 4 years ago

specific tasks already repaired: https://activedata.allizom.org/tools/query.html#query_id=7VqR+K9i

waiting for reindex to finish

klahnakoski commented 4 years ago

Confirmed all blocks are loaded

{
    "from":"treeherder",
    "select":[
        {
            "value":{"add":{"etl.source.id":1}},
            "aggregate":"max",
            "name":"expected"
        },
        {
            "value":"etl.source.id",
            "aggregate":"cardinality",
            "name":"actual"
        }
    ],
    "groupby":"etl.source.source.id",
    "where":[{"gte":{"etl.source.source.id":1950}}],
    "sort":"etl.source.source.id",
    "limit":1000
}
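The idea behind this check can be illustrated in Python: for each day, the highest block id plus one is the number of blocks expected, and the count of distinct block ids is the number actually loaded. The function `block_report` is a hypothetical helper, and it assumes block ids start at 0 and are contiguous.

```python
def block_report(source_ids):
    """Compare expected and actual block counts for one day.

    Assumption: block ids start at 0 and are contiguous, so the expected
    count is max(id) + 1 and any difference from the distinct count is a hole.
    """
    ids = set(source_ids)
    expected = max(ids) + 1
    actual = len(ids)
    missing = sorted(set(range(expected)) - ids)
    return expected, actual, missing


# Invented sample: block 3 was never ingested.
print(block_report([0, 1, 2, 4]))  # (5, 4, [3])
```

A day is fully loaded when expected equals actual. Note that a cardinality aggregate may be approximate on large sets, so small differences are worth confirming by listing the ids.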