Closed: dioptre closed this issue 2 years ago
Just checking @imranmomin if you have knowledge over this?
@dioptre - I think for some reason the expire_on is not being set on the job document.
SELECT * FROM c WHERE c.type = 2 AND NOT IS_DEFINED(doc.expire_on)
@imranmomin
I'm working with @dioptre on this. Just wanted to mention that running this query as-is gives an error. I needed to change doc.expire_on to c.expire_on. When I run that, I get 0 results returned.
Let me provide more insight. Let's look at which items still exist where c._ts falls on '2022-04-01' (PDT). This is now past the 36-hour TTL.
SELECT c.type, count(1) as cnt
FROM c
WHERE LEFT(TimestampToDateTime((c._ts - (420 * 60)) * 1000), 10) = '2022-04-01'
GROUP BY c.type
[
    {
        "type": 8,
        "cnt": 20
    },
    {
        "type": 2,
        "cnt": 10
    },
    {
        "type": 4,
        "cnt": 244719
    },
    {
        "type": 6,
        "cnt": 5
    }
]
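For reference, the `TimestampToDateTime` arithmetic in the query above can be reproduced outside Cosmos DB. A minimal Python sketch (the 420-minute offset shifts UTC back seven hours to PDT; the epoch values are taken from the documents in this thread):

```python
from datetime import datetime, timezone

def ts_to_pdt_day(ts: int) -> str:
    """Mimic LEFT(TimestampToDateTime((c._ts - (420 * 60)) * 1000), 10):
    shift the epoch-seconds _ts back 420 minutes (UTC-7, PDT) and keep
    only the YYYY-MM-DD prefix."""
    shifted = ts - 420 * 60  # 7-hour PDT offset, in seconds
    return datetime.fromtimestamp(shifted, tz=timezone.utc).strftime("%Y-%m-%d")

# _ts of the stats document shown in this thread:
print(ts_to_pdt_day(1648810706))  # -> 2022-04-01
```

This is why the GROUP BY buckets above land on the '2022-04-01' PDT day even though `_ts` itself is a raw UTC epoch number.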
Right away, I can see that type 4 is the major offender. These appear to be stats. Are they not controlled by the same mechanism? I don't see an 'expire_on' attribute on these documents. Do we need to write our own cleanup of the stats? How do we turn them off completely? I'm not sure we are using this at all, or want to, especially if it's leaving ~250k entries every day. Ideally, stats should end up in a different location than the application's items, for performance reasons.
{
    "key": "stats:succeeded",
    "value": 1,
    "counterType": 1,
    "type": 4,
    "id": "719a0938-3e1c-4c29-be50-a6986a07b9cd",
    "_rid": "DnBJAIR2F7VDMQEAAAAAAA==",
    "_self": "dbs/DnBJAA==/colls/DnBJAIR2F7U=/docs/DnBJAIR2F7VDMQEAAAAAAA==/",
    "_etag": "\"d201cb32-0000-0300-0000-6246dad20000\"",
    "_attachments": "attachments/",
    "_ts": 1648810706
}
Type 6 items are ok. Those refer to a recurring job that was created on this date.
Type 2 items look like the root message/job to execute. You'll notice the expire_on for these is set for 30 days past the created_on.
{
    "data": {
        "type": "Sourcetable.Domain.Mediation.MediatedMessageConsumer, Sourcetable.Domain.Mediation",
        "method": "ProcessAsync",
        "parameterTypes": "[\"Sourcetable.Messaging.Abstractions.Message, Sourcetable.Messaging.Abstractions\"]",
        "arguments": "[\"{\\\"Id\\\":\\\"8ae429b6d3c840298ce4511a16a0bc0b\\\",\\\"Body\\\":{\\\"$type\\\":\\\"Sourcetable.Domain.DataApi.FivetranSyncTable, Sourcetable.Domain.DataApi\\\",\\\"TableId\\\":\\\"e93c5e4b9e3a4c42aeddd6cc9d5c6583\\\",\\\"OrganizationId\\\":\\\"dfdfbd7bb12a49ef924c3e8614037b3c\\\"},\\\"MaxAttempts\\\":3,\\\"State\\\":{\\\"$type\\\":\\\"Sourcetable.Identity.Abstractions.SourcetableUserAuthState, Sourcetable.Identity.Abstractions\\\",\\\"RequestId\\\":\\\"1d102c8425f64cb49f312dabb224377a\\\",\\\"OriginatingRequestId\\\":\\\"ad81f647db26416f9bb7b426c0e8fb6a\\\",\\\"WorkspaceIds\\\":[],\\\"OrganizationIds\\\":[],\\\"Roles\\\":[]}}\"]"
    },
    "arguments": "[\"{\\\"Id\\\":\\\"8ae429b6d3c840298ce4511a16a0bc0b\\\",\\\"Body\\\":{\\\"$type\\\":\\\"Sourcetable.Domain.DataApi.FivetranSyncTable, Sourcetable.Domain.DataApi\\\",\\\"TableId\\\":\\\"e93c5e4b9e3a4c42aeddd6cc9d5c6583\\\",\\\"OrganizationId\\\":\\\"dfdfbd7bb12a49ef924c3e8614037b3c\\\"},\\\"MaxAttempts\\\":3,\\\"State\\\":{\\\"$type\\\":\\\"Sourcetable.Identity.Abstractions.SourcetableUserAuthState, Sourcetable.Identity.Abstractions\\\",\\\"RequestId\\\":\\\"1d102c8425f64cb49f312dabb224377a\\\",\\\"OriginatingRequestId\\\":\\\"ad81f647db26416f9bb7b426c0e8fb6a\\\",\\\"WorkspaceIds\\\":[],\\\"OrganizationIds\\\":[],\\\"Roles\\\":[]}}\"]",
    "parameters": [
        {
            "name": "CurrentCulture",
            "value": "\"\""
        },
        {
            "name": "CurrentUICulture",
            "value": "\"\""
        }
    ],
    "created_on": 1648815509,
    "type": 2,
    "id": "35259346-b060-4a13-83a0-7b345f676667",
    "expire_on": 1651407509,
    "_rid": "DnBJAIR2F7WsSwIAAAAAAA==",
    "_self": "dbs/DnBJAA==/colls/DnBJAIR2F7U=/docs/DnBJAIR2F7WsSwIAAAAAAA==/",
    "_etag": "\"d501821f-0000-0300-0000-6246ed950000\"",
    "_attachments": "attachments/",
    "_ts": 1648815509
}
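The 30-day default can be verified directly from that type 2 document: `expire_on` minus `created_on` is exactly 30 days. A quick sanity check in Python, using the values above:

```python
# Values taken from the type 2 job document in this thread.
created_on = 1648815509
expire_on = 1651407509

thirty_days = 30 * 24 * 60 * 60  # 2,592,000 seconds
assert expire_on - created_on == thirty_days
print("expire_on is exactly created_on + 30 days")
```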
Type 8 items are related to the actions taken against the type 2 items. Those will delete when the type 2 items delete. However, there are a couple that don't have a corresponding existing type 2 item.
The questions that need answering are the ones raised above.
Thanks for the help!!!
@dpachla @dioptre
Thank you for the data.
So whenever the job is created, the default expiration is set to 30 days. Once the job completes or changes state, Hangfire.Core sets the new expiration based on the configuration. It looks like the issue is in my library, which does not set the new expiry.
Good news: I have been working on a couple of fixes, and this issue was part of them. I will soon be releasing the new package, v2.0.0. But this will only fix new jobs as they are created; I think for old data you will have to run a query to fix the expire_on.
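For the old data, the back-fill would set `expire_on` to the document's last state-change time plus the configured 36-hour timeout. A hedged sketch of that arithmetic (the `backfill_expire_on` helper and the use of `_ts` as the completion time are my assumptions for illustration, not the library's actual fix; the actual update would be issued against the container):

```python
JOB_EXPIRATION_HOURS = 36  # matches .WithJobExpirationTimeout(TimeSpan.FromHours(36))

def backfill_expire_on(doc: dict) -> dict:
    """Hypothetical back-fill for an old job document missing expire_on:
    derive it from the last-modified time (_ts) plus the 36-hour timeout,
    approximating what the fixed library does on state change."""
    if "expire_on" not in doc:
        doc["expire_on"] = doc["_ts"] + JOB_EXPIRATION_HOURS * 3600
    return doc

# Hypothetical stale job document that never got its expiry updated:
stale = {"type": 2, "_ts": 1648815509}
print(backfill_expire_on(stale)["expire_on"])  # -> 1648945109 (_ts + 36h)
```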
Regarding the stats: they should all get summarized, deleted, and moved into a counterType: 2 document.
{
    "key": "stats:succeeded",
    "value": 1,
    "counterType": 2,
    "type": 4,
    "id": "stats:succeeded",
    "_rid": "DnBJAIR2F7VXMQEAAAAAAA==",
    "_self": "dbs/DnBJAA==/colls/DnBJAIR2F7U=/docs/DnBJAIR2F7VXMQEAAAAAAA==/",
    "_etag": "\"d201cb32-0000-0300-0000-6246dad2000X\"",
    "_attachments": "attachments/",
    "_ts": 1648810707
}
Version 2.0.0 has been released
dotnet add package Hangfire.AzureCosmosDB --version 2.0.0
I hope this new version will fix the issue and bring more stability
Thank you!
Here is the query to check the daily counts for PDT.
SELECT c.state_name, LEFT(TimestampToDateTime((c._ts - (420 * 60)) * 1000), 10) as day, count(1) as cnt
FROM c
GROUP BY c.state_name, LEFT(TimestampToDateTime((c._ts - (420 * 60)) * 1000), 10)
After clearing the prod Hangfire container on 4/27, everything is working as expected.
We have a 36 hour TTL for the jobs but they aren't being deleted.
Is there anything we should do that's undocumented?
We are using
.WithJobExpirationTimeout(TimeSpan.FromHours(36));
Thanks