Closed volodymyrss closed 1 month ago
This already happened some time ago, and the cause has never been clear.
Inspecting this job: in both instances the job is "done", but two non-aliased scratch dirs have been created.
I suppose it could be a race condition between checking that the directory exists and creating it. Do the inspected directories have creation times close to each other?
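To illustrate the kind of race meant here, a minimal sketch follows. It assumes the scratch directory path is derived deterministically from the job, which may not match how the dispatcher actually names its scratch dirs; `create_dir_once` is a hypothetical helper, not existing dispatcher code. A separate "exists?" check followed by a create leaves a window in which two processes can both see "missing", whereas a single `os.mkdir` call either creates the directory or fails with `FileExistsError`.

```python
import os


def create_dir_once(path):
    """Hypothetical helper: return True if this call created `path`,
    False if another caller had already created it."""
    try:
        # No separate existence check: os.mkdir either creates the
        # directory or raises FileExistsError, so two callers cannot
        # both believe they created it.
        os.mkdir(path)
        return True
    except FileExistsError:
        return False
```

With this pattern, the caller that gets `False` knows the directory belongs to someone else and can reuse it instead of creating a second one.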
One is 1725364545.0859604, and the other is 1725364544.7069209. So yes, very close: about 0.38 s apart.
It happened again: https://cdci.sentry.io/issues/5935424353/?notification_uuid=085601ab-b6ac-443a-b91e-b00c3f7d038b&project=1467382
This time too, it looks like the consequence of the same type of race condition.
Yeah, thanks for following up on it.
We need a way for the dispatcher to safely recover from this situation. Could you propose something?
One approach that could help is a retry mechanism, to handle the situation more "gracefully".
We could also implement a lock-based approach: in particular, we'd use a lock file to ensure that only one process can create the directory at a time.
I did some research and found that the fcntl library can be used for file locking.
What do you think?
Actually, I think a lock-based approach is going to be more effective.
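A rough sketch of what the lock-based approach could look like, using `fcntl.flock` from the standard library. The names `dir_creation_lock` and `ensure_scratch_dir`, and the idea of a dedicated lock file next to the scratch dirs, are assumptions for illustration only and not existing dispatcher code. Note that `fcntl` is available on Unix only, and advisory locks may behave differently on some network filesystems.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def dir_creation_lock(lock_path):
    """Hold an exclusive advisory lock on `lock_path` (hypothetical helper)."""
    # Open (or create) the lock file; the lock is tied to this descriptor.
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is free
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def ensure_scratch_dir(path, lock_path):
    """Create `path` under the lock. Return True if this call created it,
    False if it already existed (hypothetical helper)."""
    with dir_creation_lock(lock_path):
        # The check and the creation happen while the lock is held, so
        # no other cooperating process can slip in between them.
        if os.path.isdir(path):
            return False
        os.makedirs(path)
        return True
```

The lock only protects callers that go through the same lock file, so every code path that creates a scratch dir would need to use it.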
Lock sounds good to me, thanks.
See details in https://cdci.sentry.io/issues/5791157252/?project=1467382&query=is%3Aunresolved&referrer=issue-stream&statsPeriod=24h&stream_index=0
@burnout87 could you please have a look?