Closed: Robin5605 closed this 1 month ago
Currently, multiple concurrent requests to the get job endpoint can all return the same job: https://github.com/vipyrsec/dragonfly-mainframe/blob/c4749c2aa678339b831063bc56f7b34917a6bed4/src/mainframe/endpoints/job.py#L44-L58
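To remedy this, we should lock the rows while selecting them, using a FOR UPDATE SKIP LOCKED clause. A concurrent request then skips any row that another transaction has already locked instead of returning it again, so the same package is not handed out twice.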
We want something like this:
      cte = (
          select(Scan)
          .where(
              or_(
                  Scan.status == Status.QUEUED,
                  and_(
                      Scan.pending_at < datetime.now(timezone.utc) - timedelta(seconds=mainframe_settings.job_timeout),
                      Scan.status == Status.PENDING,
                  ),
              )
          )
          .limit(batch)
          .options(joinedload(Scan.download_urls))
    +     .with_for_update(skip_locked=True)
          .cte()
      )
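To illustrate why skipping locked rows avoids duplicates, here is a minimal in-memory sketch. It is an analogy only, not the mainframe's code and not a database query: a `threading.Lock` per row stands in for PostgreSQL's row lock, and the names `Row`, `claim_jobs`, and `run_workers` are hypothetical, introduced just for this example.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Row:
    """Hypothetical stand-in for a Scan row; the lock mimics a DB row lock."""
    name: str
    status: str = "QUEUED"
    lock: threading.Lock = field(default_factory=threading.Lock)


def claim_jobs(rows: list[Row], batch: int) -> list[str]:
    """Claim up to `batch` queued rows, skipping rows another worker holds."""
    claimed: list[Row] = []
    try:
        for row in rows:
            if len(claimed) == batch:
                break
            # SKIP LOCKED analogue: never wait on a row someone else has locked.
            if not row.lock.acquire(blocking=False):
                continue
            # Re-check the filter now that we hold the lock.
            if row.status != "QUEUED":
                row.lock.release()
                continue
            claimed.append(row)
        # Analogue of updating the status inside the same transaction.
        for row in claimed:
            row.status = "PENDING"
        return [row.name for row in claimed]
    finally:
        # Analogue of COMMIT: release every row lock we still hold.
        for row in claimed:
            row.lock.release()


def run_workers(n_jobs: int = 50, n_workers: int = 8, batch: int = 3) -> list[str]:
    """Run several workers against one shared table and collect what they got."""
    rows = [Row(f"pkg-{i}") for i in range(n_jobs)]
    handed_out: list[str] = []
    handed_out_lock = threading.Lock()

    def worker() -> None:
        while True:
            got = claim_jobs(rows, batch)
            if not got:
                return
            with handed_out_lock:
                handed_out.extend(got)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handed_out


if __name__ == "__main__":
    names = run_workers()
    print(len(names), "jobs handed out,", len(set(names)), "unique")
```

Because a worker only changes a row's status while holding that row's lock, and other workers skip locked rows rather than waiting on them, each job name appears at most once in the combined result. In the real query the same roles would presumably be played by the row lock taken by `with_for_update(skip_locked=True)` and the status update in the same transaction.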