usnistgov / dioptra

Test Software for the Characterization of AI Technologies
https://pages.nist.gov/dioptra/

[BUG] Web UI entrypoints error after queue deletion #531

Closed mtrapnell-nist closed 1 month ago

mtrapnell-nist commented 1 month ago

Describe the bug When using the web UI, deleting a queue that is in use by an entry point causes the entrypoints endpoint to fail. This is a fatal error for the frontend: the deleted queue cannot be restored, and entry points can no longer be retrieved while the application is in this state. The same failure occurs on the create jobs page under an experiment when trying to add an entry point from the drop-down menu.

Both of these scenarios cause an error to be displayed, which is included below in "Additional context."

To recover from this error, I had to delete dioptra-dev.db and then run dioptra-db autoupgrade to start again from a clean state.

To Reproduce Steps to reproduce the behavior:

  1. Log in to a user account
  2. Navigate to the Queues tab and create a queue
  3. Navigate to the Entry-Points tab and select that queue when creating an Entry Point
  4. Navigate back to the Queues tab and delete the queue
  5. Navigate to the Entry-Points tab; the error below is displayed and the entry point is no longer listed

Expected behavior The queue should no longer show up in the entry point after the queue is deleted. All created entry points should still be retrievable.

Additional context This is the error displayed in the scenarios described above, both when navigating to the Entry-Points tab and when opening the entry point drop-down menu while creating a job:

Not Found - The requested queue does not exist. You have requested this URI [/api/v1/entrypoints/] but did you mean /api/v1/entrypoints//tags/ or /api/v1/entrypoints//tags or /api/v1/entrypoints//draft ?

keithmanville commented 1 month ago

I was able to replicate this bug. It occurs because deleting a queue places a delete lock on the queue resource but does not remove its resource dependency associations with other resources. When the associated Entrypoint resource is then requested, the service attempts to retrieve the latest snapshot of the queue, which is not found. I believe we will see a similar issue in other cases where a deleted resource has a resource dependency relationship with another resource.
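
To make the failure mode concrete, here is a minimal, framework-free sketch of the mechanism described above. It is not Dioptra's actual code: the Resource class, latest_snapshot, get_entrypoint, delete_queue_buggy, and QueueNotFoundError are all hypothetical names invented for illustration, and plain Python objects stand in for the database tables.

```python
from dataclasses import dataclass, field
from typing import List


class QueueNotFoundError(LookupError):
    """Hypothetical stand-in for the 'requested queue does not exist' error."""


@dataclass(eq=False)
class Resource:
    # Hypothetical, simplified stand-in for a resource row.
    resource_id: int
    name: str
    is_deleted: bool = False
    children: List["Resource"] = field(default_factory=list)


def latest_snapshot(resource: Resource) -> dict:
    """Return the latest snapshot; delete-locked resources are not found."""
    if resource.is_deleted:
        raise QueueNotFoundError(f"queue {resource.resource_id} does not exist")
    return {"id": resource.resource_id, "name": resource.name}


def get_entrypoint(entrypoint: Resource) -> dict:
    """Serialize an entry point together with every linked child queue."""
    return {
        "id": entrypoint.resource_id,
        "name": entrypoint.name,
        "queues": [latest_snapshot(q) for q in entrypoint.children],
    }


def delete_queue_buggy(queue: Resource) -> None:
    """Set the delete lock only; the entrypoint -> queue link stays in place."""
    queue.is_deleted = True


queue = Resource(1, "my-queue")
entrypoint = Resource(2, "my-entrypoint", children=[queue])

delete_queue_buggy(queue)
try:
    get_entrypoint(entrypoint)
    lookup_failed = False
except QueueNotFoundError:
    # The stale link makes the whole entry point unretrievable.
    lookup_failed = True
```

In this sketch the entry point itself is intact; it is the leftover link to the delete-locked queue that turns a single missing child into a failure of the whole lookup.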

I propose fixing this issue by:

  1. removing the queue/entrypoint relationship by setting resource.parents = [] in QueueIdService.delete
  2. adding checks for resource.is_deleted in the Entrypoint services where children are retrieved, to avoid an error for the user even if the resource dependency table ends up in a bad state.
  3. adding a new test to check this behavior
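
The three steps above could be sketched roughly as follows. This is an illustrative toy model under stated assumptions, not the change that was merged: Resource, link, delete_queue, and get_entrypoint_queues are hypothetical names, and because plain Python lists have no ORM back-reference, the sketch updates both sides of the relationship by hand where the proposal only needs resource.parents = [].

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Resource:
    # Hypothetical, simplified stand-in for a resource row.
    resource_id: int
    name: str
    is_deleted: bool = False
    parents: List["Resource"] = field(default_factory=list)
    children: List["Resource"] = field(default_factory=list)


def link(parent: Resource, child: Resource) -> None:
    """Record a parent -> child dependency on both sides."""
    parent.children.append(child)
    child.parents.append(parent)


def delete_queue(queue: Resource) -> None:
    """Step 1: set the delete lock and remove the queue's parent links."""
    for parent in queue.parents:
        # Assumption: an ORM relationship would do this bookkeeping itself.
        parent.children = [c for c in parent.children if c is not queue]
    queue.parents = []
    queue.is_deleted = True


def get_entrypoint_queues(entrypoint: Resource) -> List[str]:
    """Step 2: skip deleted children even if a stale link is still present."""
    return [q.name for q in entrypoint.children if not q.is_deleted]


def test_entrypoint_survives_queue_deletion() -> None:
    """Step 3: the entry point stays retrievable after its queue is deleted."""
    queue = Resource(1, "my-queue")
    entrypoint = Resource(2, "my-entrypoint")
    link(entrypoint, queue)

    delete_queue(queue)

    assert get_entrypoint_queues(entrypoint) == []
    assert queue.parents == []
```

Steps 1 and 2 are deliberately redundant in this sketch: the first keeps the dependency data clean, while the second keeps the lookup from failing if a stale link slips through anyway.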

Once this is fixed for the Queue/EntryPoint resource dependency relationship, we need to replicate this issue for other parent -> child relationships and implement similar fixes:

jkglasbrenner commented 1 month ago

Closed by #535