chaoran-chen opened this issue 3 weeks ago
(@fhennig is thinking about this page and its counts now so sending it his way)
Great! Good timing for this indeed, hehe. I'm going to add it to the other review stuff.
I'm still looking into it and I'm gonna test it, but I suspect that this code:
```kotlin
val statusCounts: Map<Status, Int> = Status.entries.associateWith { status ->
    baseQuery.count { it[SequenceEntriesView.statusColumn] == status.name }
}
```
executes a query for each status, and since it's not in a transaction, a sequence might change status between the queries, making the numbers fluctuate. This would mean it's a backend issue. A solution might be to wrap it in a transaction or refactor it into a single query if possible. (EDIT: The whole class is @Transactional, so it should be fine.)
@Transactional doesn't necessarily avoid such issues unless the transaction isolation level is strong enough. There might be a direct way to do this in one SQL query rather than multiple, e.g. with a GROUP BY.
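For illustration, a minimal sketch of what such a single grouped query could look like with the Exposed DSL, assuming the SequenceEntriesView table and Status enum from the snippet above and ignoring whatever filters baseQuery applies; the exact method names depend on the Exposed version (e.g. slice is deprecated in newer releases in favor of select(columns)):

```kotlin
// Sketch: one round trip that lets the database count per status via GROUP BY.
// Assumes statusColumn is a string column holding Status names, as in the snippet above.
val countExpr = SequenceEntriesView.statusColumn.count()

val groupedCounts: Map<Status, Long> = SequenceEntriesView
    .slice(SequenceEntriesView.statusColumn, countExpr)
    .selectAll()
    .groupBy(SequenceEntriesView.statusColumn)
    .associate { row ->
        Status.valueOf(row[SequenceEntriesView.statusColumn]) to row[countExpr]
    }

// Statuses with zero rows don't appear in the GROUP BY result, so fill them in explicitly.
val statusCounts: Map<Status, Long> = Status.entries.associateWith { groupedCounts[it] ?: 0L }
```

A single statement like this would also sidestep the isolation question for these counts, since all of them come from one query and therefore one snapshot.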
Let's make this reproducible, then it should be fixable.
There are two options here: a) multiple requests are made and they arrive staggered, or b) insufficient transaction isolation.
Option b) is easy to test: just add a delay between each iteration of the loop above. You can even throw warnings or errors if the total is inconsistent (calculate the total in a separate query); see the sketch below.
Then increase the transaction isolation level to see if that fixes it.
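A rough sketch of that test, reusing baseQuery and statusColumn from the snippet above (the logger here is a placeholder):

```kotlin
// Reproduction sketch: widen the window between the per-status queries so that a
// concurrent commit is more likely to land in between and skew the counts.
val statusCounts: Map<Status, Int> = Status.entries.associateWith { status ->
    Thread.sleep(500) // exaggerate the gap between the successive SELECTs
    baseQuery.count { it[SequenceEntriesView.statusColumn] == status.name }
}

// Cross-check against a total from a separate query; a mismatch means another transaction
// committed in between (or the isolation level is too weak to prevent that from showing).
val total = baseQuery.count()
if (statusCounts.values.sum().toLong() != total.toLong()) {
    logger.warn("Inconsistent counts: per-status sum=${statusCounts.values.sum()}, total=$total")
}
```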
We likely have a textbook example of a phantom read here. https://www.postgresql.org/docs/current/transaction-iso.html
> *phantom read*: A transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction.
We need the repeatable read isolation level, or simply a GROUP BY.
The default read committed is not enough:
> Also note that two successive SELECT commands can see different data, even though they are within a single transaction, if other transactions commit changes after the first SELECT starts and before the second SELECT starts.
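If we go the isolation route rather than the single-query route, a minimal sketch with Spring's @Transactional (assuming the class is Spring-managed, as the EDIT above suggests; the class and method names here are made up):

```kotlin
import org.springframework.stereotype.Service
import org.springframework.transaction.annotation.Isolation
import org.springframework.transaction.annotation.Transactional

@Service
class SequenceStatusCounter {
    // REPEATABLE READ makes every SELECT in this transaction read from the same snapshot,
    // so the per-status counts can no longer drift apart while the loop runs.
    @Transactional(isolation = Isolation.REPEATABLE_READ, readOnly = true)
    fun getStatusCounts(): Map<Status, Int> {
        TODO("existing counting logic goes here unchanged")
    }
}
```

A method-level @Transactional like this takes precedence over the class-level annotation, so only this endpoint would pay for the stricter isolation.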
Oh, good find!
Do you want to finish this then?
Not promising that I'll work on this, may or may not :)
But here are some results after adding a little bit of logging to get-sequences in #3279:
Not only is

```kotlin
val statusCounts: Map<Status, Int> = Status.entries.associateWith { status ->
    baseQuery.count { it[SequenceEntriesView.statusColumn] == status.name }
}
```

being executed sequentially the root cause of the fluctuations; it's also the reason fetching sequence entries is slow, because we retrieve all the data just to count rows 🤦, see https://github.com/loculus-project/loculus/issues/1612
The whole function should be improved:
Then we might be able to close all of these in one go:
I uploaded 50,000 sequences and observed strange behavior in the total number of sequences:
The total is often higher than 50,000, and the sum of "yet to be processed" and "processed" is often equal to this too-high total.
https://github.com/user-attachments/assets/665f23f1-44fc-4c54-ba89-e46055527615
I believe I have seen this before: is there already a ticket? (I guess it's a website bug, but it could also be a backend bug.)