neicnordic / sda-download

SDA Data Out API
GNU Affero General Public License v3.0

retrieve files with status only ready #365

Closed blankdots closed 9 months ago

blankdots commented 9 months ago

in the database the latest status might not always be ready; this change restricts the query so that we only retrieve files with status ready.

even though the file mapping is done, when retrieving a file it is good to only get files in status ready, regardless of when that status was recorded.

pontus commented 9 months ago

Looks fine as such, but just to check; there's no way we can go from ready to a state where the file should no longer be accessible? (I'm not sure what e.g. deprecate means for access.)

blankdots commented 9 months ago

> Looks fine as such, but just to check; there's no way we can go from ready to a state where the file should no longer be accessible? (I'm not sure what e.g. deprecate means for access.)

we have disable for that (https://github.com/neicnordic/sensitive-data-archive/blob/main/postgresql/initdb.d/01_main.sql#L122) ... but first the dataset needs to be deprecated (https://github.com/neicnordic/sensitive-data-archive/blob/main/postgresql/initdb.d/01_main.sql#L157)

Edit:

ah I see what you are saying, this solution might not be ideal :thinking: - I will put it back in draft

blankdots commented 9 months ago

I suspect we have this because we assume the events happen in order, but from what I see in the db, it gives me verified as the latest event instead of ready, so I think we need another approach to this query for the sync pipeline

pontus commented 9 months ago

It may be a bug I don't see right now, but in theory it seems to me that's where the ORDER BY comes in - we should probably respect access if the latest event (first if sorted by some timestamp descending) is ready or enabled. But maybe that state check shouldn't be in SQL but in go land?

But did I understand correctly that you're seeing behaviour consistent with the query not returning the latest event?

blankdots commented 9 months ago

discussed in the chat: we will solve this by refactoring the queries to make use of the file_references and dataset_references tables. the sync use case will also need to wait for file verification before marking the files as ready.