Open nleroy917 opened 1 year ago
Ideally, we should have partial updates of all the value in the column. So for the structure that we have right now, we can't change columns e.g. annotation.
Some useful information: "Ideally, you don't use JSON documents for structured, regular data that you want to manipulate inside a relational database. Use a normalized relational design instead." https://dba.stackexchange.com/questions/115825/jsonb-with-indexing-vs-hstore/115849#115849
In this case my proposition is to move keys: last_update
, number_of_samples
to a new columns. Otherwise we shouldn't change this values separately.
About changing the PEPs, I think we should do it using complete peppy object
p.s. partial pep updates is ready, but I think it is really a bad idea to change json values separately.
but I think it is really a bad idea to change json values separately.
I don't think there's any problem to update JSON values separately. Can you explain why you are reluctant?
Would it be easier to supply and API that is more something like this:
values_to_update = {
"description": "This is a new PEP.",
"is_private": False,
"sample_table": "sample_name,file\nfrog1,frog1.txt\nfrog2,frog2.txt"
}
for key, value in values_to_update.items():
pepdbagent.update_project("nleroy917/basic:default", key, value)
There needs to be some logic as to whether or not the key-value pairs belong to the project or the annotation... that can be tricky I understand its not just a simple update statement.
About project dict, it's fine that all the values (config, sample, subsample) are in one place, because it's one project. But if we are talking about annotation, e.g. last_update, It's better to create new column.
I found information that: "From a performance perspective, it hardly matters whether you change a single piece of data inside a JSON object or all of it: a new version of the row has to be written."
Additionally, if we want to update json, the SQL is more complicated, especially if we want to update few columns, or few values from one column.
Sample sql code:
UPDATE projects
SET digest = %s, private = %s, anno_info = jsonb_set(anno_info, '{description}', '%s', '{number_of_samples}', '%s')
WHERE namespace = %s and name = %s and tag = %s;
Ok, we also can send separate requests to update every key individually. But in that case, it will be inefficient.
p.s. I already wrote function to update values partially, but there is still some errors...:
def update_item(self,
update_dict: dict,
namespace: str,
name: str,
tag: str) -> UploadResponse
update_dict – dict with update key->values. Dict structure: { name: Optional[str] namespace: Optional[str] tag: Optional[str] digest: Optional[str] project_value: dict private: Optional[bool] anno_info: dict }
namespace – project namespace
name – project name
tag – project tag
@Khoroshevskyi this is great, and I can integrate this with pephub as it is. Wondering if there is another method for updating things like description
or the project config file or the sample tables?
In v0.7.0 I added sample update/add/delete functionality. So this is partial solution for this issue
To support the
PATCH
requests and partial updating of PEPs on pephub, it would be convenient to have a.update_pep()
function. I can also just execute raw SQL if needed.