pepkit / pepdbagent

Database for storing sample metadata
BSD 2-Clause "Simplified" License
2 stars 1 forks source link

Improve project update method #133

Open khoroshevskyi opened 2 weeks ago

khoroshevskyi commented 2 weeks ago

project update is super not efficient now. We should create new way of storing and saving sample, so we won't update whole project.

One of the ideas is linked list in sql

khoroshevskyi commented 6 days ago

SQL changes:

---- Add guid to the table
-- ALTER TABLE samples ADD COLUMN guid TEXT UNIQUE;
-- UPDATE samples
-- SET guid = id -- Replace 'default_value' with an appropriate value
-- WHERE guid IS NULL;

-- ALTER TABLE samples
-- ALTER COLUMN guid SET NOT NULL;

-- ALTER TABLE samples ADD COLUMN parent_guid TEXT ;

-- ALTER TABLE samples
-- ADD CONSTRAINT fk_parent_guid
-- FOREIGN KEY (parent_guid) REFERENCES samples (guid);

-- UPDATE samples AS child
-- SET parent_guid = (SELECT guid FROM samples AS parent WHERE parent.row_number = child.row_number-1 and parent.project_id =child.project_id);
khoroshevskyi commented 5 days ago

to update database in batches

WITH cte AS (
    SELECT id
    FROM samples
    WHERE guid IS NULL  -- add your condition here
    LIMIT 500000
)
UPDATE samples
SET guid = cte.id
FROM cte
WHERE samples.id = cte.id;