Closed by donaldcampbelljr 3 months ago
I've been playing around with different options in PR #178. A POC is working for the file backend, but the results file grows quickly. I am contemplating moving the history to a separate .history.yaml file that sits parallel to results.yaml so that it is less messy.
test_pipe:
  project: {}
  sample:
    pypiperRecordIdentifier1:
      number_of_things: 300
      pipestat_created_time: '2024-04-04 14:16:54'
      pipestat_modified_time: '2024-04-04 14:16:54'
    RECORD1:
      number_of_things: 50000
      pipestat_created_time: '2024-04-04 17:23:56'
      pipestat_modified_time: '2024-04-04 18:28:59'
      name_of_something: Another_Name
      history:
        number_of_things:
          '2024-04-04 18:28:43':
            reported_result: 100
          '2024-04-04 18:28:58':
            reported_result: 50000
        pipestat_modified_time:
          '2024-04-04 18:28:43':
            reported_result: '2024-04-04 18:28:43'
          '2024-04-04 18:28:58':
            reported_result: '2024-04-04 18:28:58'
          '2024-04-04 18:28:59':
            reported_result: '2024-04-04 18:28:59'
        name_of_something:
          '2024-04-04 18:28:43':
            reported_result: Test_Name
          '2024-04-04 18:28:59':
            reported_result: Another_Name
    RECORD2:
      number_of_things: 300
      pipestat_created_time: '2024-04-04 17:23:56'
      pipestat_modified_time: '2024-04-04 18:28:56'
      name_of_something: Test_Name_Changed...Again
      history:
        number_of_things:
          '2024-04-04 18:28:45':
            reported_result: 100
          '2024-04-04 18:28:50':
            reported_result: 200
          '2024-04-04 18:28:54':
            reported_result: 300
        pipestat_modified_time:
          '2024-04-04 18:28:45':
            reported_result: '2024-04-04 18:28:45'
          '2024-04-04 18:28:48':
            reported_result: '2024-04-04 18:28:48'
          '2024-04-04 18:28:50':
            reported_result: '2024-04-04 18:28:50'
          '2024-04-04 18:28:52':
            reported_result: '2024-04-04 18:28:52'
          '2024-04-04 18:28:54':
            reported_result: '2024-04-04 18:28:54'
          '2024-04-04 18:28:56':
            reported_result: '2024-04-04 18:28:56'
        name_of_something:
          '2024-04-04 18:28:48':
            reported_result: Test_Name
          '2024-04-04 18:28:52':
            reported_result: Test_Name_Changed
          '2024-04-04 18:28:56':
            reported_result: Test_Name_Changed...Again
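To make the layout concrete, here is a minimal sketch of a report step that would produce this kind of nested history. It is not the code from the PR: `report_with_history` and its arguments are hypothetical, and it assumes (based only on the sample above) that every reported value is logged under the timestamp of the report.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def report_with_history(record, values, now=None):
    """Hypothetical helper: update `record` in place and log every reported value.

    Each reported key gets an entry under record["history"][key][timestamp],
    mirroring the shape of the YAML sample (this is an assumption, not pipestat's API).
    """
    stamp = (now or datetime.now()).strftime(TIME_FORMAT)
    # The creation time is only set the first time the record is seen.
    record.setdefault("pipestat_created_time", stamp)
    # The modified time is treated like any other reported value, so it is logged too.
    updates = {**values, "pipestat_modified_time": stamp}
    history = record.setdefault("history", {})
    for key, value in updates.items():
        record[key] = value
        history.setdefault(key, {})[stamp] = {"reported_result": value}
    return record
```

Because every report adds one entry per key (plus one for pipestat_modified_time), the file grows with the number of reports rather than the number of results, which is what motivates the idea of a separate history file.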
For now, I'm just continuing with the above approach for the file backend and have added a retrieve_history function, which uses retrieve_one.
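As a rough illustration of how retrieve_history can sit on top of retrieve_one, here is a sketch that works on a plain dict shaped like the results file. The signatures below are assumptions made for this example and are not the ones in pipestat.

```python
def retrieve_one(data, pipeline_name, level, record_identifier):
    """Stand-in for the real retrieve_one: return the stored record as a dict."""
    return data[pipeline_name][level][record_identifier]


def retrieve_history(data, pipeline_name, level, record_identifier, result_identifier=None):
    """Hypothetical sketch: fetch the record, then return its history.

    With result_identifier set, only that result's history is returned.
    A record or result with no history yields an empty dict.
    """
    record = retrieve_one(data, pipeline_name, level, record_identifier)
    history = record.get("history", {})
    if result_identifier is None:
        return history
    return history.get(result_identifier, {})


# Small example data in the same shape as the results file.
example = {
    "test_pipe": {
        "project": {},
        "sample": {
            "RECORD1": {
                "name_of_something": "Another_Name",
                "history": {
                    "name_of_something": {
                        "2024-04-04 18:28:43": {"reported_result": "Test_Name"},
                        "2024-04-04 18:28:59": {"reported_result": "Another_Name"},
                    }
                },
            }
        },
    }
}

name_history = retrieve_history(example, "test_pipe", "sample", "RECORD1", "name_of_something")
```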
Currently, deleting a result will leave a history entry that looks something like this:
name_of_something:
  '2024-04-04 18:28:43':
    reported_result: Test_Name
  '2024-04-04 18:28:59':
    reported_result: Another_Name
  '2024-04-04 18:59:29':
    reported_result: Another_Name
  '2024-04-04 20:02:40':
    reported_result: Another_Name
  '2024-04-04 20:03:28':
    reported_result: Another_Name
  '2024-04-04 20:05:54':
    deletion: ''
However, if the record itself is removed (this happens when the history, creation_time, and modified_time are all that is left), the history is removed along with the record.
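The deletion behavior described here can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `remove_result` and `BOOKKEEPING_KEYS` are hypothetical names, and the key names are taken from the YAML sample.

```python
# Keys that do not count as "real" results when deciding whether a record is empty.
BOOKKEEPING_KEYS = {"history", "pipestat_created_time", "pipestat_modified_time"}


def remove_result(records, record_identifier, result_identifier, stamp):
    """Hypothetical sketch: delete one result and note the deletion in the history.

    Returns True if the whole record (history included) was removed because
    nothing but bookkeeping keys remained, otherwise False.
    """
    record = records[record_identifier]
    record.pop(result_identifier, None)
    # Record the deletion as an empty 'deletion' marker under the timestamp.
    history = record.setdefault("history", {})
    history.setdefault(result_identifier, {})[stamp] = {"deletion": ""}
    if set(record) <= BOOKKEEPING_KEYS:
        # Only history and timestamps are left: drop the record and its history.
        del records[record_identifier]
        return True
    return False
```

The consequence is that the deletion marker for the last remaining result is never visible afterwards, since it disappears together with the record.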
I'm currently working on the db_backend. It appears that we will also need to delete a record's history when the primary record is removed (similar to the file backend), because otherwise the delete fails on a foreign key constraint:
Could not remove the result from the database. Exception: (psycopg.errors.ForeignKeyViolation) update or delete on table "default_pipeline_name__sample" violates foreign key constraint "default_pipeline_name__sample_history_source_record_id_fkey" on table "default_pipeline_name__sample_history"
However, I'm operating under the assumption that this is desirable anyway.
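For anyone who wants to see the constraint behavior in isolation, here is a small self-contained sketch. It uses SQLite from the standard library rather than PostgreSQL and psycopg, and the table names are simplified stand-ins for the ones in the error message, so treat it as an illustration of the general mechanism and not of the db_backend itself. Without a cascade rule the parent delete is rejected; with ON DELETE CASCADE the history rows go away with the record.

```python
import sqlite3


def build_db(cascade):
    """Create an in-memory database with a record table and a history table."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    on_delete = "ON DELETE CASCADE" if cascade else ""
    conn.execute("CREATE TABLE sample (id INTEGER PRIMARY KEY, record_identifier TEXT)")
    conn.execute(
        "CREATE TABLE sample_history ("
        " id INTEGER PRIMARY KEY,"
        f" source_record_id INTEGER REFERENCES sample(id) {on_delete},"
        " reported_result TEXT)"
    )
    conn.execute("INSERT INTO sample (id, record_identifier) VALUES (1, 'RECORD1')")
    conn.execute(
        "INSERT INTO sample_history (source_record_id, reported_result)"
        " VALUES (1, 'Test_Name')"
    )
    return conn


def delete_record(conn, record_id):
    """Try to delete the primary record; return False on a foreign key violation."""
    try:
        conn.execute("DELETE FROM sample WHERE id = ?", (record_id,))
        return True
    except sqlite3.IntegrityError:
        return False


strict_ok = delete_record(build_db(cascade=False), 1)
cascade_ok = delete_record(build_db(cascade=True), 1)
```

The alternative to a cascade rule is to delete the history rows explicitly before deleting the primary record; either way the history does not outlive the record.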
Originally posted by @donaldcampbelljr in https://github.com/pepkit/pipestat/issues/161#issuecomment-2037737804