qri-io / qri

you're invited to a data party!
https://qri.io
GNU General Public License v3.0

bug(collection): migration does not fill in all necessary fields #1904

Closed: ramfox closed this issue 3 years ago

ramfox commented 3 years ago

Currently the migration of a collection only checks the repo for information.

To get all the necessary data here is where the migration would need to check:

- WorkflowID: workflow.Store.Get using the InitID
- RunCount: logbook.BranchLog, filtering for RunModel and adding up each RunModel op
- RunStatus: logbook.BranchLog, get the latest RunModel op. Status is found in the op.Note field
- RunDuration: same op as above, op.Size field
- RunID: same op as above, op.Ref field
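
The logbook-derived fields listed above could be computed in a single pass over a branch log. The following is only a rough sketch under stated assumptions: `Op`, `Model`, `RunFields`, and `SummarizeRuns` are simplified, hypothetical stand-ins and not qri's actual logbook types or API. It also assumes the ops arrive oldest first, so the last RunModel op seen is the latest run. WorkflowID is left out because it would come from a separate lookup in the workflow store by InitID.

```go
package main

import "fmt"

// Model says what kind of record an op describes.
// Hypothetical stand-in for logbook's model constants.
type Model int

const (
	CommitModel Model = iota
	RunModel
)

// Op is a simplified, hypothetical stand-in for a logbook operation.
// Only the fields named in this issue are modeled.
type Op struct {
	Model Model
	Ref   string // for RunModel ops: the run ID
	Note  string // for RunModel ops: the run status
	Size  int64  // for RunModel ops: the run duration (unit not assumed here)
}

// RunFields holds the collection fields that can be derived from a branch log.
type RunFields struct {
	RunCount    int
	RunID       string
	RunStatus   string
	RunDuration int64
}

// SummarizeRuns counts every RunModel op and keeps the details of the most
// recent one. Ops are assumed to be ordered oldest first.
func SummarizeRuns(ops []Op) RunFields {
	var f RunFields
	for _, op := range ops {
		if op.Model != RunModel {
			continue // skip commits and any other non-run ops
		}
		f.RunCount++
		// Overwrite on every run op so the last one seen wins.
		f.RunID = op.Ref
		f.RunStatus = op.Note
		f.RunDuration = op.Size
	}
	return f
}

func main() {
	// Made-up sample log, purely for illustration.
	ops := []Op{
		{Model: CommitModel, Ref: "commit-1"},
		{Model: RunModel, Ref: "run-1", Note: "failed", Size: 1200},
		{Model: CommitModel, Ref: "commit-2"},
		{Model: RunModel, Ref: "run-2", Note: "succeeded", Size: 800},
	}
	fmt.Printf("%+v\n", SummarizeRuns(ops))
}
```

A dataset with no RunModel ops yields the zero value for every field, which a migration could treat as "never run".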

b5 commented 3 years ago

So, I'm a little confused as to why we'd need to include these at all.

If we're thinking in terms of official releases, all of these fields can only be created after the collection migration, as all subsystems described here will be introduced in the same release. But I'm pretty sure the point you're getting at here is a "migration" is also run if a user deletes their collection directory wholesale, correct?

If so, what we're describing here is more of a "recovery" than a migration, and there's real potential for drift: users can create datasets that are not updated in the refstore (we no longer keep it in sync), and in that case recovery will lose those references.

ramfox commented 3 years ago

> all of these fields can only be created after the collection migration, as all subsystems described here will be introduced in the same release

Logbook already tracks the run ID, run status, and run duration (and can track run count by iterating through the ops and counting those with the RunModel type). workflow is the only new subsystem, which means that, except for workflow, the rest makes sense in the context of a migration.

However, yes! I am curious about what our expectations should be around a recovery.