qri-io / qri

you're invited to a data party!
https://qri.io
GNU General Public License v3.0
1.11k stars 66 forks source link

bug: ambigiuity over what a "run" is #1878

Closed ramfox closed 3 years ago

ramfox commented 3 years ago

This ambiguity is making it more difficult to make design decisions in the collection.

1) what "kind" of run need to be represented in the collection? If a user does a save with the apply flag, should that run id and run status be captured? If so, relying on any kind of workflow or deploy event will not capture this. We should instead rely on the actually transform events. However, as of right now, we can't differentiate between a dry run or a saved run at the transform event level, which means any dry run information would be captured here as well.

2) Even if we are okay leaving out "save w/ the apply flag" type runs, we are not properly capturing any "run status" information in the collection during a deploy. Since we cannot save a dataset without a body or structure, we are forcing any datasets that arrive without an InitID to "apply" their transform on save. To avoid "double runs", we do not call RunWorkflow on any datasets that have been "applied". The way the code is set up right now, however, RunWorkflow events are the most reliable for the collection to use to correctly keep up with run information.

3) We also have 2 systems (the collection & the frontend) that would benefit from knowing in a transform event if the event is a dry run (apply) or a "saved" run (run)

ramfox commented 3 years ago

Does the collection care about dry runs?

Does a user expect to see information in the collection view about the last run that occurred (dry run or otherwise) or only saved runs? I don't have an answer for this @chriswhong @b5 @rgardaphe

If we are okay with seeing the last run information in the collection, regardless of whether or not it was a dry run, then we don't actually have to "solve" the below dry run vs saved run problem.

Frontend will know if a run is a dry run, because it has to request the dry run & will get back the specific run id from the request response. It can use that to select out showing information about that run.

The only snag would be if the user was running qri locally, using the app, and then curling the /auto/apply endpoint. Frontend wouldn't be able to tell in that case that the run happening is a dry run.

dry run vs saved run

Unfortunately it doesn't seem as easy to add an indication that something is a"dry run" vs a "saved run". The transform.Apply treats all runs the same. The only way to know if something is a dry run or a saved run is if the code path that uses transform.Apply saves the dataset version after.

Some options to indicate that we are doing a dry run: 1) lib.Apply emits a ApplyEvent with the runID. Because it's an ApplyEvent, we know the associated run ID is a dry run 2) We can add a "cosmetic" param that we pass to transform.Apply that indicates if the run is a dry run or saved run. We can then surface this to the transform events. This seems... odd.

add InitID to transform/run events

It should be simple to add a dataset InitID to the transform events. In all cases, the "save" path and "run workflow" path should have an InitID by the time the transform apply happens. In the case of a "dry run" (/auto/apply), the only case where we do not have an InitID, is when we do not supply a Ref to the endpoint, which should only happy if the transform we are applying is not associated with an existing dataset. Since the collection would only be collecting information about dataset, if an InitID does not exist in a transform event, we can just ignore it.

WorkflowID isn't as straightforward, since the Save path doesn't have a concept of a workflow. Adding an optional workflow to the save path and to the apply path may not be necessary though, since there is a one to one relationship between datasets and workflows, frontend should be able to rely on the InitID to coordinate.

chriswhong commented 3 years ago

Does the collection care about dry runs?

No, only real runs. Dry runs are for validating your code while you are coding, they don't need to stick around after they provide feedback in the workflow editor.

ramfox commented 3 years ago

Would a user expect to see information about manual runs in the collection? In this case a user would have ran save with Apply=true using the command line or /ds/save endpoint @b5 @chriswhong

chriswhong commented 3 years ago

Would a user expect to see information about manual runs in the collection?

Yes, and they would also want to know that the run was manually triggered (either when saving a workflow or by choosing "run now") versus running as a result of a schedule.

ramfox commented 3 years ago

@b5 did a write up about run "modes" in spec: https://qri.dev/automation/transform-execution/

We currently have 2 modes: "apply" and "commit"

Apply runs run the transform and discard the output Commit runs run the transform and expect the output to be saved afterwards

Any further changes to our ideas about kinds of runs will be found at the above spec site.