qri-io / qri

you're invited to a data party!
https://qri.io
GNU General Public License v3.0
1.11k stars 66 forks source link

feat(orchestrator): `NewTestOrchestratorOptions` #1863

Closed ramfox closed 3 years ago

ramfox commented 3 years ago
orchestrator_test

// NewTestOrchestratorOptionsFromDir loads workflow test cases from the directory. If no `dirPath` is
// given, the workflow test cases will load from the default directory location
func NewTestOrchestratorOptionsFromDir(bus event.Bus, dirPath string) (OrchestratorOptions, []*wftest.TestCase, error)
package wftest

// TestCase is a workflow test case, usually built from a
// directory of files for use in tests
// Each workflow for each test case must have an associated
// dataset, whose `dataset.ID` matches the `workflow.DatasetID`
type TestCase struct {
    // Path to the directory on the local filesystem this test case is loaded from
    Path string
    // Name is the casename, should match the directory name
    Name string
    // Input is intended file for test input
    // loads from input.workflow.json
    Input *workflow.Workflow
    // Expect should match the test output
    // loads from expect.workflow.json
    Expect *workflow.Workflow
    Dataset *TestCase
}

var testCaseCache = map(map[string]*TestCase)

// LoadTestCases loads a directory of case directories
func LoadTestCases(dir string) (tcs []*TestCase, err error)

// NewTestCaseFromDir creates a test case from a directory of static test files
// dir should be the path to the directory to check
func NewTestCaseFromDir(dir string) (tc *TestCase, err error)

// ReadWorkflow grabs a workflow for a given dir for a given filename
func ReadWorkflow(dir, filename string) (*workflow.Workflow, error)

type TestRunner interface {
    Instance() *lib.Instance
    Owner() *profile.Profile
    Context() context.Context
}

// MustSaveDatasetToInstance adds the dataset in the `workflow.TestCase` to the
// instance,  updating the workflow to have the proper `workflow.DatasetID`
func MustSaveDatasetToInstance(t *testing.T, tr TestRunner, tc *TestCase) {
    var err error
    owner := tr.Owner()
    inst := tr.Instance()
    ctx := tr.Context()
    ds := tc.Dataset.Input
    wf := tc.Input
    ref := fmt.Sprintf("%s/%s", owner.Name, ds.Name)

    ds, err = inst.Dataset().Save(ctx, &lib.SaveParams{ Ref: ref, Dataset: ds })
    if err != nil {
        t.Fatalf("error saving dataset to instance: %s", err)
    }
    wf.DatasetID = ds.ID
    if _, err = inst.AutomationOrchestrator().SaveWorkflow(ctx, wf); err != nil {
        t.Fatalf("error saving workflow to instance: %s", err)
    }
}

Our major hurtle here is that in order to be able to call lib.Run successfully, we need the workflow to have an associated dataset that is saved properly to the instance. We also need that associated dataset.ID to be updated on the Workflow, so that we can properly fetch and run the Workflow via the Workflow.DatasetID. I'm proposing we use this wftest.MustSaveDatasetToInstance function that a test runner can use to make sure everything is aligned properly.

ramfox commented 3 years ago

Ended up having to go an entirely different route!

Since we need to be able to save a dataset in order to get an init id, we need to save the dataset first before saving the workflow, since the init ids need to match.

Rather than adding an orchestrator options whose workflow store is already populated with workflows (that will not be able to be associated with any datasets at the point where the orchestrator options are created), we instead take a wftest.TestRunner (an interface that ensures we can get to the Owner, Instance, and WorkflowStore, in order to be able to align everything properly so we can save a valid workflow/dataset pair), and call wftest.MustAddDefaultWorkflows or wftest.MustAddWorkflowsFromDir after the instance has already been created.

dustmop commented 3 years ago

Regarding the dependency problem between datasets, workflows, and initIDs: so initIDs are in actually generated by logbook. We shouldn't need to perform a full run of the save path to get one. Perhaps a better way to structure this would be to get an initID from logbook, by using some convenience method just to reserve it, and then pass that to the workflow deployment to avoid an unneeded save. That doesn't have to happen in the linked PR, but it might be a useful future restructuring in order to simplify this relationship.

ramfox commented 3 years ago

That's interesting.

I think in most cases, for testing, we will expect the workflow to be able to run, and for that we need a fully saved dataset anyway. Durning a run, we grab the transform from a dataset that we expect to already exist, so we would need to save anyway.