Open dustmop opened 4 years ago
We should move fsi under dscache so that it is no longer exposed to lib. This will help simplify our story about handling datasets in repo vs fsi in the same fashion.
Disagree on this. I don't think file system integration is purely the domain of a cache. In #1133, we started to talk about critical-path packages organized into an import stack, and event-driven packages that must be siblings with few or no critical-path imports. Full notes here. A lib.Instance should have an internal fsi field.
I totally see the impetus to cut down the complexity of all these subsystems interacting, just not sure that we should solve that by stacking imports.
We might be arriving at a place where subsystems need to use events to communicate between each other. The need for callbacks seems to indicate yes, but then we're stuck back at this "how do you know you're finished" problem.
Ok, I thought some more about this, and I have a recommendation. First, agreed that FSI is not in the domain of a cache, totally correct. However, I think the current relationship between lib and fsi is a problem. There's a lot of code duplication like this:
	ref, err := base.ToDatasetRef(p.Ref, r.repo, p.UseFSI)
	if err != nil {
		return err
	}
	if p.UseFSI {
		if ref.FSIPath == "" {
			return fsi.ErrNoLink
		}
		if ds, err = fsi.ReadDir(ref.FSIPath); err != nil {
			return fmt.Errorf("loading linked dataset: %s", err)
		}
	} else {
		ds, err = dsfs.LoadDataset(ctx, r.repo.Store(), ref.Path)
		if err != nil {
			return fmt.Errorf("loading dataset: %s", err)
		}
	}
This can be seen in Get in lib/datasets.go, RenderReadme in lib/render.go, and Validate in lib/datasets.go. This pattern should be extracted and put into a single location that is responsible for all ref resolution.
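To make the extraction concrete, here is a minimal sketch of what a single "FSI or store" loading point could look like. Everything in it is a simplified stand-in: Dataset, Ref, Loader, and LoadFunc are hypothetical names, not the real qri types, and the two load functions are injected so the sketch does not need fsi or dsfs.

```go
package main

import (
	"errors"
	"fmt"
)

// Dataset and Ref are simplified stand-ins for the real qri types.
type Dataset struct{ Name string }

type Ref struct {
	Path    string // location in the content-addressed store
	FSIPath string // linked working directory, empty if unlinked
}

// ErrNoLink plays the role of fsi.ErrNoLink in the snippet above.
var ErrNoLink = errors.New("dataset is not linked to a directory")

// LoadFunc loads a dataset from one source (filesystem or store).
type LoadFunc func(location string) (*Dataset, error)

// Loader is a hypothetical single home for the duplicated branch.
type Loader struct {
	FromFSI   LoadFunc
	FromStore LoadFunc
}

// Load chooses the source once, so callers such as Get, RenderReadme,
// and Validate would not need to repeat the if/else themselves.
func (l Loader) Load(ref Ref, useFSI bool) (*Dataset, error) {
	if useFSI {
		if ref.FSIPath == "" {
			return nil, ErrNoLink
		}
		ds, err := l.FromFSI(ref.FSIPath)
		if err != nil {
			return nil, fmt.Errorf("loading linked dataset: %w", err)
		}
		return ds, nil
	}
	ds, err := l.FromStore(ref.Path)
	if err != nil {
		return nil, fmt.Errorf("loading dataset: %w", err)
	}
	return ds, nil
}

func main() {
	l := Loader{
		FromFSI:   func(p string) (*Dataset, error) { return &Dataset{Name: "fsi:" + p}, nil },
		FromStore: func(p string) (*Dataset, error) { return &Dataset{Name: "store:" + p}, nil },
	}
	ds, err := l.Load(Ref{Path: "/store/abc", FSIPath: "/tmp/work"}, true)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ds.Name)
}
```

Each of the three call sites would then shrink to one call plus its own error handling.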
Relatedly, while looking at the codepath in base/save.go, specifically PrepareDataset, I noticed it is calling CanonicalizeDataset in order to get the previousPath. This is bad for two reasons:
Ideally, as we move to dscache, the base package wouldn't have a reference to the dscache at all, in order to match the dependency tree I made above.
Here's the proposal:
resolve/ which is very low on the dependency tree
resolve/names/ that is a very high-level package, and should only be in lib/. It wraps dscache and FSI, and handles all resolving of user-specific dataset references by turning them into concrete Datasets in whatever manner is needed. Using the NameResolver cleans up all of the messy duplication that's currently in lib/.
We spoke offline a few days ago, and I don't think we should switch to asynchronous events for things like writing to logbook and updating dscache. All of those activities are intrinsically synchronous and should stay that way. We should use callbacks or, even better, interfaces wherever we can to handle these dependency problems.
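As a rough illustration of the interface approach, the sketch below has low-level save code declare a small interface and call its implementers synchronously. SaveObserver, saver, and recorder are hypothetical names invented for this example; the real logbook and dscache types are not modeled.

```go
package main

import "fmt"

// SaveObserver is a hypothetical interface owned by the low-level
// package. Higher-level subsystems (logbook, dscache) would implement
// it, so the low-level package never imports them.
type SaveObserver interface {
	DatasetSaved(ref string) error
}

// saver stands in for low-level save code; it knows only the interface.
type saver struct {
	observers []SaveObserver
}

// Save notifies each observer in order and returns only after all of
// them have finished, so "done" simply means Save has returned.
func (s *saver) Save(ref string) error {
	for _, o := range s.observers {
		if err := o.DatasetSaved(ref); err != nil {
			return fmt.Errorf("notifying observer: %w", err)
		}
	}
	return nil
}

// recorder is a toy observer that remembers which refs it was told about.
type recorder struct {
	name string
	seen []string
}

func (r *recorder) DatasetSaved(ref string) error {
	r.seen = append(r.seen, ref)
	return nil
}

func main() {
	logbook := &recorder{name: "logbook"}
	cache := &recorder{name: "dscache"}
	s := &saver{observers: []SaveObserver{logbook, cache}}
	if err := s.Save("me/my_dataset"); err != nil {
		fmt.Println("save failed:", err)
		return
	}
	fmt.Println(logbook.name, logbook.seen)
	fmt.Println(cache.name, cache.seen)
}
```

Because the calls are plain method calls, there is no "how do you know you're finished" question: an error or a nil return from Save is the answer.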
We currently have this dependency chain in our codebase, and it has some issues:
As we move away from repo, and towards dscache, this should be changed to this:
The key takeaways from this better dependency graph:
qri connect has to start the api. We can do another callback trick here, by having the cmd constructor pass in a callback that can be used to start up the api server.
The main motivation for this change is dscache development. A lot of changes lead to circular dependencies that necessitate creating subpackages in dscache. For example: repo knows about dscache, but dscache can also be built from a repo (by combining logbook, refs, and fsi). So we had to move this builder into dscache/build/from_repo.go. Also, when developing qri get using dscache, we want to be able to load datasets from dscache or from fsi within dscache's package, but this leads to a circular dependency (see first chart), so the code has to live instead in dscache/loader/loader.go.
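For the qri connect callback trick mentioned earlier, a minimal sketch might look like the following. It assumes the callback is handed to something like lib.Instance at construction time; Instance, NewInstance, StartAPIFunc, and Connect are hypothetical names, and the address is a placeholder.

```go
package main

import (
	"errors"
	"fmt"
)

// StartAPIFunc is a hypothetical callback type. The cmd layer, which is
// allowed to import the api package, would supply the implementation.
type StartAPIFunc func(addr string) error

// Instance stands in for lib.Instance. It holds the callback rather
// than importing the api package, which avoids an import cycle.
type Instance struct {
	startAPI StartAPIFunc
}

// NewInstance receives the callback from the cmd constructor.
func NewInstance(startAPI StartAPIFunc) *Instance {
	return &Instance{startAPI: startAPI}
}

// Connect starts the api server through the injected callback.
func (inst *Instance) Connect(addr string) error {
	if inst.startAPI == nil {
		return errors.New("no api starter configured")
	}
	return inst.startAPI(addr)
}

func main() {
	inst := NewInstance(func(addr string) error {
		fmt.Println("api server would start on", addr)
		return nil
	})
	if err := inst.Connect("localhost:8080"); err != nil {
		fmt.Println("connect failed:", err)
	}
}
```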