
Feature request - support for remote imports #432

Open aglover-zendesk opened 4 years ago

aglover-zendesk commented 4 years ago

It would be great if Jsonnet could natively resolve remote imports, rather than be limited to importing local files. In its current state, a user must use a workaround like defining all of their Jsonnet in a monorepo, or using a secondary tool like jsonnet-bundler or vendir to "vendor" the Jsonnet files locally first.

It would be great if we could extend imports.go to support remote imports. This could either be an extension of the default FileImporter and its associated functions, or a new importer type. Extending the default importer would be preferred since it's the default importer in cmd.go today; otherwise users would have to change that one line and maintain their own fork/build.

I started working on a quick-and-dirty implementation for resolving imports over HTTP, but it became very GitHub-specific once I added auth support. I'd be happy to develop this feature and submit a PR, but I'm hoping the maintainers could provide feedback on (a) whether you support or reject the concept of remote imports in general and (b) preferences on protocol (HTTP vs git vs other) before I invest more time in this.
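
To make the idea concrete, here's a minimal sketch of what such an importer could look like against go-jsonnet's public Importer interface, delegating non-URL paths to the stock FileImporter (auth, redirects, and caching policy are simplified here and not what my POC actually does):

package remoteimport

import (
  "fmt"
  "io"
  "net/http"
  "strings"

  jsonnet "github.com/google/go-jsonnet"
)

// HTTPImporter resolves http(s):// imports over the network and
// delegates every other path to the default FileImporter.
type HTTPImporter struct {
  Fallback jsonnet.FileImporter
  cache    map[string]jsonnet.Contents // Importer contract: the same foundAt must yield identical Contents
}

func (i *HTTPImporter) Import(importedFrom, importedPath string) (jsonnet.Contents, string, error) {
  if !strings.HasPrefix(importedPath, "http://") && !strings.HasPrefix(importedPath, "https://") {
    return i.Fallback.Import(importedFrom, importedPath)
  }
  if c, ok := i.cache[importedPath]; ok {
    return c, importedPath, nil
  }
  resp, err := http.Get(importedPath)
  if err != nil {
    return jsonnet.Contents{}, "", err
  }
  defer resp.Body.Close()
  if resp.StatusCode != http.StatusOK {
    return jsonnet.Contents{}, "", fmt.Errorf("importing %s: %s", importedPath, resp.Status)
  }
  body, err := io.ReadAll(resp.Body)
  if err != nil {
    return jsonnet.Contents{}, "", err
  }
  if i.cache == nil {
    i.cache = map[string]jsonnet.Contents{}
  }
  i.cache[importedPath] = jsonnet.MakeContents(string(body))
  // foundAt becomes the base for relative imports inside the fetched file.
  return i.cache[importedPath], importedPath, nil
}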

sbarzowski commented 4 years ago

Thanks for bringing this up and creating a proof of concept! Below are my early thoughts:

Before we get to the heart of the issue, I'm going to warn you that this is going to go slowly. Sorry about that, but this is the sort of thing that affects how people use Jsonnet, and we would need to support it pretty much forever. So any change like that will require extensive discussion, asking many what-ifs, etc. And the issue is pretty deep IMO.

The most important problem with current imports (from my perspective) is that there is no truly official convention for what the paths should look like, so a library written with one path style (~one importer) cannot be easily used in another. This is the downside of being completely unopinionated about paths. It is a problem for generic libraries – what convention should they use for their dependencies? So eventually I would like to have some universal package path convention which is good enough for everyone (it may even involve new syntax like dotted imports). Your proposal at least partially addresses the issue by making the default importer more widely usable.

An important thing about the Jsonnet language is that we avoid hard dependencies on other systems. In principle it should be possible to create an implementation of Jsonnet which works on a coffee machine. So whatever we do, we need to keep the data fetching abstract – we cannot mandate using any particular protocol. We can define a structure for the path, and we can provide a default way to fetch stuff, but we shouldn't mandate on the spec level that it has to be https or git or something. We could have a convention about where to find certain paths and how to fetch them, but it should still be possible for the end user to provide them somehow "manually" (e.g. because they don't have an Internet connection on the machine).

Another thing about Jsonnet is that there are multiple implementations and they all should support the same default path format, so adding anything overly complicated is a problem in itself.

sbarzowski commented 4 years ago

BTW what is the issue with jsonnet-bundler? It's the closest thing we have to an official solution, and it actually created a reasonable convention for the paths.

sbarzowski commented 4 years ago

And just to be clear – we support user-provided importers. So feel free to use your version (you'll need to use Jsonnet as a library, though).
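
Wiring one in is only a couple of lines when embedding go-jsonnet (the vendor path and the entry-point filename below are just placeholders):

package main

import (
  "fmt"
  "log"

  jsonnet "github.com/google/go-jsonnet"
)

func main() {
  vm := jsonnet.MakeVM()
  // Any Importer implementation can go here: the stock FileImporter,
  // or a custom remote importer.
  vm.Importer(&jsonnet.FileImporter{JPaths: []string{"vendor"}})

  out, err := vm.EvaluateFile("main.jsonnet")
  if err != nil {
    log.Fatal(err)
  }
  fmt.Println(out)
}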

aglover-zendesk commented 4 years ago

Thanks for the speedy response @sbarzowski

I'm going to warn you that this is going to go slowly... So any change like that will require extensive discussion, asking many what-ifs etc.

No worries. I agree that it is a significant change in behavior and has to be considered carefully.

An important thing about the Jsonnet language is that we avoid hard dependency on other systems. In principle it should be possible to create an implementation of Jsonnet which works on a coffee machine... we cannot mandate using any particular protocol

Actually that's our use case, we're running Jsonnet on a Jura Giga W3 Professional Superautomatic Espresso Machine.

Nah I'm kidding, I just like to keep GH issues light. But I understand and agree that unnecessary dependencies should be avoided, especially network dependencies. Ideally Jsonnet could do both - I'm hoping support for remote imports and the existing local import mechanism are not mutually exclusive.

Would a reasonable implementation infer what type of protocol to use? In my throwaway POC, I check if the import has a scheme of http or https. If not, it assumes the path is a local filesystem path and follows the current behavior. We could do the same thing for git:// or ssh:// or whatever, with local file paths being the default option if no matching scheme/protocol is found. Something like this pseudocode:

u, err := url.Parse(importedPath)
if err != nil {
  u = &url.URL{} // unparseable: treat as a plain local path
}
switch u.Scheme {
case "http", "https":
  // Import via HTTP
case "git":
  // Import via git
default:
  // No/unknown scheme: import via current FS path behavior
}

Or do you feel that inferring the protocol is too implicit? Alternatively, we could use full URIs for everything, so all paths would be of the form <proto>://<path>. This would give uniform paths like file://my/local/file.libsonnet, https://domain.com/file.libsonnet, and git://domain.com/org/repo/file.libsonnet. Unfortunately, this would be a breaking change.

Yet another alternative - we could differentiate in the stdlib itself, providing something like importgit for remote imports and import for local imports.

Another thing about Jsonnet is that there are multiple implementations and they all should support the same default path format, so adding anything overly complicated is a problem in itself.

Understood. Honestly I have no idea how complicated resolving git imports in C++ is, so I'll look to y'all (maintainers) for guidance on what is/isn't practical.

BTW what is the issue with jsonnet-bundler?

We're actually evaluating jsonnet-bundler now. My only complaint is that it fetches whole directories rather than single files (AFAICT), and transitive dependency resolution requires metadata (the jsonnetfile.json) in the upstream lib. Besides that, the criticisms are more about vendoring in general - end users don't want all the extra Jsonnet libs in their repo, don't want to manage/bump dependencies for another language, etc. We may move forward with jsonnet-bundler/vendir in the interim.

Random thought - resolving imports natively would also expose the possibility of generating dependency graphs that refer back to the authoritative source.

sh0rez commented 4 years ago

I'd be very interested in seeing an official take on remote dependencies from Jsonnet itself.

At Grafana Labs, we use Jsonnet to configure our entire infrastructure and just recently discussed the challenges package management imposes.

/cc @malcolmholmes @duologic

Being a maintainer of jb, I am probably biased towards it, but I need to acknowledge that it did not receive the amount of care from us that it probably should have, compared to other tools like Tanka (in terms of stability, code quality, and feature set).

I'd be really happy to see something that is more tightly integrated into Jsonnet, similar to the usage flow of go mod (i.e. it just works (tm) until you need more control, and then you have it). I'm sure the other jsonnet-bundler maintainers would support such a development as well /cc @metalmatze @brancz @tomwilkie

Duologic commented 4 years ago

As @sh0rez explained, we've been discussing this at Grafana Labs.

I personally like the usage flow of go mod, with jsonnetfile.lock.json as an equivalent of go.mod/go.sum for versioning. And I like the jb-promoted import path: github.com/grafana/grafonnet-lib/grafonnet. It is compatible with the FileImporter and doesn't impose any protocols.

I think it would be interesting to combine these concepts: move the tasks now executed by jb into an importer and expand this importer so that it can work with well-known import paths (github.com/...).

The process could go something like this:

  1. Does import/file exist on disk? -> use this
  2. Is the import defined in a jsonnetfile? (can provide versioning/protocol) -> fetch this to disk, go to 1.
  3. Is it a supported well-known import path (github.com/...)? -> fetch this to disk, add to jsonnetfile, go to 1.
  4. Not found anywhere? -> throw error.
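
In rough Go-ish pseudocode (every helper below is hypothetical; only the fallthrough order matters):

// Illustrative only; all helper functions are made up.
func resolve(importedPath string) (string, error) {
  // Allow one fetch round; if the file still isn't on disk afterwards,
  // give up rather than loop forever.
  for attempt := 0; attempt < 2; attempt++ {
    if existsOnDisk(importedPath) { // 1.
      return readFromDisk(importedPath)
    }
    if dep, ok := lookupJsonnetfile(importedPath); ok { // 2.
      if err := fetchToDisk(dep); err != nil {
        return "", err
      }
      continue // go to 1.
    }
    if dep, ok := parseWellKnownPath(importedPath); ok { // 3. e.g. github.com/...
      if err := fetchToDisk(dep); err != nil {
        return "", err
      }
      addToJsonnetfile(dep)
      continue // go to 1.
    }
    break
  }
  return "", fmt.Errorf("import %q not found", importedPath) // 4.
}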

One step further, we could have a well-known path start with ssh/1.2.3.4/path/to/lib and still be able to fallback to the FileImporter. But that might be a bit of a stretch.

sbarzowski commented 4 years ago

Wow, that's a lot of insights. Thanks everyone!

Would a reasonable implementation infer what type of protocol to use? In my throwaway POC, I check if the import has a scheme of http or https. If not, it assumes the path is a local filesystem path and follows the current behavior. We could do the same thing for git:// or ssh:// or whatever, with local file paths being the default option if no matching scheme/protocol is found. Something like this pseudocode:

u, err := url.Parse(importedPath)
if err != nil {
  u = &url.URL{} // unparseable: treat as a plain local path
}
switch u.Scheme {
case "http", "https":
  // Import via HTTP
case "git":
  // Import via git
default:
  // No/unknown scheme: import via current FS path behavior
}

Or do you feel that inferring the protocol is too implicit? Alternatively, we could use full URIs for everything, so all paths would be of the form <proto>://<path>. This would give uniform paths like file://my/local/file.libsonnet, https://domain.com/file.libsonnet, and git://domain.com/org/repo/file.libsonnet. Unfortunately, this would be a breaking change.

Yet another alternative - we could differentiate in the stdlib itself, providing something like importgit for remote imports and import for local imports.

Actually, I meant sort of the opposite of this. My idea was to avoid specifying that it's git/ssh/https/whatever in the import path. Have something like import 'community/github.com/sbarzowski/jsonnet-modifiers'. Then it would be the importer's or dependency manager's responsibility to know how to handle things in community/github.com.
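
To sketch the difference from scheme-based dispatch (the community/ prefix and resolveCommunity are made up for illustration, not a settled design):

// Dispatch on a path namespace instead of a URL scheme. How a given
// namespace is actually fetched remains an importer/dependency-manager detail.
type NamespaceImporter struct {
  Fallback jsonnet.FileImporter
}

func (i *NamespaceImporter) Import(importedFrom, importedPath string) (jsonnet.Contents, string, error) {
  const prefix = "community/"
  if strings.HasPrefix(importedPath, prefix) {
    // e.g. community/github.com/sbarzowski/jsonnet-modifiers
    // -> resolve via a registry, a cache dir, a vendor tree, ...
    return i.resolveCommunity(strings.TrimPrefix(importedPath, prefix)) // hypothetical
  }
  return i.Fallback.Import(importedFrom, importedPath)
}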

Random thought - resolving imports natively would also expose the possibility of generating dependency graphs that refer back to the authoritative source.

Sounds interesting, but I don't see why that couldn't be achieved with jb (it would perhaps be slightly more tedious).

Package versioning. As with any other package solution, versioning can create incompatible constraints.

I am not aware of any package manager that really feels like the right solution when it comes to dealing with versioning constraints. I think the right way would involve distinguishing between private dependencies (such that your dependents don't need to know that you used, e.g., some helper higher-order functions) and exposed dependencies (visible in the API, e.g. some Kubernetes definitions generated by some library that you return). The former kind could be different for each lib, while the latter would need to be resolved at the higher level. But that's more like a research project, not something we can expect to solve current issues without too much friction.

And when you can't really get it right, let's at least make it simple. I think the following approach would work well:

(This is pretty similar to the Go way, isn't it?)

I'd be really happy to see something that is more tightly integrated into Jsonnet, similar to the usage flow of go mod (ie. it just works (tm) until you need more control, then you have it).

+1

I think it would be interesting to combine these concepts, move the tasks now executed by jb into a importer and expand this importer so that it can work with well-known import paths (github.com/...).

The process could go something like this:

  1. Does import/file exist on disk? -> use this
  2. Is the import defined in a jsonnetfile? (can provide versioning/protocol) -> fetch this to disk, go to 1.
  3. Is it a supported well-known import path (github.com/...)? -> fetch this to disk, add to jsonnetfile, go to 1.
  4. Not found anywhere? -> throw error.

I like the flow. I'm not sure how the upgrades would work with that, though. And I'm not sure if we really want the importer to do all of that.

What do you think about creating a really seamless experience with jb – a single command which looks into the Jsonnet code for remote paths, finds and downloads all transitive dependencies? Maybe even without any metadata in the dependency repos.

This could be a very good first step, even if the importer is the end goal. We can design the community namespace and verify the usability without being slowed down by the issues of multiple implementations and strict portability standards. I'll be happy to help with that (especially the Jsonnet import traversal, which I have figured out already for other purposes).
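
The traversal itself is not much code with go-jsonnet as a library; something along these lines (single file only, assuming the toolutils.Children helper; a real tool would recurse into each discovered dependency):

package main

import (
  "fmt"
  "log"
  "os"

  jsonnet "github.com/google/go-jsonnet"
  "github.com/google/go-jsonnet/ast"
  "github.com/google/go-jsonnet/toolutils"
)

// collectImports walks the AST and gathers every import/importstr path.
func collectImports(node ast.Node, paths *[]string) {
  switch n := node.(type) {
  case *ast.Import:
    *paths = append(*paths, n.File.Value)
  case *ast.ImportStr:
    *paths = append(*paths, n.File.Value)
  }
  for _, child := range toolutils.Children(node) {
    collectImports(child, paths)
  }
}

func main() {
  filename := os.Args[1]
  src, err := os.ReadFile(filename)
  if err != nil {
    log.Fatal(err)
  }
  root, err := jsonnet.SnippetToAST(filename, string(src))
  if err != nil {
    log.Fatal(err)
  }
  var paths []string
  collectImports(root, &paths)
  for _, p := range paths {
    fmt.Println(p)
  }
}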

end users don't want all the extra Jsonnet libs in their repo

I imagine we could have them downloaded on demand and saved to a hidden dir?

aglover-zendesk commented 4 years ago

For reference, what I was envisioning is a process similar to what Dhall has implemented. For the end user, the remote import doesn't have to be committed to their local repo (although it can be cached on disk in a generic location as well). Every time they run Dhall, the remote import is fetched (or served from the cache if it's been fetched before). So anything not in the Jsonnet runtime directly would be a compromise of sorts.
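
As a sketch of that fetch-or-cache flow (the cache location and URL-based keying are assumptions; Dhall additionally does semantic integrity checks, which this skips):

package remotecache

import (
  "crypto/sha256"
  "encoding/hex"
  "fmt"
  "io"
  "net/http"
  "os"
  "path/filepath"
)

// cachedFetch returns the body for rawURL, fetching it at most once and
// reusing the on-disk copy on later runs.
func cachedFetch(rawURL string) (string, error) {
  sum := sha256.Sum256([]byte(rawURL))
  base, err := os.UserCacheDir()
  if err != nil {
    return "", err
  }
  path := filepath.Join(base, "jsonnet-imports", hex.EncodeToString(sum[:]))
  if data, err := os.ReadFile(path); err == nil {
    return string(data), nil // cache hit: no network round-trip
  }
  resp, err := http.Get(rawURL)
  if err != nil {
    return "", err
  }
  defer resp.Body.Close()
  if resp.StatusCode != http.StatusOK {
    return "", fmt.Errorf("GET %s: %s", rawURL, resp.Status)
  }
  data, err := io.ReadAll(resp.Body)
  if err != nil {
    return "", err
  }
  if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
    return "", err
  }
  if err := os.WriteFile(path, data, 0o644); err != nil {
    return "", err
  }
  return string(data), nil
}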

What do you think about creating a really seamless experience with jb – a single command which looks into the Jsonnet code for remote paths, finds and downloads all transitive dependencies? Maybe even without any metadata in the dependency repos.

If we can't do it all within the jsonnet runtime, this seems like the next best option.