purescript / registry-dev

Development work related to the PureScript Registry
https://github.com/purescript/registry
97 stars 80 forks source link

Backend specific dependencies #20

Closed kl0tl closed 4 years ago

kl0tl commented 4 years ago

Some packages have implicit dependencies on JavaScript packages that aren’t enforced anywhere and must be propagated manually. For instance react-basic unsurprisingly depends on react and react-dom, uuid depends on uuid and uuid-validate, …

At work we circumvent this by installing our packages with both psc-package (for PureScript sources and dependencies) and npm (an empty "files" array in package.json ensuring this only installs JavaScript dependencies). As you can imagine, versions installed by npm frequently differ from those in our internal package set so we hope for the best and things mostly work because we don’t update or JavaScript dependencies often. At least we only have one npm dependency to worry about.

I’d like to propose tracking backend specific dependencies and installers in packages targets to solve this, by giving the following type to backends instead of Optional Text:

{ compile : Optional Text
, install : Optional Text
, dependencies : Map Text Text
}

Spago could then automatically install the backend dependencies with its specified install command (if any).

An alternative would be to define backends as an union so unambiguous fields can be ommited and types can be made as precise as needed per backend:

< JavaScript : { installer < Npm | Yarn >, dependencies : Map Text Text } >

There’s probably better ways to do this and the registry perhaps doesn’t need to take this into account from the start, in any case I’m curious to hear other people thoughts and solutions about that.

hdgarrood commented 4 years ago

I would much rather the registry not take this into account to start with, and perhaps not ever. I think it is very likely that if we tried, we wouldn't quite get it right at first, separate tools would arise to fill the gap, and then it would be too difficult to change the registry to address these problems, so there would just be a half-done implementation inside the registry which we'd have to maintain but which barely anyone actually used. In particular, there is not just one way of retrieving JS dependencies: you can use a CDN, you can install it using one of any number of npm clients (npm, yarn, etc), you can just download a particular version to your server and serve it yourself, etc. I think this design might force us to pick one particular approach, which to me seems too opinionated for a registry. In any case it definitely doesn't feel right to me for a PureScript package in the registry to be able to say "I should be installed via npm/yarn". That's something that should be up to the package consumer. See also, previous sort-of related discussions: https://github.com/purescript/purescript/issues/1850, https://github.com/purescript/purescript/issues/2449

kl0tl commented 4 years ago

There’s indeed various ways to install backend specific dependencies but which version to install isn’t up to the package consumer. This is something that must be documented somewhere and the registry seems like a better place than a project readme for automation.

hdgarrood commented 4 years ago

That’s true - I guess my stance is that it’s not necessarily clear to me that automating this is desirable.

f-f commented 4 years ago

I agree that storing in the registry how to install "native dependencies" might be controversial, but I think that storing which versions of the native dependencies a package depends on provides a lot of value.

I've been annoyed myself by the fact that when I depend on libraries that have npm dependencies I have to hunt for that myself by hand, and there's not really a good way to document that, and so on.

So I'd agree that would be nice to store versions in some nativeDeps key in a Target (or some other similar place), for multiple purposes:

hdgarrood commented 4 years ago

Ok, that's true. If we were able to say "okay package manager, please tell me all of the native dependencies I need and at what versions" I do have to admit that sounds like it would be an improvement.

kl0tl commented 4 years ago

My reasoning for specifying backend specific installers in packages targets was that Spago will need that information for installing backend specific dependencies. Storing them elsewhere would duplicate the targets structure and somewhat defeat the objective of sharing manifests between Spago and the registry. I’m not claiming to understand this clearly though.

If there’s at least agreement on storing backend specific dependencies in the registry is there anything I can do to help?

f-f commented 4 years ago

If there’s at least agreement on storing backend specific dependencies in the registry is there anything I can do to help?

@kl0tl sorry for the delay in getting back, I was setting things up for letting people help with this :slightly_smiling_face:

So, for most of the changes to the registry there are basically two places to patch:

Let me know if this makes sense as a starting point and feel free to ask more questions here, I'd be happy to detail more!

hdgarrood commented 4 years ago

Hm, right, I see. If the idea is that a JS package installer is only used locally, i.e. when you try to build this package when you actually have it checked out, and otherwise (if the package is being built as a dependency of the current project) we only look at the dependencies, perhaps they should be separate keys in the record? I think it's also worth considering options to avoid having to add the installer into the manifest at all, such as, say, having Spago only know about npm, or alternatively by having Spago only update package.json for you, and letting you run npm install or whatever yourself.

I am becoming a little concerned about the manifest file being so tightly tied to Spago. Yes, it would be nice to unify the formats to avoid making package authors create yet another file, but at the same time, if we make it prohibitively difficult for people to talk to the registry without going via Spago, I think that's a problem.

f-f commented 4 years ago

@hdgarrood good points.

If the idea is that a JS package installer is only used locally, i.e. when you try to build this package when you actually have it checked out, and otherwise (if the package is being built as a dependency of the current project) we only look at the dependencies, perhaps they should be separate keys in the record?

I'm not sure how this data structure would look like? Since this information is backend-specific, I think it would make sense to keep it inside the backend block?

I think it's also worth considering options to avoid having to add the installer into the manifest at all, such as, say, having Spago only know about npm, or alternatively by having Spago only update package.json for you, and letting you run npm install or whatever yourself

I considered this but I'm not sure there's a better solution. I.e. people will want to store this info somewhere, should they have a million config files around? Also it's important to note that we don't have to consider only npm here, but other backends as well, and things might work differently, so "running a command" is more flexible than "do hardcoded file manipulation". I think our purpose here is "store the minimal amount of info so that we can integrate all of this".

I guess where I'm getting at is: do we have any reason to not allow people to store this info?

I am becoming a little concerned about the manifest file being so tightly tied to Spago. Yes, it would be nice to unify the formats to avoid making package authors create yet another file, but at the same time, if we make it prohibitively difficult for people to talk to the registry without going via Spago, I think that's a problem.

So, I was very concerned about this at first, but then I considered the following:

Of course I totally agree on not making things hard for people, but do we have a bar for what's "prohibitively difficult"? If you have one in mind then it's great, because we can write it down and say "things before this line are fine, things over this line are not"

hdgarrood commented 4 years ago

At the risk of taking the conversation in a different direction (sorry),

major ecosystems are doing this and things seem to work out great (Haskell: cabal, Rust: cargo, JS: npm)

I would consider Haskell to be the only example of this (with the cabal file format). cargo and npm both use much more common formats for their configuration files (namely toml and json), so you can hit the cargo and npm registries using HTTP and parse the responses without involving anything other than a couple of libraries which are super common anyway (HTTP client and TOML/JSON parser). I can't say exactly where the line is, but I think having to implement the procedure you described in https://github.com/purescript/registry/issues/18 (having to attempt to parse with a number of different types) raises the bar quite considerably for clients. Even just using Dhall gets us closer to that line than most package registries, I think.

f-f commented 4 years ago

I'll briefly answer here, but if you wish to continue this discussion we can move over to #18 or open a new issue altogether.

I think it's actually important that we define where the line is so that we can take sensical choices over time (and not just what feels today), and an easy approach to do that is to try to enumerate use-cases that we'll have, and see if we can do things (or avoid doing things) to make them easier to achieve. E.g. in this issue we're talking about adding some more metadata to the format because it makes life easier for some users. I don't think that adds much hurdle.

If the problem is "uncommon file format", then how can we ease that? There's dhall-to-json, can we make it any easier? Should we have a better dhall-js story? (you can get it today with GHCJS, but it's impractical in the browser. Just today @joneshf expressed some interest in maybe taking a stab at that). If there's no way to improve on this, then is it worth using another format? Why or why not? (note: we don't have to answer all these questions, I'm just exposing my reasoning around this, I thought about it a bunch)

If the problem is "complexity in the clients because they might have to understand multiple datatypes", what can we do about it? If for clients we mean "package managers", this can be done in pure Dhall, so I think that should be easy enough if we write it down in the spec. If instead we think of clients as "whatever commandline thing hitting a HTTP endpoint", then an easy fix for #18 could be to just have a tiny proxy that:

joneshf commented 4 years ago

I also had a concern about using Dhall as the manifest format in the PR to add manifests.

My concern is "uncommon file format." I don't say this to push us away from Dhall. I think that would be a misstep. I stand by my initial push to have package sets use Dhall. It's a great language for stuff like this since it helps democratize package management in a way you cannot easily get with a different format. But, it would be nice if we could consume the manifest as JSON fairly easily.

There are different Dhall implementations, but they vary in their level of support for versions of the standard. We don't have a restriction for what versions of the standard we can use in the registry. Given that dhall-haskell is the only implementation that's always up to date, you're effectively limited to using Haskell to consume the registry. That's fine if you want to write things in Haskell, but not everybody is interested in using Haskell (myself included).

kl0tl commented 4 years ago

@f-f Thank you! I get the following error when trying to run pacchettibotti:

pacchettibotti: SQLite3 returned ErrorBusy while attempting to perform prepare "PRAGMA journal_mode=WAL;": database is locked

Any idea of what I may be doing wrong?

Also it’s going to be hard to automate the import of npm dependencies for various reasons:

f-f commented 4 years ago

@kl0tl I used to get these errors too. It looks like Stack suffered from the same issue, and they fixed it with a simple file lock. I implemented something like that too, and the issue went away for me. Maybe let's move over to an issue on the pacchettibotti repo to try to troubleshoot this further?

Also good point about automating npm/native dependencies. I think taking care of this issue will require upstream fixes -if we want to enforce this we could do it, by having pacchettibotti run tests for packages - in this way packages that require npm dependencies to work will have to take care of properly specifying their deps. A more delicate point is about the fact that today we consider "js" as a single target, but we should really make a distinction between "node" and "browser". I'm not sure how a design for this feature could look like if we'd take care of this here, and I'm not sure if we should have this discussion upstream too (I mean in the compiler repo)

hdgarrood commented 4 years ago

How would it affect the compiler?

f-f commented 4 years ago

@hdgarrood the compiler eventually needs to care about them being different targets, e.g. this is a recent example of how the "js is one target" abstraction leaks

@joneshf thanks for noting. I'd first like to mention that while the Haskell implementation is the most featureful, today there are 4 implementations compatible with the latest standard: Haskell, Rust, Go, Java. I agree on not feeling super good about this (ideally I'd like a native-js binding, that would make it much better IMHO), but we should also be fair about what's the state of the other implementations.

A few questions:

joneshf commented 4 years ago

First, lemme state explicitly that I'm not saying anything bad about any integration in particular. Trying to keep up with a standard as large as Dhall, as fast as the language is evolving, is hard. I recognize this is a big task even for the Haskell implementation with all of its contributors.

If we're being fair to the current state of implementations, now's a good time to look at them because a new standard version went out a week ago. Of the four implementations mentioned above, only the Haskell implementation is currently usable for 16.0.0, and it took most of the week to get there. The Java implementation states that it's fully compliant, but isn't official. The Go implementation hasn't been updated yet. The Rust implementation hasn't been updated yet, and also states that it's not fully compliant. Maybe it seems unfair to use this moment in time to look at the integrations, but this is exactly the kind of situation where things could go sour for a client.

My intent with discussing this concern is to say that if someone banks their registry client on an integration that isn't able to keep up with the standard, it's possible they might not be able to support the registry as it evolves. Since the JSON standard moves at quite a slower pace, banking a registry client on a JSON format lowers the manifest format risk to clients as most languages support the JSON standard well.

I'm not particularly pressed about any of this stuff. Requesting a JSON format is a nice-to-have. But, I don't think we should worry too much about this right now. The risk of not having a JSON format is that the number of registry clients is stunted. Having a derived JSON format is not something that's going to fundamentally change the registry.

  • should we specify which Dhall version we use for the Registry? If yes, where would be the best place in your opinion?

That would make it easier to know what version a client would need to support.

Is there a way to include the version in the manifest without requiring someone to parse the entire Dhall expression? It seems like, if we put it in the Dhall, you'd have to be able to parse the Dhall to know which version you need to support. That's a catch-22 if you can't parse the file because the version it's in is not supported by the integration.

Barring having it in the manifest, having it in the documentation should work out.

  • in my comment above (and in #4) I proposed ways to mitigate the "uncommon file format" problem, do you feel they are not enough, and if yes, do you think we could improve on them?

Sorry, I haven't really responded to your suggestions. Each time I feel like I'm pushing for the non-goal of making a user-facing frontend. I think what you suggested is enough to address the "uncommon file format" problem. If a client isn't willing to meet part-way, that's kind of on the client.

hdgarrood commented 4 years ago

I don't have much to add to that, @joneshf has covered most of my concerns on this very well. There's just one other thing I'd like to add: with a "looser" format like JSON or TOML, it's easier for clients to store their own information in the manifest files. We could, for instance, reserve a particular key for clients which would never contain any registry information; it would be freeform data in which clients could store whatever other information they'd like (such as, for instance, how exactly we want to install native dependencies in this project). When we're using Dhall, it means that whenever a client might want to store information which doesn't have anything to do with the registry specifically in package manifest files, we still need to agree in advance about the type definitions and add the field in the type definitions here. What's potentially more problematic is that if any client wants to change the schema for an existing type definition, migrating is now a problem for the whole registry.

f-f commented 4 years ago

When we're using Dhall, it means that whenever a client might want to store information which doesn't have anything to do with the registry specifically in package manifest files, we still need to agree in advance about the type definitions and add the field in the type definitions here

I always found maddening that in order to configure anything really I have to scout around and search for documentation or look at code, because config languages are usually untyped. So, this push for "let's have an actual type" comes from having experienced this pain, and right here we have a good occasion to minimize pain at scale, which I find quite beautiful.

However, I definitely don't want to add pain by going on the other side of the spectrum and make something "too rigid" - so as you mentioned, we could add an "escape hatch" to the Package type, e.g. some field like

, extraFields :: Map Text Text

..or, once the support for imports as JSON is standardized (it has already been approved):

, extraFields :: JSON

..so that clients could add whatever there.

I feel this might solve your concerns?

I'd not be super happy about this solution though, so let me expand on the above and see if we can get at something more principled: note that clients don't actually have to conform at all to the Package schema - as explained in this section of the spec, the only thing that the registry cares about is having the Package manifest there. This means that:

  1. you could totally have another whatever config file (e.g. pulp.xml) to handle build and stuff, and then just generate the manifest when publishing to the registry. You don't even have to version the registry file in.
  2. the Package schema doesn't have to be tied to Spago (I feel this might be your main concern?). Yes I modeled it on the Spago format, and I would like to keep it compatible (more on this in a sec), but they don't have to be exactly the same schema. By "compatible" I mean that it's possible to write a fromSpago function in Dhall - this is to avoid unnecessary data duplication even though the schemas might be extremely close in shape and content.

What's potentially more problematic is that if any client wants to change the schema for an existing type definition, migrating is now a problem for the whole registry.

We should not be scared of versioning and migrations. Change is totally normal and happens all the time, but we are scared of migrations because most often we have to take care of them in a "jump in the dark" way (or that's my experience at least), and this is because we have poor tools to deal with them. Dhall is an improvement on the average tools for this, even just because you can express migrations as normal functions (plus typechecking, etc etc). I'm assuming here that we stick to a Dhall version for a reasonable amount of time, as @joneshf mentioned above. I guess we might feel the need to nail down a clear plan on how to do it, but let's handle that in #18

hdgarrood commented 4 years ago

For the avoidance of doubt, pulp.xml is extremely not going to happen, at least not unless I hand it off to someone else, lol

I hadn't appreciated the fact that the Package schema doesn't have to be tied to any particular format, and you can just generate it when publishing. That does address my concerns, yes. Thank you!

f-f commented 4 years ago

I thought a bit about this, and I have another idea on how to go at it.

The reasons for using Dhall as registry-storage-format can be boiled down to:

  1. avoid content duplication (thanks to imports and record defaults)
  2. schema verification (thanks to typechecking)

As I mentioned above the one I really care about here is (2). We could potentially let go of (1) if we choose to use e.g. JSON as a data-storage format for the manifests - we'd keep (2) by keeping the Dhall types and matching the converted manifests to them in CI. This likely reduces the amount of controversiality, and makes accessing the manifests more straightforward for several languages. Of course this is a tradeoff, and losing this property means that every package will have to render the manifest here in the registry (since we'd have no imports), that the data in old format will have to be migrated (since we'd have no defaults), and that it would be harder to see what is being overriden by trustees publishing a new version (because we'd have to duplicate the data)

It might be a worth tradeoff, but I'm still unsure if it's a good idea to go JSON-only (for data at rest), as we could provide both representations anyways.

f-f commented 4 years ago

As I mentioned in #76, I tried to implement some of this in there and then backed off, as I came to the same conclusions as @kl0tl's comment above.

So I propose we do not include this in the first version of the Registry, because:

So I'll close this for now, and let's re-evaluate it once we settle/deploy the first version

JordanMartinez commented 4 years ago

I would imagine that this will also be easier to think about when non-JS backends get better support from the compiler.

kl0tl commented 4 years ago

I don‘t want to delay registry with this issue but I‘d like to note that we don’t have to trust package.json files: in https://github.com/spacchetti/pacchettibotti/pull/5 I parsed foreign modules to accurately infer native dependencies.