purescript / spago

🍝 PureScript package manager and build tool
BSD 3-Clause "New" or "Revised" License

How to consume a package with source files not in `src`? #288

Closed joneshf closed 4 years ago

joneshf commented 5 years ago

In the readme, it mentions that spago is about supporting monorepos. It seems to support producing a monorepo fairly well. However, it's not readily clear how to consume a monorepo. It seems like everything is hardcoded to expect files to exist in the src directory at the root of a git repo. Is there any plan to support files existing at a different directory? Say someone had a repo with the structure laid out in the README:

.
├── app
│   ├── spago.dhall
│   ├── src
│   │   └── Main.purs
│   └── test
│       └── Main.purs
├── lib
│   ├── spago.dhall
│   ├── src
│   │   └── Main.purs
│   └── test
│       └── Main.purs
└── packages.dhall

Is there any way to consume lib/src/Main.purs from outside of the repository? If the files are local to the machine, it seems like it can work. If the files are remote to the machine (like hosted on GitHub), it doesn't seem like we can currently consume it.

Any thoughts?

f-f commented 5 years ago

@joneshf I think it depends on the definition of monorepo that we have: in my experience I have only encountered monorepos as "final consumers" of other packages. You might have the case in which some of the monorepo is also useful as a separate package, so to publish that you have two choices:

  1. split it out in another repo and publish it independently (you can do this today)
  2. improve tooling support to avoid having to split out pieces of the repo

It sounds like you're looking for (2), which at first sight makes sense to me. Thinking more about it though, it looks like the rest of the ecosystem is not compatible with this setup, e.g.:

I would be wary of supporting another way of fetching packages without coordination with other tooling (because introducing splits, etc etc), so I guess the most usable way to do this would be to use submodules, at the price of not being able to share the top-level packages.dhall (i.e. the package becomes effectively standalone)

joneshf commented 5 years ago

Yep, that's exactly what I'm thinking. I appreciate your position and I have a few thoughts:

I appreciate that you're thinking about these things, and want to make sure things work out okay. But, I'd like to live in the spago-only world. As a user of spago, I'm not really interested in being able to consume packages in a monorepo from bower or pulp. If I need that functionality, then I'll do like you suggested and split out another repo to allow that.

I'd like to be able to build out a monorepo using spago and then allow others to consume it using spago. The way I structure the source code of a PS project should be transparent to those consuming that PS project. If it means I have to go do work to make psc-package/purs/pursuit/etc. not explicitly tied to bower in order for spago to want to support this workflow, I'll do that.

On the subject of consuming monorepos, it is a thing that happens in many other ecosystems: babel, go, nixpkgs, rails, symfony, wai. Some of these ship end products (like gofmt), but they all allow you to consume the components of the monorepo. How that happens depends on the ecosystem. Most of them work by distributing an artifact that's separate from the file structure of the source code. E.g. wai's monorepo works because you can distribute a tarball that encapsulates only the important parts of warp.

In the PS ecosystem, we've so far coupled what a distributed package looks like with what the source code of that package looks like. It's a sensible model, and it's gotten us far, but it's an arbitrary coupling. If spago allows referencing a tarball, zip file, or any other archive on the internet (rather than a git repo), I could work with that as well. Bundling up a bunch of files and deploying them somewhere is worlds easier than maintaining a ton of git repos. Is that a viable approach?

f-f commented 5 years ago

@joneshf thanks for the kind words! 😊

If spago allows referencing a tarball, zip file, or any other archive on the internet (rather than a git repo), I could work with that as well. Bundling up a bunch of files and deploying them somewhere is worlds easier than maintaining a ton of git repos. Is that a viable approach?

In principle this would not be too hard (the "arbitrary coupling" is there for ease of implementation), some observations:

bower supports pluggable resolvers

Oh, TIL. If someone puts together such a resolver then I'd be fine investigating this further

What's the motivation behind the concern for pulp?

It's more like a general feeling about our package distribution being extremely tied to package == git repo. I'm concerned about pulp because it's the "reference implementation" on "how to officially publish stuff in PureScript", so anything that deviates from that has me concerned about breaking the workflow and making a Bower → spago migration harder. Though with the tag you proposed it should be fine. So the only thing left is to find a way to communicate to spago the "source paths" inside the package right? Do you have any proposals about that?

joneshf commented 5 years ago

  • a Dhall union is about perfect, but will require changing the type of the upstream package set

Sorry, I'm not too familiar with the interplay/architecture. Why is that?

  • you'd have to give up on adding your package to the upstream package set, since the current requirement is that "the package is available on $officialPackagesRegistry" (Bower at the moment) and follows

I'm totally fine with that. None of my packages are in the package set anymore anyway. I'm fine living in a spago-only world.

  • ...unless we get a registry of our own for PureScript in which we can do all the things, as discussed in this thread on Discourse, on which I'd love your input

I'm not too interested in being part of that discussion. A package registry is basically something you throw money at: pay for servers/storage, pay someone to build an API around it with authentication/authorization (or do it yourself), and step away. I recognize that it's not that cut and dry, but it's also not orders of magnitude more complex.

I've tried many times in the past to dump money into an official PS account/foundation/whatever in hopes that we could solve problems like this. Each time the response was that money wasn't going to be accepted in any official regard. I think we're wasting useful energy trying to figure out another way around it and I don't think it's fruitful for me to say much more. If someone comes up with a solution, I'll try to support it. But, I don't want to spend a bunch of time on it.

I'm concerned about pulp because it's the "reference implementation" on "how to officially publish stuff in PureScript", so anything that deviates from that has me concerned about breaking the workflow and making a Bower → spago migration harder.

Would you feel differently if the official way to work with PS was moved from pulp to spago? I'm down to be a hype man.

So the only thing left is to find a way to communicate to spago the "source paths" inside the package right? Do you have any proposals about that?

Can we add a source key to packages–akin to #173?

f-f commented 5 years ago

  • a Dhall union is about perfect, but will require changing the type of the upstream package set

Sorry, I'm not too familiar with the interplay/architecture. Why is that?

Quoting from here, this is the handwavy Dhall type of the spago configuration:

-- The basic building block is a Package:
let Package =
  { dependencies : List Text  -- the list of dependencies of the Package
  , repo : Text               -- the address of the git repo the Package is at
  , version : Text            -- git tag
  }

-- The type of `packages.dhall` is a Record from a PackageName to a Package
-- We're kind of stretching Dhall syntax here when defining this, but let's
-- say that its type is something like this:
let PackageSet =
  { console : Package
  , effect : Package
  ...                  -- and so on, for all the packages in the package-set
  }

-- The type of the `spago.dhall` configuration is then the following:
let Config =
  { name : Text               -- the name of our project
  , dependencies : List Text  -- the list of dependencies of our app
  , sources : List Text       -- the list of globs for the paths to always include in the build
  , packages : PackageSet     -- this is the type we just defined above
  }

Right now repo is a Text containing a URL that we parse to decide if it's local or not. This will probably be changing soon but leaving that aside for a moment, if we were to decide to support other kinds of packages than "git repos on the root", we could change from Text to something like < GitRepo : Text | MonorepoView : Text | ... >

Now, as you can see above the PackageSet is a record that has values of type Package, so if we change the type here we'd need to change the upstream too. It's not a big deal and can be done in a backwards compatible way (though we'll take advantage of 1.0 to break this kind of stuff)
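
As a rough illustration of the union idea, here is a minimal sketch in valid Dhall. The constructor names `GitRepo` and `MonorepoView` are taken from the comment above; the `Location` name, the `example` package, and its URL and tag are hypothetical placeholders, and none of this is spago's actual schema. The thread does not say what the `Text` inside `MonorepoView` would hold, so the sketch leaves that constructor unused.

```dhall
-- Hypothetical: a union describing where a package's code lives.
-- Constructor names come from the discussion; their payloads are assumptions.
let Location = < GitRepo : Text | MonorepoView : Text >

-- The Package type with `repo` changed from Text to the union above.
let Package =
      { dependencies : List Text, repo : Location, version : Text }

-- A made-up package, only to show how a value would be written.
let example
    : Package
    = { dependencies = [] : List Text
      , repo = Location.GitRepo "https://example.com/some-package.git"
      , version = "v1.0.0"
      }

in  { example = example }
```

Because every entry in the upstream package set is a `Package`, changing the type of `repo` like this changes the type of the whole set, which is why the upstream would need to change too.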

A package registry is basically something you throw money at: pay for servers/storage, pay someone to build an API around it with authentication/authorization (or do it yourself), and step away

At some point in the thread I propose to use the nixpkgs model: using a GitHub repo (possibly mirrored) for metadata, and something like S3 (also mirrored) for package uploads storage. This means that:

Would you feel differently if the official way to work with PS was moved from pulp to spago?

We'd stop creating new bower users, but we'd still need to worry about the bower → spago migration for the existing ones 🙂

Can we add a source key to packages–akin to #173?

I think we'd still need some kind of manifest in the root of the repo or in some other default location right? (so that spago would be able to locate a file to read the sources key)

joneshf commented 5 years ago

Oh. Because spago uses psc-package's types? Gotcha.

I think we'd still need some kind of manifest in the root of the repo or in some other default location right? (so that spago would be able to locate a file to read the sources key)

Sorry, I'm not sure I understand what we're talking about. Lemme explain what I'm suggesting more explicitly and you can tell me where I'm going wrong 🙂.

Let's say I want to move https://github.com/joneshf/purescript-httpure-middleware into https://github.com/joneshf/open-source. There's really only one file in that package. I'd like it to sit at:

.
└── packages
    └── purescript-httpure-middleware
        └── src
            └── HTTPure
                └── Middleware.purs

Now, let's say someone else wants to consume purescript-httpure-middleware. It would be nice if they could add to their packages.dhall:

let additions =
      { httpure-middleware =
          { dependencies =
              [ "purescript-httpure" ]
          , repo =
              "https://github.com/joneshf/open-source.git"
          , sources =
              [ "packages/purescript-httpure-middleware/src/**/*.purs" ]
          , version =
              "d03884217eed3f2d41205ac0a56573b2a1443107"
          }
      }
...

Instead of spago assuming source files would be in src/**/*.purs, it would use what the package defines–in this case packages/purescript-httpure-middleware/src/**/*.purs. From my understanding of how the pieces fit together, it seems like this change would be as unobtrusive as possible while also providing a solution to consuming monorepos.

spago could continue to work with git repos, so the core logic wouldn't change. If you didn't want diverging configurations (one package has sources another does not), migrations could happen with something like:

λ(package : Package) → { sources = [ "src/**/*.purs" ] } ⫽ package

while also not requiring (but still allowing) a change in the upstream.
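
To make that migration concrete, here is a small self-contained sketch in Dhall. The names `OldPackage` and `withDefaultSources`, and the `some-lib` entry with its URL and tag, are hypothetical and only for illustration; the `httpure-middleware` entry is copied from the example above. Since `⫽` prefers fields from its right-hand side, the function only supplies `sources` when the package record does not already carry one.

```dhall
-- The package shape used today, without a `sources` field.
let OldPackage = { dependencies : List Text, repo : Text, version : Text }

-- The migration from the comment above: default to the conventional glob.
let withDefaultSources =
      λ(package : OldPackage) → { sources = [ "src/**/*.purs" ] } ⫽ package

-- Hypothetical stand-in for one entry of an upstream package set.
let upstream =
      { some-lib =
          { dependencies = [] : List Text
          , repo = "https://example.com/some-lib.git"
          , version = "v1.0.0"
          }
      }

in  { some-lib = withDefaultSources upstream.some-lib
    , httpure-middleware =
        { dependencies = [ "purescript-httpure" ]
        , repo = "https://github.com/joneshf/open-source.git"
        , sources = [ "packages/purescript-httpure-middleware/src/**/*.purs" ]
        , version = "d03884217eed3f2d41205ac0a56573b2a1443107"
        }
    }
```

Both entries end up with the same record type, so they can sit side by side in one package set. Applying the function across an entire set would need a map over every entry, which this sketch leaves out.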


I think having written it out, what I'm really asking for is some way to override https://github.com/spacchetti/spago/blob/40551e3765ffd07637e7afabd2a169238626ace0/src/Spago/Packages.hs#L87

This isn't even specific to monorepos anymore. We've always assumed people would put PS source code at the top-level in a src directory. It seems like if spago can relax that constraint, it could not only make packaging more robust (because we're no longer making an assumption about where source code lives), but it would also allow monorepos to be consumed.

Does this re-phrasing change any of the stuff we've discussed so far? In other words, if we only think about improving robustness of packaging–and don't think about the fact that it extends spago to support consuming monorepos–would the change above be acceptable?

f-f commented 5 years ago

Oh. Because spago uses psc-package's types?

Not really, but kind of: because it uses types from package-sets, when importing the packages.dhall from there

Instead of spago assuming source files would be in src/**/*.purs, it would use what the package defines–in this case packages/purescript-httpure-middleware/src/**/*.purs. From my understanding of how the pieces fit together, it seems like this change would be as unobtrusive as possible while also providing a solution to consuming monorepos.

Oh right sorry, now it makes sense, thanks for detailing 💯

This would be a neat solution, but I think it'd still require changing the type of the upstream and the migration you proposed to happen there (again, this is not a huge deal, will just require more care) because there's no nicer place to perform that migration:

However I'm still unsure about this, as I have the feeling that it would incentivize a "split" in the ecosystem by changing the relationship "one repo == one package" - even if we can make Bower compatible, psc-package is not and AFAIK will not be patched. So I'd probably feel more comfortable talking about this after Spago matures a bit and gets more usage - I guess "after 1.0" would be a good time to consider this again?

f-f commented 5 years ago

@joneshf a temporary, low-effort solution for now would be to move your packages to the monorepo, but keep the repo there for publishing them. Then you can add a script to the repo that just pulls the monorepo, focuses on the package you need, and copies that to the root of the repo

joneshf commented 5 years ago

Well that's unfortunate, but understandable. I'm not upset about it or anything, but I need to find a solution to my hundreds of repos. I'm mostly lazy, don't want to write a bunch of git-based stuff, but still want people to be able to consume the packages I make. I want to live in a spago-only world, but I don't want to live there by myself. I'm either going to trick spago into working somehow, or use something else.

joneshf commented 5 years ago

psc-package is not and AFAIK will not be patched.

Wait, what does this mean? Does this mean the decision for spago relies on psc-package supporting source files existing in directories other than src as well?

If so, the change to psc-package would be adding a similar sources :: [Text] to PackageInfo, and using that here, here, and here. That seems feasible to implement, and I'm more than willing to submit the PR if it means spago gains the ability to support files not in src.

joneshf commented 5 years ago

The more I think about it, the more I realize that I came in here and asked the wrong question. This was never about monorepos, but always about supporting dependencies where the source files weren't in src. This has been an area of accidental complexity that PureScript had for years that I think has made things harder in every packaging solution we've come up with for no real reason. As mentioned above, it's understandable if you don't want to address this right now (or at all), but this issue is bigger than support for monorepos and I'm going to update the title of this issue to reflect that.

f-f commented 5 years ago

Wait, what does this mean? Does this mean the decision for spago relies on psc-package supporting source files existing in directories other than src as well?

Yes, because I want the ecosystem to reasonably move in sync and avoid breakages/splits/etc (the 0.12 transition was not fun, so it's ok if we take extra care in this regard). So until psc-package is officially supported I'll consider it "part of the ecosystem"

Just to make it clear I'd agree with this change, but I'd like agreement from the wider community too so we can sync up if we need to make changes (because it's about changing one of the basic assumptions of packaging in PureScript)

Ping @justinwoo @hdgarrood

hdgarrood commented 5 years ago

This has been an area of accidental complexity that PureScript had for years that I think has made things harder in every packaging solution we've come up with for no real reason

Can you give some examples (outside of the feature being requested in this issue)? I very much disagree with this, actually; I think the more things you have which are configurable, the more things other tools need to worry about when they are consuming packages, so we should only allow configurability if we are certain that it is absolutely necessary. In this case, I think requiring source files to be in src/ drastically simplifies the task of compiling your project together with its dependencies, because if you were allowed to put source files anywhere you wanted, you'd have to read and parse a package manifest file for every single package you depend on to find out what that location was.

My preferred way forward here would be to stop having packages necessarily tied to git repositories, so that you could have one repo containing a bunch of packages which you can publish individually. That probably entails using a real package registry and distributing packages as tarballs or something.

joneshf commented 5 years ago

Thinking a bit more about the underlying problem, there's a parallel decision we made and changed: assuming a package will be in a purescript- repo.

It's another arbitrary decision we made in the past that was also neither free nor intuitive. It made things easier at the time because we had a handful of packages, and we were prefixing all of them with purescript-. So, we continued to write tooling around that decision. When the psc-package manifest was made, there was a decision to drop this convention and make the package name configurable and explicit. It was an easy change, it was a non-breaking change, it didn't disrupt the PS ecosystem. But, it meant that you no longer had to create a repo prefixed with the name purescript- in order to use psc-package–and now spago. psc-package could have kept with the past decision that it would only find files within the purescript--prefixed repo, but it didn't.

That change hasn't stopped anything in the ecosystem from growing, nor has it created a bifurcation. In fact, the vast majority of repos still start with purescript-, and they have to if they want to work with pulp simultaneously. I've used this feature of psc-package in the past, and format-nix uses it now. BTW, format-nix is available on both bower and pursuit.

I understand and respect wanting to be sure that this change doesn't adversely affect the community. I think psc-package proved that allowing a manifest with more data, then subsequently using that data to make a more robust tool doesn't propagate back into the ecosystem in a negative way. The vast majority of people default to following the convention, and the one or two people that need something different can do that as well.

@f-f Given that we have an example of changing a similar convention of the past into an explicit configuration in effect for years without any negative consequences on the ecosystem, does that change your thoughts on consuming packages with files not in src?

Can you give some examples (outside of the feature being requested in this issue)?

I won't give any more examples because the way this request has played out in the past is that I attempt to justify my thoughts, and they're dismissed. I know you can be convinced of other ideas, but I don't want to spend the time convincing you of this one. I'm fine with us having different opinions.

My preferred way forward here would be to stop having packages necessarily tied to git repositories

I'm fine with that as well. Having fleshed out what the actual problem is though, allowing to specify where source files exist sounds way easier than breaking the git convention.

hdgarrood commented 5 years ago

I won't give any more examples because the way this request has played out in the past is that I attempt to justify my thoughts, and they're dismissed. I know you can be convinced of other ideas, but I don't want to spend the time convincing you of this one. I'm fine with us having different opinions.

Having different opinions is of course fine, but I think that adding a feature which would enable the possibility of packages which can only be consumed by spago should ideally come with proper justification.

I'm fine with that as well. Having fleshed out what the actual problem is though, allowing to specify where source files exist sounds way easier than breaking the git convention.

Moving away from git will indeed be a lot of effort, but a proper package registry is very desirable for various reasons, which are discussed in https://discourse.purescript.org/t/blogged-thoughts-on-purescript-package-management/809. I think we need to do it at some point anyway.

justinwoo commented 5 years ago

format-nix uses it now. BTW, format-nix is available on both bower and pursuit.

Just to clarify, I did make sure format-nix can be installed and used via bower. It's registered on bower as purescript-format-nix, and installs into that directory structure and can be built with pulp accordingly:

$ fd format bower_components
bower_components/purescript-format-nix
bower_components/purescript-format-nix/src/FormatNix.js
bower_components/purescript-format-nix/src/FormatNix.purs
bower_components/purescript-prelude/src/Data/NaturalTransformation.purs

https://github.com/justinwoo/test-bower-install-format-nix

joneshf commented 5 years ago

I think that adding a feature which would enable the possibility of packages which can only be consumed by spago should ideally come with proper justification.

Can we not use the same justification as what was used for psc-package's current format? As mentioned above, the psc-package format allows anyone to create a package that only lives in the psc-package/spago world; i.e. bower and pulp cannot consume it. And that possibility has existed for years.

hdgarrood commented 5 years ago

I don't find this example particularly compelling, because for a package like that, all you would need to do is to put the purescript- prefix in the name field in your bower.json file (as @justinwoo mentioned above). The name field in bower.json isn't required to match the github repo name or the name you give to a package in a package set, so it's fine to put the purescript- prefix in bower.json but not anywhere else.

joneshf commented 5 years ago

I'm sorry you're not compelled by it, but it's showing that the ecosystem can be trusted to not break everything. You just outlined how the ecosystem can still keep it together even though psc-package has provided the ability for the ecosystem to bifurcate for years. Nobody using psc-package or spago has to support bower and pulp if they don't want to. But the people that want to take the extra freedom afforded by psc-package and put some glue into their workflow can still support consumption by bower and pulp.

The community can make similar glue if their source files are allowed to be in a different place. One example is to publish from a new repo, as @f-f outlined above. Another example is to tag a commit where the files are moved to where bower and pulp expect them. I'm sure there are even more ways. If people still want to support bower and pulp, they can do that and they can do that with their source files in a different location than src for day to day work. bower doesn't have to change, pulp doesn't have to change, pulp doesn't have to grow a manifest file, pursuit doesn't have to do anything special. None of the rest of the ecosystem needs to be affected by psc-package/spago being able to consume a package with files not in src. If you don't find that compelling, I don't think anything else I can say will compel you.


Ultimately, the decision comes down to @f-f. I've made my case, discussed how the ecosystem has stayed in sync even with the ability to bifurcate. You've weighed in with your thoughts about pulp. If @f-f wants to do this, cool. I'll help in whatever capacity I can. If not, I'll figure something else out.

joneshf commented 4 years ago

@f-f Given that bower changed the registry to no longer support creating new packages, and confirmed it in a follow-up issue, where do we sit on this issue? The bower/pulp workflow for new packages doesn't work anymore. Does this change in bower's usefulness in the PS ecosystem allow this issue to start moving forward?

f-f commented 4 years ago

@joneshf yes. Since we'll have control over the new registry we can (and I think we should) make this configurable, since we'll be packaging sources in a tarball anyways. We can leave this issue open or close it and move the discussion over to that repo, since the implementation details will have to be ironed out there first

joneshf commented 4 years ago

Sweet! Excited to see what comes of the registry.

Maybe keep this issue open until there's a concrete way forward, if that's alright?

f-f commented 4 years ago

@joneshf there is now an issue tracking this discussion in the registry repo: https://github.com/purescript/registry/issues/16

I would say we should close this issue, as that one addresses the upstream concern and this one will come for free once that's in place - i.e. since the Registry and Spago will use the same schema, once we can publish packages with sources in a different place, then Spago will be able to use them. Makes sense?

joneshf commented 4 years ago

I'm okay with that.

wclr commented 3 years ago

The current title "How to consume a package with source files not in src" was confusing to me; the original "How to consume a monorepo" makes more sense (in my view).

First, because I think the location of sources inside a package is better restricted, to keep things standard, and packages in general should be as simple as they can possibly be (which, in many cases, they currently are not).

Second, a package should definitely be decoupled from a git repo. But because the ecosystem currently relies on git repo version tags in most cases, it would be more difficult to keep things in sync with a monorepo, even setting aside "bower compatibility". At the same time, I think the ability to pull package contents from git repo subdirectories could eventually still be valuable, even alongside the existing registry (as not everything that can be consumed should be published in the registry).

So currently, if I had a number of separate packages that I chose to keep in a monorepo, I would make a GitHub org and publish (push there) each package's code with appropriate version tags (as @f-f already proposed here). This would definitely be a non-standard (and even kinda hacky) solution and would require some additional setup, but it could be quite OK (esp. for someone who would like to publish hundreds of their packages, @joneshf ;-))