purescript / registry-dev

Development work related to the PureScript Registry
https://github.com/purescript/registry

Discussion: Use `user/package` as package identifier #54

Open csicar opened 4 years ago

csicar commented 4 years ago

Change the type of the package identifier from Text to {user : Text, name: Text}, where user is the GitHub username and name is the package name chosen by that user. For example, affjax would be named slamdata/affjax.
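As a rough illustration of the proposed shape, here is a minimal Python sketch. The names `PackageId` and `parse_package_id` are hypothetical and do not come from the registry; the validation rule (exactly one `/`, both sides non-empty) is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageId:
    """Hypothetical namespaced identifier: a GitHub user plus a package name."""
    user: str
    name: str

    def __str__(self) -> str:
        # Render back to the "user/name" form used in the proposal.
        return f"{self.user}/{self.name}"


def parse_package_id(text: str) -> PackageId:
    """Parse 'user/name'; assumed rule: exactly one '/', both parts non-empty."""
    user, sep, name = text.partition("/")
    if not sep or not user or not name or "/" in name:
        raise ValueError(f"expected 'user/name', got {text!r}")
    return PackageId(user=user, name=name)


affjax = parse_package_id("slamdata/affjax")
```

With a flat identifier the same package would just be the string `affjax`; under this sketch a bare `affjax` is rejected because it carries no user part.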

Advantages

  1. Easier forking of packages, as the new contributor does not need to make up an artificial new name
    • I think this is especially useful when forking packages whose maintainers don't merge PRs
  2. Less room for arguments regarding who should get what package name
  3. No implicit "official" version in the registry.

Disadvantages

  1. Installation is a little more complicated
  2. Possible confusion over whose package is meant

IDK if package-sets should also carry the prefix. IMO package-sets are opinionated and should probably take a stand on what they consider "official".

csicar commented 4 years ago

Current proposal for how to deal with naming issues: https://github.com/purescript/registry#name-squatting-and-reassigning-names

f-f commented 3 years ago

Note: this is my personal opinion and I'm not speaking for the whole core team.


I am not in favor of namespaced packages. This post has a good summary of where my opinions stand, while detailing the advantages of the current system. It also links to a great deal of discussions about all of this that the Rust ecosystem had to go through (I'll link to some of them throughout this post)

I'll summarize below the reasons why I think our current system should not be changed:

In general there seems to be a misunderstanding about why exactly we'd need this feature. To me the main value of namespaces would be the ability to group packages, to show that a bunch of packages come from a known set of maintainers. However, it seems that the main reason people bring up the need for namespaces is the ability to name their package as they wish, even if a package with that name already exists, just by using a different namespace. I feel that this latter goal goes explicitly against what I'd like to see in our package registry: while it's cool that everyone can publish a json package, this also allows the proliferation of json packages, meaning that in practice one would need to know which one of the 9847 json packages to use. Should I use f-f/json or purescript-official/json? How is this different from having to pick between f-f-json and purescript-official-json?

As you can see above, if you really care about namespacing your package you can already do it today, using hyphens. Sure it's harder to name packages with a flat registry, and sure it might be annoying to come up with a name if someone already registered the generic name (e.g. json, base64 and so on), but there are advantages in "having to come up with a name". Quoting from here:

When we looked at package ecosystems without namespacing, we found that people tended to go with more creative names (like nokogiri instead of “tenderlove’s libxml2”). These creative names tend to be short and memorable, in part because of the lack of any hierarchy. They make it easier to communicate concisely and unambiguously about packages. They create exciting brands. And we’ve seen the success of several 10,000+ package ecosystems like NPM and RubyGems whose communities are prospering within a single namespace.

On this same topic, it's important to note that if we allow everyone to use the "generic names" for packages, this might enable/incentivize more forks, as it would be easier for folks to publish their "own" version of a package with their patches, instead of trying to get their patches merged into the upstream package with the "generic name". This sounds to me like a clear loss for the ecosystem.

About "arguments on who should pick a squatted package name": the Rust ecosystem is going with a "first come, first serve" policy, with no exceptions. NPM's policy is the same, with an additional "if you want a package that is already taken but not in use, just let us know". The current proposal here is to stick with the latter, but according to NPM folks this might be a labor-intensive policy, so we might go with Rust's policy instead. The bottom line is that as long as there's a clear policy, and as long as we're happy with the amount of work it entails, then there's no need to worry about "arguments": we'd just enforce the policy.

Then there's the issue of "backwards compatibility": even if we introduce namespaces, there's the question "what happens with flat names?". If we decide to take them away, then we have to deal with the obvious issue of "breaking everyone's build". If we decide to keep them, then we have to distinguish between e.g. package foo-bar and package foo/bar, which is quite hairy, because they look the same, they might eventually normalize to the same name, etc.
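A small Python sketch of how such a collision could arise. The `normalize` rule here is purely an assumption (lowercase, then collapse every run of non-alphanumeric characters into a single hyphen, as one might do for directory names or URLs); the thread does not say what normalization the registry would actually use.

```python
import re


def normalize(identifier: str) -> str:
    """Hypothetical normalization: lowercase, non-alphanumeric runs become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", identifier.lower()).strip("-")


flat = "foo-bar"        # a package in the flat namespace
namespaced = "foo/bar"  # package "bar" in namespace "foo"

# Under the assumed rule, both identifiers map to the same normalized name.
collides = normalize(flat) == normalize(namespaced)
```

Any tool that keys on the normalized form would have to treat the two packages as one, or carry extra information to tell them apart.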

I also believe that people should not be able to "reserve namespaces". This is because I believe everyone should have the same rights on the registry: the core team, the org behind halogen, and every other user. By allowing the halogen team to reserve the halogen namespace we'd give them more rights than other users, fencing those users off from publishing halogen-related stuff. This is generally a problem in the presence of "package ecosystems": by allowing people to reserve namespaces we'd prevent extensibility for these ecosystems. As someone in the crates.io team put it: why should uploading the package foo give you any rights over that namespace?

If we think it's important to note that certain packages have the same set of maintainers as others, then we could just make that more explicit (e.g. prominently display that in Pursuit), while not trying to overload the "package name" to convey that kind of meaning.

wclr commented 3 years ago

I believe I understand where this comes from: with the current approach to packages, user namespaces may rightfully seem to be an unnecessary level of grouping. In a certain sense that is really so, because with the current package model a user gets "full freedom" when publishing their source modules. A user may even have just one single package named after themselves and publish all the modules they develop there (seems crazy, but it is possible). Or a user may publish multiple packages that contain modules with the same name.

It may look like freedom, but to me, first of all, it looks like a serious abuse of module names and namespaces. If we don't impose any constraints on the module namespaces a user publishes, modules (and module namespaces) become second-class citizens of the ecosystem, which eventually leads to module name conflicts in codebases.

Why is this happening? Because people don't understand the value of module names, and what they can buy (or give, maybe for free). There even seems to be some implicit intention to simply dismiss module namespaces, because they potentially conflict with package names as they are; it is quite obvious that some implicit conflict exists there.

As I said, within the current packaging model user namespaces may not be so essential or needed. But let's look at the problem of user namespaces in the general case. If you don't have user scopes, but one big scope with all the package names, let's see what you get along with "stability, continuity, and unity":

What you get with user namespace introduced (in terms of relaxing the above problems):

The difference between package names and user namespaces in the discussed sense is that the latter are less important for a user and the ecosystem in general, and much less prone to conflicts. On average a user owns only one namespace, named after themselves, which is usually something quite unique like paf31, f-f, or wclr; quite rarely is it a meaningful or general-use noun (such rules, btw, could be conveyed to the users).

Significant projects and teams tend to have their own namespaces because it allows managing related packages collectively without unnecessary hassle. For such things, users choose names using different criteria: company name, project name. This is the same as, for example, GitHub users and organizations, and this realm doesn't seem to have many conflicts; a conflict there would probably be an exception compared to what happens with package names.

And even if I wanted to register, say, a halogen namespace for my project and found that some other user had already taken it (which would be really surprising, but strange things happen), and at the same time I really didn't want to drop this name, I would just register, for example, a halogen-project namespace, and I would be quite ok with that.

So there are some definite wins that come with user namespaces, though I agree that there may be some drawbacks, for example, in the form of potential fragmentation of the package directory, though this happens mainly because the system does not impose on users the necessary culture, rules, and constraints.

NB! This is a very important point: I'm not advocating user namespaces in general or in the current registry implementation. A registry that is aligned with the existing understanding of packages as namespaces for arbitrary module sets may really not need them. Maybe you are right to feel that user namespaces are an extra link in the current model (I too would have such concerns).

I started to talk about user namespaces exclusively in terms of the proposal I made, which is generally about making module namespaces the first-class citizens of the ecosystem. Those things (the proposal in general and user namespaces) are mostly not divisible, that's why they come together. Without it, I don't care much about user namespaces, as the current model may exist without them (though the conflicts and discussions in other ecosystems may signal that not everything is ok with it). But the model I propose can barely be sustainable without user namespaces and I tried to explain why quite carefully in the proposal and the discussion.

wclr commented 3 years ago

A couple of notes on statements from the "no change" post.

As you can see above, if you really care about namespacing your package you can already do it today, using hyphens.

This all looks like crutches and the absence of a regulated system. One person "thinks" one way, another person believes another way. People should be thinking about higher-level things related to their domain problems, not about whether to name a package with a prefix or not. It would be better to remove this burden from the shoulders of users and give stricter naming rules, though this is barely possible with the current approach to package naming in general.

On this same topic, it's important to note that if we allow everyone to use the "generic names" for packages, this might enable/incentivize more forks, as it would be easier for folks to publish their "own" version of a package with their patches, instead of trying to get their patches merged into the upstream package with the "generic name". This sounds to me like a clear loss for the ecosystem.

How does this prevent someone from publishing my-favorite-package-fixed (or my-favorite-package-fixed2)? That is just potential pollution of the global namespace. And it is a strange way to "encourage" making fixes upstream. Should a user wait for upstream fixes before using them? I believe users should feel responsible for such things, and this feeling should be encouraged using other methods, not "limitations" that do not actually limit anything.

But I want to emphasize another point: I think a registry should first of all be a place for things to be shared, not just temporary fixes, and again this attitude should be conveyed. That's why client tooling should have first-class support for working with git repos, which users could use, for example, for temporary fixes or experiments, rather than publishing everything they can to the registry.

About "arguments on who should pick a squatted package name": the Rust ecosystem is going with a "first come, first serve" policy, with no exceptions.

This too is about equal rights, I believe, but in this model the quickest obviously has more equal rights. Yet it is possible to create a model where truly "the best" packages (from the ecosystem standpoint) have more privileges, which obviously seems right.