rubygems / rfcs

RubyGems + Bundler RFCs

Scoped gems proposal #40

Open mullermp opened 2 years ago

mullermp commented 2 years ago

rendered proposal

Hello RubyGems team, and all those that come across this.

In this PR, I have included a proposal for a feature called "scoped gems". In short, the proposal is to widen the gem naming specification to include a new character, @, to group related gems together under an organization-reserved suffix. The naming pattern is gem_name@scope. On the first push of a gem (a new record), if the gem is scoped (follows the pattern), RubyGems.org will validate that the pushing user belongs to the organization that reserved the scope. A scoped gem can be installed and required just as normal gems are today.

For example, consider aws-sdk-s3, the S3 gem for AWS. If this gem were scoped, it could be published as s3@aws-sdk (or more generically, <service>@aws-sdk). This gem can only be created by a user in the AWS organization (which has reserved the aws-sdk scope). A user can install this gem with gem install s3@aws-sdk and require it with require 's3@aws-sdk'.
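A minimal sketch of how a client might split a scoped name into its parts, assuming exactly one @ separates the gem name from the scope. split_scoped_name is a hypothetical helper for illustration, not an existing RubyGems API.

```ruby
# Hypothetical helper: split "gem_name@scope" into [gem_name, scope].
# Unscoped names (no "@") come back with a nil scope.
def split_scoped_name(full_name)
  name, separator, scope = full_name.partition("@")
  return [full_name, nil] if separator.empty?

  [name, scope]
end

p split_scoped_name("s3@aws-sdk") # => ["s3", "aws-sdk"]
p split_scoped_name("aws-sdk-s3") # => ["aws-sdk-s3", nil]
```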

The main benefits of this feature are that organizations can publish their own groups of gems (e.g. multiple organizations can each have a "configuration" gem), and organizations are able to reserve gem names (via an @scope suffix, similar to a reserved prefix). A developer can be reasonably sure that any new gem such as new-cool-feature@rails is an official Rails gem, that new-s3-service@aws-sdk is an official AWS SDK client, or even that socket@ruby is an official stdlib gem! This reservation system combats fake, similarly named gems that are branded as official and attempt to steal personal information.

Please leave any feedback and I would be happy to amend the approach/design.

indirect commented 2 years ago

I feel like namespaces with the syntax @scope/gemname would be less surprising to users given that is the way NPM implements them. What are your feelings for/against that syntax?

(I also added a link to the rendered doc to the top of the PR description for easy access.)

mullermp commented 2 years ago

The / portion caused issues, especially around gem commands (because they assume a file path), though I suppose that can be parsed correctly. Aside from that, I actually think we should NOT use the same format, as it could be especially confusing when working with both languages (e.g. in a Rails website).

Edit (5/5): I have updated the RFC to explain the technical challenges/limitations of using @scope/gemname. I would prefer to use @scope/gemname too, but in my opinion the risk/reward is not favorable.

ioquatix commented 2 years ago

If we can't get proper organisations, this might be a good first step.

I'd be a tiny bit concerned this works us into a position where supporting proper organisations is harder.

It's definitely concerning when you create a gem like async and then someone else clobbers your namespace by releasing a gem async-thing-you-care-about. Not sure if this proposal solves that problem?

mullermp commented 2 years ago

That is valid feedback. I would think that a user's scope can easily be transferred to an organization concept in the future, assuming the scoped username matches the future organization name (e.g. s3@aws-sdk is scoped to the aws-sdk user today, and then to the aws-sdk organization tomorrow; it doesn't change how that gem is installed/required). I have an open question related to migration that makes use of an "alias" feature. If we go that route, then to support organizations, I suppose you could even alias a gem like s3@aws-sdk to s3@new-organization-for-aws without requiring developers to change their code either.

Though what do you mean by "proper" organizations? I assume you mean some Rails modeling that groups a bunch of owners together under some entity. Since gems can have multiple owners (already an "organization" if you will), I thought this solution fit in nicely. This proposal can certainly have a concept of such an organization entity and use the same gem naming pattern.

Also, regarding the async example, I believe that same issue exists today regardless of gem name. My gem foo can have an Async module, and your gem async could also have the same module. I don't think we can realistically solve for that, if I'm understanding your concern.

ioquatix commented 2 years ago

Part of my concern is asserting that "async-foo" is official and "async-foob" is not.

mullermp commented 2 years ago

Ah. I want to be clear that async@foob wouldn't be more or less "official" than async@foo in that example. The scope signals to developers a trusted source/organization. Both foo and foob can be valid users with valid gems that solve use cases.

mensfeld commented 2 years ago

At first glance, I like this idea. It does not break too many things and seems relatively simple to implement. My only worry is that it won't be compatible with the PURL format (https://github.com/package-url/purl-spec), where the namespace is defined before the name:

scheme:type/namespace/name@version?qualifiers#subpath

though it could be fixed by the routing itself in the rubygems.

mullermp commented 2 years ago

Interesting, I have never heard of PURL. Though looking at npm's example pkg:npm/%40angular/animation@12.3.1 it looks like @ is escaped in the package's scope. So I imagine s3@aws-sdk version 1.2.3 could become pkg:gem/s3%40aws-sdk@1.2.3.
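A small sketch of the encoding described above, assuming the whole scoped name is treated as the PURL name component and percent-encoded so its @ is not confused with the version separator. scoped_gem_purl is a hypothetical helper; ERB::Util.url_encode is the standard-library encoder used here.

```ruby
require "erb"

# Build a package URL for a scoped gem. Percent-encoding the name turns
# "s3@aws-sdk" into "s3%40aws-sdk", leaving the final "@" for the version.
def scoped_gem_purl(name, version)
  "pkg:gem/#{ERB::Util.url_encode(name)}@#{version}"
end

puts scoped_gem_purl("s3@aws-sdk", "1.2.3") # => pkg:gem/s3%40aws-sdk@1.2.3
```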

mullermp commented 2 years ago

Summon @simi @hsbt, I'd love to hear your feedback on this!

simi commented 2 years ago

I have been thinking about this for a long time. My biggest concern is the transition period. I was thinking about some kind of backward-compatible naming scheme (at least as a fallback). For example, if we decide to do namespacing using a "rails/activerecord" scheme, older RubyGems/Bundler versions could fall back to "rails--activerecord" (or similar; according to a DB dump, the double dash is currently used by only a few gems, which we could yank).
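A sketch of the two-way mapping this fallback implies, assuming "/" separates scope and name in the new scheme and "--" in the legacy one. legacy_name and namespaced_name are hypothetical helpers; the round trip only works if plain gem names never contain "--".

```ruby
# Hypothetical legacy separator for scoped gems on older clients.
LEGACY_SEPARATOR = "--"

# "rails/activerecord" -> "rails--activerecord"; unscoped names pass through.
def legacy_name(scoped)
  scope, separator, name = scoped.partition("/")
  return scoped if separator.empty?

  "#{scope}#{LEGACY_SEPARATOR}#{name}"
end

# "rails--activerecord" -> "rails/activerecord"; other names pass through.
def namespaced_name(legacy)
  scope, separator, name = legacy.partition(LEGACY_SEPARATOR)
  return legacy if separator.empty?

  "#{scope}/#{name}"
end

puts legacy_name("rails/activerecord")      # => rails--activerecord
puts namespaced_name("rails--activerecord") # => rails/activerecord
```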

Fryguy commented 2 years ago

Similar to that concern, I have a concern that people might squat transitional names if that's what we decide on...for example, it would be bad if I could create a rails--activerecord gem now anticipating the real gem moving to the new format

mullermp commented 2 years ago

I think "anticipated" squatting can be mitigated (simi mentioned yanking).

mullermp commented 2 years ago

@simi Are you suggesting scoped gems ALSO reserve Ruby namespaces? I.e. s3@aws-sdk MUST use an AWS::SDK Ruby namespace?

simi commented 2 years ago

Similar to that concern, I have a concern that people might squat transitional names if that's what we decide on...for example, it would be bad if I could create a rails--activerecord gem now anticipating the real gem moving to the new format

We can reject gems with -- at RubyGems.org and add it to RubyGems specification policy as a warning.
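A sketch of what such a push-time check might look like, assuming the "--" separator is reserved for legacy fallback names. validate_new_gem_name is hypothetical and not current RubyGems.org or specification-policy code; it returns error messages so the same logic could be reported as a warning on the client.

```ruby
# Hypothetical check: reject (or warn about) new gem names containing "--",
# keeping that separator free for scoped-gem fallback names.
def validate_new_gem_name(name)
  return [] unless name.include?("--")

  ["#{name.inspect} contains '--', which is reserved for scoped gem fallback names"]
end

p validate_new_gem_name("activerecord")        # => []
p validate_new_gem_name("rails--activerecord")
```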

@simi Are you suggesting scoped gems ALSO reserve Ruby namespaces? I.e. s3@aws-sdk MUST use an AWS::SDK Ruby namespace?

No, we don't check anything in the code and I think it would be super complex to start doing that.

deivid-rodriguez commented 2 years ago

Hi!

I like this feature proposal, and I think there's one extra benefit that hasn't been mentioned: it makes "soft-forking" easier. Say Rails wants to temporarily fork the "mail" gem, to provide a better experience until the mail gem owners can address some important issues. Right now, the only way is to come up with a new name, and it's not clear how to properly communicate that the fork is only temporary and not meant to completely sunset the forked gem. Releasing mail@rails and depending on it temporarily makes this intention clearer, I believe.

My main concern is namespace squatting too. I'm not sure I understand how the current proposal prevents it. Can rubygems.org users create "custom scopes" (different from their usernames)? If so, what prevents a random user from creating the @rails namespace? If not, then it seems aws-sdk would already be squatted? Maybe there should be a transitional period where new scopes need to be explicitly approved to avoid this?

Regarding old clients, is the scope--name notation meant so that the feature works as is in old clients? I'm not sure we should choose a weird naming scheme just to support old clients. I think Bundler with a Gemfile.lock file would handle this pretty well since it's able to trampoline to the version that created the lockfile.

Personally, my preferred naming scheme is the one suggested by @indirect, although I understand it would require more work due to the ambiguous "/".

I'm not too sure about how to migrate to the new scheme, it seems quite complicated. I guess duplicate pushing would be best, maybe enhancing the clients to ease it, for example, something like gem build s3@aws-sdk --alias aws-sdk-s3 that builds a "duplicated gem" with the proper legacy naming.

mullermp commented 2 years ago

@deivid-rodriguez Thanks for the feedback!

My main concern is namespace squatting too. I'm not sure I understand the current proposal and how do we prevent it. Can rubygems.org users create "custom scopes" (different from their usernames)? If that's the case, what prevents any random user to create the @rails namespace? If not, then it seems aws-sdk would be already squatted? Maybe there should be a transitional period where new scopes need to be explicitly approved to avoid this?

I reserved the aws-sdk user immediately prior to posting this RFC :D. I think the rails user is also owned by the Rails team. I made the assumption that your username is your scope. I think we can certainly use an "organization" here, although it requires more thought/design. An "organization" can simply be a group of users on RubyGems who have access to one or more "scopes". Alternatively, the organization name can be the scope itself. Ultimately, there may need to be some initial enforcement. Some users will go and squat some names, but we can root them out; I think we can reliably assume users aren't building new software on most of those published gems.

Personally, my preferred naming scheme is the one suggested by @indirect, although I understand it would require more work due to the ambiguous "/".

Yeah, I can see reasons to want it. I started with this approach first but it introduced some complications. Specifically, we'd have to handle these cases: No such file or directory @ rb_sysopen - @mullermp/hola-0.0.0.gem where the gemspec's name assumes a path. Even when fixed, when it's installed, it may also create another nested directory in your gem install location, and that may or may not play nicely with existing tooling? I went down a rabbit hole and decided it was more effort than it's worth, but I could be wrong.
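A short illustration of the path problem, assuming the .gem file name is built as "name-version.gem" (gem_file_name is a hypothetical helper mirroring that convention). A "/" in the name adds a directory level, while an "@" does not.

```ruby
# Build a .gem file name the way "name-version.gem" is usually formed.
def gem_file_name(name, version)
  "#{name}-#{version}.gem"
end

# With "/", the scope becomes a directory that must exist before writing.
puts File.dirname(gem_file_name("@mullermp/hola", "0.0.0")) # => @mullermp
# With "@", the file stays in the current directory.
puts File.dirname(gem_file_name("hola@mullermp", "0.0.0"))  # => .
```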

I'm not too sure about how to migrate to the new scheme, it seems quite complicated. I guess duplicate pushing would be best, maybe enhancing the clients to ease it, for example, something like gem build s3@aws-sdk --alias aws-sdk-s3 that builds a "duplicated gem" with the proper legacy naming.

I'm ok with duplication as it would certainly be the safest option. In practice, as a gem maintainer with 300+ gems, it's not as feasible and probably causes some customer confusion. I think a one-way one-level alias might make sense, but we'd have to handle gem install locations too, perhaps a symlink between s3@aws-sdk (real source) to aws-sdk-s3 (symlink folder). If it's handled by Rubygems via alias, it prevents a lot of duplicative work by maintainers.
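A sketch of the one-way, one-level alias idea, assuming RubyGems kept a table from legacy names to scoped names. GEM_ALIASES and resolve_gem_name are hypothetical; the symlinked install directory mentioned above is not modeled here, only the name resolution.

```ruby
# Hypothetical alias table: legacy name => scoped gem that holds the code.
GEM_ALIASES = {
  "aws-sdk-s3" => "s3@aws-sdk"
}.freeze

# Follow at most one alias hop; refuse aliases that point at other aliases.
def resolve_gem_name(requested)
  target = GEM_ALIASES.fetch(requested) { return requested }
  raise ArgumentError, "#{requested} aliases another alias" if GEM_ALIASES.key?(target)

  target
end

puts resolve_gem_name("aws-sdk-s3") # => s3@aws-sdk
puts resolve_gem_name("rails")      # => rails
```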

hsbt commented 2 years ago

👋 I'm positive about adding this feature, but like @indirect, I'm not sure gem_name@scope is the best syntax.

We can choose:

Could anyone summarize how the scoped namespace features of other package managers work?

ioquatix commented 2 years ago

I think one of these is the most reasonable:

scope/gem_name
@scope/gem_name

the latter seems to be the format used by npm IIRC, but I think the former can be slightly better (what's the reason for/motivation of @ character?)

indirect commented 2 years ago

I think the motivation for the at-sign is to clearly distinguish the scope (@scope) from the package name (gem_name). It also makes it possible to talk about the scope separately from the package with the same name. For example, npm hosts a webpacker package in the @rails scope, named @rails/webpacker. Without the @, it's impossible to tell if rails means the scope or the top-level package rails.
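A sketch of the distinction described above, assuming npm-style names where a leading @ marks the scope. parse_npm_style is a hypothetical helper; it shows that "@rails/webpacker" carries a scope while a bare "rails" is unambiguously a top-level package.

```ruby
# The leading "@" is what marks the first segment as a scope.
def parse_npm_style(name)
  if (match = name.match(%r{\A@([^/]+)/([^/]+)\z}))
    { scope: match[1], name: match[2] }
  else
    { scope: nil, name: name }
  end
end

p parse_npm_style("@rails/webpacker") # scope "rails", name "webpacker"
p parse_npm_style("rails")            # no scope, top-level package
```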

indirect commented 2 years ago

A tricky question about scoped packages: in Node, it is completely fine to have both @rails/mail and mail in a single project. They do not conflict.

In RubyGems, it would be (presumably) impossible to have both @rails/mail and mail in a single project, because they would both define the Mail constant. That means either a lot of weird errors if someone adds both packages, or a lot of extra work in RubyGems and Bundler to prevent that kind of conflict.

Can a dependency on mail be satisfied by @rails/mail? If not, it's probably impossible to have a Gemfile that resolves. If yes, that's even more work that needs to be done inside RubyGems/Bundler.
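A small illustration of the Mail constant conflict, with both definitions inlined to stand in for two separately installed gems. Ruby reopens the module instead of raising an error, so whichever file is loaded last silently overrides earlier methods.

```ruby
# Stands in for lib/mail.rb from the "mail" gem.
module Mail
  def self.origin
    "mail"
  end
end

# Stands in for lib/mail.rb from a scoped fork such as "@rails/mail".
module Mail
  def self.origin
    "@rails/mail"
  end
end

puts Mail.origin # => @rails/mail
```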

ioquatix commented 2 years ago

@indirect if I'm understanding the original proposal:

# @rails/mail
module Rails
  module Mail
  end
end

# mail
module Mail
end

Is that correct?

mensfeld commented 2 years ago

Interesting, I have never heard of PURL. Though looking at npm's example pkg:npm/%40angular/animation@12.3.1 it looks like @ is escaped in the package's scope. So I imagine s3@aws-sdk version 1.2.3 could become pkg:gem/s3%40aws-sdk@1.2.3.

Yes, but my point was that the notion of "scope" is not part of purl, because purl uses namespaces instead. However, I understand the reasoning here and the backwards-compatibility concern.

@hsbt

A few registries are slowly moving toward purl: https://github.com/package-url/purl-spec

@indirect on top of that, for node you can have two versions of the same package in the same repo.

About the namespaces: would be good to estimate how many gems actually use their correct namespace vs patching other things or using completely different namespaces.

deivid-rodriguez commented 2 years ago

@mullermp I guess you're right, we would need to deal with squatting as just another type of possible abuse, and define clear rules about it.

@indirect I think in the particular case of mail, it would mostly work, because most Rails users don't use mail directly, and Rails is also in control of the dependency and how it's used. So Rails would change its dependency on mail to @rails/mail, and it could also choose to namespace the Mail module in its scoped gem to stay on the safe side and avoid any conflicts with people using the mail gem directly.

Anyways, this small benefit is just something that came to my mind as a potential extra benefit, but it's not even mentioned in the RFC. I think this RFC only proposes a way to allow gem name collision, but does not impose anything on what top level module a given gem should define.

simi commented 2 years ago

Another question is how to resolve dependencies? Should we use full (including namespace) gem identifier everywhere (including Gemfile.lock and gemspec dependencies specification)? I see the problem in compatibility again in here. We need to ensure transition is as smooth as possible.

deivid-rodriguez commented 2 years ago

Yes, I think the new name should be used everywhere, otherwise it's impossible to differentiate differently scoped gems with the same name, no? I don't think there's much that can be done about compatibility, unfortunately, except for going down the route you suggested before: choosing a name scheme that's currently valid, just mostly unused. It would definitely make things smoother, although not so nice. I'm not fully sure which option is best.

To elaborate on what I said before about Bundler being able to "trampoline", the idea is that, say, support for the new naming scheme is added on Bundler 2.4. And someone upgrades to Bundler 2.4.0 to be able to bundle @rails/mail in their Gemfile. Then a Gemfile.lock file will be generated including the new incompatible naming scheme, and the BUNDLED WITH 2.4.0 marker. If someone bundles this Gemfile{.lock} file using Bundler 2.3.0, Bundler 2.3.0 will automatically detect that the application needs Bundler 2.4 and automatically upgrade itself, so things should just work. Unfortunately this feature is only very recent and versions older than 2.3.0 would still fail hard.
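A sketch of the version check behind the trampoline, covering only the "lockfile needs a newer Bundler" case described above. bundled_with and needs_trampoline? are hypothetical helpers, not Bundler's actual code; Gem::Version does the comparison.

```ruby
require "rubygems"

# Read the version recorded under "BUNDLED WITH" in a Gemfile.lock.
def bundled_with(lockfile_contents)
  lines = lockfile_contents.lines.map(&:strip)
  index = lines.index("BUNDLED WITH")
  index && lines[index + 1]
end

# True when the lockfile was written by a newer Bundler than the running one.
def needs_trampoline?(lockfile_contents, running_version)
  locked = bundled_with(lockfile_contents)
  return false if locked.nil? || locked.empty?

  Gem::Version.new(locked) > Gem::Version.new(running_version)
end

lockfile = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:

  BUNDLED WITH
     2.4.0
LOCK

puts needs_trampoline?(lockfile, "2.3.0") # => true
puts needs_trampoline?(lockfile, "2.4.0") # => false
```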

simi commented 2 years ago

@deivid-rodriguez my idea was to support both old/new name scheme (at least for some time). At RubyGems.org it could just check the client version (or we can make some additional header to ask for new scheme) and respond with given scheme.

gem 'rails--activerecord'
gem 'rails/activerecord' # PURL compatible if I understand it well (or any new scheme incompatible with old gem naming could be here)

Would work the same for some time. The latter one just will not be compatible with older RubyGems and Bundler.
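A sketch of the server-side choice described above, assuming RubyGems.org picks a spelling based on the client version it reports. SCOPED_NAMES_SINCE and name_for_client are hypothetical; "2.4.0" only echoes the example version used earlier in the thread, not a decided release.

```ruby
require "rubygems"

# Hypothetical first client version that understands "scope/name".
SCOPED_NAMES_SINCE = Gem::Version.new("2.4.0")

# Spell the gem name in whichever scheme the requesting client supports.
def name_for_client(scope, name, client_version)
  if Gem::Version.new(client_version) >= SCOPED_NAMES_SINCE
    "#{scope}/#{name}"
  else
    "#{scope}--#{name}"
  end
end

puts name_for_client("rails", "activerecord", "2.4.1") # => rails/activerecord
puts name_for_client("rails", "activerecord", "2.3.0") # => rails--activerecord
```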

I can ask other package repository maintainers about PURL and their plans at the next OSSF meeting.

deivid-rodriguez commented 2 years ago

Mmmm... I think we should provide a migration mechanism so that gem authors can opt in to the new naming scheme, while still being compatible with the old naming scheme. So even if the activerecord gem starts using rails/activerecord naming, plain activerecord in Gemfiles should still work (and by work I mean it should also pick up newer releases using the new scheme). Once users are ready (they are using up to date clients) they can start using the new scheme. And once Rails chooses to do so, it can stop providing releases with legacy naming.

I'm not really sure how we can make the above work, but if we could do it, what would be the purpose of rails--activerecord?

jchestershopify commented 2 years ago

One thing I'd like to raise is implementation order of this feature vs adding teams/groups. It will be easier to implement teams and then hand out namespaces for those teams than to hand out namespaces per user later on.

Otherwise there will be a land rush for account names, which are largely unsupervised. Teams will need to create accounts to reserve the namespace. Once they use those accounts, we lose the ability to assign responsibility for actions to individuals.

Fryguy commented 2 years ago

@indirect if I'm understanding the original proposal:

# @rails/mail
module Rails
  module Mail
  end
end

# mail
module Mail
end

Is that correct?

If that's true, then this would break the "soft forking" mentioned by @deivid-rodriguez (which happens to also be my personal primary motivation for this scoping idea). If we are expecting scoped gems to also change the internal Ruby modules, then I can't, for example, temporarily fork a transitive dependency to fix something and put it in my personal scope.

simi commented 2 years ago

@Fryguy it is impossible (or at least I can't imagine how) for RubyGems to enforce or limit any kind of module/class structure in a gem's codebase. That is not the plan.

mullermp commented 2 years ago

Based on discussions, here is where I'm at:

An open question I have around @scope/gem_name is if we should be making @scope folders in your gem install folder (i.e. ~/.gem/ruby/3.0.2/gems/ or equivalent). I'm not certain what breaks (is this backwards compatible?) and whether that changes how you can load a gem.

If this sounds good, I can work on updating the RFC.

mullermp commented 2 years ago

I've pushed an update to the RFC. The highlights are:

  1. Added Organizations to the design. Organizations are an entity (with a "user" profile) that have a list of members/users, list of owned gems, and a list of reserved gem scopes. I've included a light-weight design around it.
  2. Added clarifications about Ruby namespaces and scoped gems. We will not enforce Ruby namespaces of any kind.
  3. Addressed the migration path problem. I've worked out a neat "shim" style solution for migrating gems in place. We can't migrate gems on RubyGems.org alone, so it's on the publisher to do. The shim requires no code changes from consumers.
  4. I've settled on using gem_name@scope. There are too many legacy concerns and downstream changes required with using @scope/gem_name pattern. This approach is more sane to implement and roll out. I know some people might be upset about this.
  5. Decided not to use a scope parameter on the gem specification, it complicated implementation and usage quite a bit and provided no immediate benefits.
  6. Added an open question about regex validation (courtesy of @alextwoods)
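One possible answer to the regex question, offered as an assumption rather than the RFC's rule: both halves use the characters gem names already allow (letters, digits, ".", "-", "_"), must start with a letter or digit, and are joined by exactly one @. NAME_PART, SCOPED_NAME, and valid_scoped_name? are hypothetical.

```ruby
# Each half: starts with a letter or digit, then letters, digits, ".", "-", "_".
NAME_PART = /[a-zA-Z0-9][a-zA-Z0-9._-]*/
# Exactly one "@" between the gem name and the scope.
SCOPED_NAME = /\A(?<name>#{NAME_PART})@(?<scope>#{NAME_PART})\z/

def valid_scoped_name?(full_name)
  SCOPED_NAME.match?(full_name)
end

p valid_scoped_name?("s3@aws-sdk") # => true
p valid_scoped_name?("s3@aws@sdk") # => false
p valid_scoped_name?("@aws-sdk")   # => false
```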

As always, let me know what you all think!

Also - I think this RFC is approaching a stage where I'd like to see larger scale feedback. Is there any objection to blasting this out to Ruby Weekly email series for more 👀 ? At what point do we consider RFCs to be accepted?

mullermp commented 2 years ago

A question for the RubyGems team is, what level of feedback (and approval) is required for this proposal? I think the proposal in its current state is complete (pending other feedback). I'm hoping to get this pushed out to the Ruby Weekly email list for more eyes.

halostatue commented 2 years ago

@mullermp I’m seeing this because of your reach-out, but I’m wondering why gem@scope rather than scope@gem? This would allow for (relatively) immediate namespace aliases for most existing gems. That is, mime-types could be aliased as halostatue@mime-types (this is also true of gem@scope, but gem@scope reads backwards to me).

If such aliases could work, then Rubygems itself could try to load either:

full_gem_name = 'halostatue@mime-types'  # example scoped name
unscoped_gem_name = 'mime-types'         # example legacy name

begin
  require full_gem_name # loads lib/halostatue@mime-types.rb
rescue LoadError
  require unscoped_gem_name # falls back to lib/mime-types.rb
end

mullermp commented 2 years ago

Thanks @halostatue for the feedback. Other than implementation concerns (regex + validation), what is your general sentiment with organizations and gems having reserved scope names (effectively a gem namespaces)?

I’m wondering why gem@scope rather than scope@gem?

I originally chose this because I tried to use npm's convention of @scope/gem but ran into many difficulties and incompatibilities with the gem ecosystem. I settled with gem@scope because scope acts like a reserved suffix. Your scope will be an organization reserved suffix. So I read this literally as "gem AT organization" (kinda like an email address), rather than "organization AT gem" which seems backwards. That aside, I'm open to whatever the overall community prefers. I do agree that maybe, alphabetically, we'd want organization first, then gem name - but I'd rather use a different delimiter than @.

drsharp commented 2 years ago

Is it weird that the scoped names sort of look like email addresses? The @ symbol, especially within a string (foo@bar), is so universally understood as an email designation that it makes me wonder whether a different symbol (: maybe?) would be better.

mullermp commented 2 years ago

I didn't think it was weird, but I'd be open to changing it if there's more community sentiment to do so. I read it as "this gem is LOCATED AT scope", kinda like how an email would be a user name AT some website/organization.

halostatue commented 2 years ago

The choice of scope characters is always going to be hard. : is not an allowable character on Windows filenames (meaning that some sort of encoding would be required for the filename), and / is a directory separator with its own complications. As far as I know, @ is allowed on Windows filenames. Other special characters (~, etc.) also have potential problems with shell substitutions on *nixish systems.
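A quick check of candidate separators against the characters Windows forbids in file names (< > : " / \ | ? *). filename_safe_separator? is a hypothetical helper; note that it does not cover the shell-substitution problems mentioned above, so "~" passes here even though it has other drawbacks.

```ruby
# Characters that Windows does not allow in file names.
WINDOWS_RESERVED = ['<', '>', ':', '"', '/', '\\', '|', '?', '*'].freeze

def filename_safe_separator?(separator)
  !WINDOWS_RESERVED.include?(separator)
end

%w[: / @ ~].each do |sep|
  puts "#{sep} #{filename_safe_separator?(sep) ? 'ok on Windows' : 'reserved on Windows'}"
end
```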

With respect to order, given your explanation (which I think would be worth adding to the RFC), I can accept gem@scope over scope@gem. It feels weird and unnatural, but that’s largely because most of the other package scoping mechanisms have started from general and headed down to specific. So Node’s @scope/name approach really isn’t that different than com.scope.Name in Java. (As an aside, in node_modules, @scope/name is represented as nested directories. Not that I think that node_modules is in any way worth emulating.)

Regardless, if gem@scope is the choice made, we’ll get used to it. There is, I think, a good argument for it.

It’s worth noting that, at least as of right now, neither Elixir nor Rust support scoping, and I don’t think either one is going to do so. (It could be argued that Go does, but that’s because Go uses git as a packaging mechanism.)

I think that the proposal is a good one, and I would absolutely want to reserve at least the mime-types namespace (core@mime-types, data@mime-types).

cyclotron3k commented 2 years ago

When I download a gem named qnap-download_station, I can usually guess that it's going to provide a module or class called Qnap::DownloadStation. I know it's not guaranteed, but it's a common practice, and it's definitely the recommended naming convention.

Adherence to this naming convention, combined with the unique constraint on gem names helps prevent/reduce namespace collisions.

So if config@foo-company, config@bar-widgets, config@acme all provide a Config class, things are going to get messy very quickly. It now becomes incumbent upon the gem developer to manually check for anyone else using that namespace?

Wouldn't it make more sense to put the owner organisation in the Gem's metadata (a new, dedicated field, not the actual metadata field perhaps?) - treat it as a first-class concept instead of overloading the gem name?

djberg96 commented 2 years ago

I'm late to the conversation, but couldn't we add an organization attribute to Gem::Specification? And then modify gem install to allow an organization attribute?

I'm really not following the purpose of the '@' for scoping, and why the org name wouldn't be enough scoping.

I'm also trying to remember what Perl did for package management. I thought they had some way to scope them already, but my memory is fuzzy.
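A sketch of the metadata-based alternative suggested above, using the existing Gem::Specification#metadata hash. The "organization" key is hypothetical; a dedicated attribute would require changes to RubyGems itself. Because gem names stay globally unique in this model, the field records ownership rather than disambiguating names.

```ruby
require "rubygems"

spec = Gem::Specification.new do |s|
  s.name     = "config"
  s.version  = "1.0.0"
  s.summary  = "Example gem"
  s.authors  = ["Acme"]
  s.files    = []
  # Hypothetical key: ownership lives in metadata, not in the gem name.
  s.metadata = { "organization" => "acme" }
end

puts spec.full_name                # => config-1.0.0
puts spec.metadata["organization"] # => acme
```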

ioquatix commented 2 years ago

How would I structure:

socketry/async -> async.gemspec -> async@socketry? async@async?
socketry/async-http -> async-http.gemspec -> http@async? async-http@socketry?
socketry/db -> db.gemspec -> db@socketry? db@async?
socketry/db-postgres -> db-postgres.gemspec -> db-postgres@socketry? db-postgres@async? postgres@db?

It would be good to understand what is best practice to see if the proposed model fits real world use cases (or what real world use cases fit the proposed model).

halostatue commented 2 years ago

I believe that the proposal would have you create (to pick one of them) async@socketry.gemspec (although I would prefer socketry@async.gemspec as I do prefer scope@gem to gem@scope, despite the clear explanation provided). I would likely do core@mime-types (mime-types@core) and data@mime-types (mime-types@data) for the mime-types gems…although I’m also not entirely sure that I would stop publishing just mime-types.

@djberg96 I think we should add an organization (or, if you prefer, scope) field to the gemfile, but I think that there is value in making the organization/scope part of the scoping capabilities at the command-line and in Bundler without getting excessively verbose.

I would prefer to do any one of the following:

$ gem install @mime-types/core
$ gem install mime-types@core
$ gem install core@mime-types

Over:

$ gem install mime-types --organization mime-types
$ gem install core --organization mime-types

Similarly, I would prefer for Bundler gem 'core@mime-types' over gem 'core', organization: 'mime-types'.
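A toy stand-in for a Gemfile DSL, not Bundler's API, showing that the two spellings compared above could resolve to the same scoped dependency. ToyGemfile and its organization: option are hypothetical.

```ruby
class ToyGemfile
  attr_reader :dependencies

  def initialize
    @dependencies = []
  end

  # Hypothetical `organization:` option alongside the gem@scope spelling.
  def gem(name, organization: nil)
    @dependencies << (organization ? "#{name}@#{organization}" : name)
  end
end

gemfile = ToyGemfile.new
gemfile.gem "core@mime-types"
gemfile.gem "core", organization: "mime-types"
p gemfile.dependencies # => ["core@mime-types", "core@mime-types"]
```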

In general, I think that this is a cogent RFC, but it feels incomplete to me, as I think that it needs to have a clearer indication that scoped packages (either personal @halostatue/diff-lcs or organizational @mime-types/core, to use NPM-style notation) are going to end up making the transition into the gemspec and rubygems.org (and other implementations like geminabox) fairly seamlessly.

I’m not sure we should wait for perfect before we act on this, if we act on this. I think that there are several issues here of which the scoping of gem names is only one. This gets into:

In some ways, a lot of this (except name scoping) could be fixed if we had a better way of implementing signed gems (and verifying those signatures), rather than using namespaces and Rubygems.org as pure sources of truth. But, as someone who did sign gems early on and found the process excruciating (and no one verified them anyway), I don’t think that’s going to be a solution to any of these problems.

Not quite sure how to move forward on this. It’s a good RFC. I’m not sure it is good enough, but I also don’t know that waiting for a better RFC is going to give us anything this decade.

cyclotron3k commented 2 years ago

Similarly, I would prefer for Bundler gem 'core@mime-types' over gem 'core', organization: 'mime-types'.

But if organisation was part of the gem metadata instead of the gem name, then there would be no ambiguity in the gem name (due to the uniqueness constraint), and therefore specifying the organisation would be unnecessary, no?

zarqman commented 2 years ago

I really like the idea of adding scopes.

Like many others, I just find that the gem_name@scope naming feels backwards. I think it's because we're very used to the order of less-specific/more-specific. Some examples:

Ruby modules: Module::Class
File system directories: parent/child
GitHub organizations/scopes: github.com/org/project
npm: @scope/package
PHP: scope/package
Java: com.scope.name

If we go back to the original gemcutter in RubyGems' own history, we had username-gemname too.

Consider also that Ruby gems themselves often live in GitHub, where they might be named github.com/scope/gem_name.

Or look at the already discussed example of aws-sdk-s3, which exposes the Aws::S3 namespace. While Aws::S3 doesn't exactly match aws-sdk-s3, it'd be even more unnatural if the naming is reversed to s3@aws or s3@aws-sdk. Likewise for @ioquatix's example that if async-http becomes http@async, that's reversed from the included module Async::HTTP.

For those who care about code aesthetics and are prone to aligning related lines (including myself):

gem 'async/async'  # feels normal
gem 'async/http'
gem 'async/websocket'
# vs
gem     'async@async'  # awkward!
gem      'http@async'
gem 'websocket@async'

I'd strongly prefer to see the Ruby ecosystem choose a path that's consistent with everyone else and go with scope/name. Or scope.name. Or even scope@name or scope--name. Or anything that puts the scope first.

I'll also suggest that consistency with other languages and ecosystems keeps things more accessible to new Ruby developers. If we invert the common ordering, then it's one more thing that blog posts and books will have to address. It adds mental overhead.

ghost commented 2 years ago

First of all, @zarqman has the most valid point here.

"I've settled on using gem_name@scope"

Why, of course, because that's the most intuitive approach, considering most gems' source code is hosted on GitHub, which has an 'organisation/project' structure.

There is this one thing called the "Principle of least surprise"; ignoring it is rarely a good path to follow.


A developer can be reasonably sure that any new gem such as new-cool-feature@rails is an official Rails gem, or new-s3-service@aws-sdk is an official AWS SDK client, or even socket@ruby to be an official stdlib gem!

  1. Reduce customer confusion for tools and services spanning multiple gems. It is not uncommon for an organization to distribute many packages, such as Rails with the ActiveX packages, AWS with SDK clients, Ruby standard library gems, Azure SDKs, etc. Scoped gems are a good way to signal official packages from organizations.

And that brings us... what, exactly? Take two gems, named "company@lib" and "company-lib". One seems to come from some company, maybe even a real one, and the second probably comes from a company or was made by someone to work with a product from that company. What value does that knowledge bring?

Something being made by some company does not guarantee you anything. Literally nothing. No guarantees, no warranties, no EULA; it does not prevent malicious code from being there, it does not prevent the code from that company from stealing your private data or deleting your files, and it brings no liability or responsibility upon that company. Functionally it does NOTHING.


This reservation system combats "fake" "similarly named" gems that are branded as official that attempt to steal personal information.

  1. Help reduce typo-squatting with malicious code. Scoped gems are also susceptible to typos, however, gems published under a scope are verified to be owned by that organization. <-- this here is repeating the same argument from point 1

And how will that system prevent someone from creating an organisation like, for example, 'atlas.org', 'atlas-org', 'hashicorp-atlas', 'atlas.hashicorp', and so on and so forth? How will a developer know which organisation is authentic and which is fake? Employ a team of validation moderators? Who is going to pay their salaries? And who is going to build and maintain a legally binding validation process? What about the legal consequences if a moderator accidentally approves a company that then turns out to be fake or to steal your data? Who is responsible then? The moderator? The company? The Rubygems project? If no one is responsible and organisation creation is as free as account creation, how do you expect to protect anything from typosquatting at all?
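To make the lookalike-organisation point concrete, here is a minimal sketch of the first-push check the proposal describes, assuming scopes are reserved first-come, first-served by any organisation. ScopeRegistry, reserve and push_allowed? are hypothetical names for illustration, not rubygems.org code. The check protects the exact string aws-sdk, but nothing in it stops another organisation from reserving aws_sdk.

```ruby
# Hypothetical sketch of the proposal's first-push check: a scoped gem
# ("name@scope") may only be pushed by the organisation that reserved the
# scope. Scope and organisation names here are illustrative only.
class ScopeRegistry
  def initialize
    @owners = {} # scope => organisation that reserved it
  end

  # Assumption: any organisation may reserve a scope nobody holds yet.
  def reserve(scope, org)
    return false if @owners.key?(scope)

    @owners[scope] = org
    true
  end

  # Unscoped names are not checked; scoped names need the exact owner.
  def push_allowed?(gem_name, org)
    _name, scope = gem_name.split('@', 2)
    return true if scope.nil?

    @owners[scope] == org
  end
end

registry = ScopeRegistry.new
registry.reserve('aws-sdk', 'aws')

registry.push_allowed?('s3@aws-sdk', 'mallory') # => false, exact scope is protected
registry.reserve('aws_sdk', 'mallory')          # => true, lookalike scope is free
registry.push_allowed?('s3@aws_sdk', 'mallory') # => true
```

The verification is only as strong as a developer's ability to tell aws-sdk from aws_sdk by eye.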

"verified to be owned" gives only one thing: false sense of security. And nothing more.


  1. An organization allows for a better gem owner permissions model. Gems owned by the same company or organization currently need to manage users for each gem. Users and gem owners can now be part of a single entity, and that entity can be the owner of the gem.

This has zero to do with scoping. Really. All this can be implemented absolutely independently of this PR.


Which means all this PR boils down to is "validation", which is somehow supposed to bring in some sense of security. By its very definition it cannot do that; real validation would require much deeper changes to the whole process, not just a change in naming convention.

It does not protect user from anything. It does not ensure anything. It does not provide anything besides self-deception.

It does not stop me from publishing a GitHub page saying "Hey, this is a new cool official gem from AWS, it does amazing things, just add this to your Gemfile: gem 'aws-sdk-s4', github: 'aws-sdk-s4'", and I don't even need to post it to rubygems, you know! After all, who, seriously, gets their new gems from Rubygems these days? Nobody, really. It's blog posts, it's ruby-toolbox, it's the Ruby Weekly newsletter. Nobody goes to https://rubygems.org thinking "ooh, what new cool gems are trending today?". Rubygems is merely a file dump for gem files, with search and versions. That's it.

All in all, if anything, this change will bring more confusion than good.

djberg96 commented 2 years ago

I agree that this won't prevent typo squatting since, as @andy-tycho says, you could just as well typo-squat an organization, unless we add some sort of auth hook, which is outside the scope of the project.

Where an org/scope attribute would come in handy is for plugins, which could hook into it as they see fit. Any sort of auth could be handled by the plugin author instead of us, and then users could choose whether to use that plugin.

It's not a huge deal to me either way, and it can potentially be handled by metadata of course, but it's more likely to be used IMO if it's a Specification attribute.
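A rough sketch of the metadata route: Gem::Specification#metadata is an existing string-to-string hash, so an organisation can already be carried there today. The 'organization' key and the organization_of and trusted? helpers are hypothetical; RubyGems gives that key no meaning, and any policy would live in a plugin, as suggested above.

```ruby
require 'rubygems'

# Sketch: carry the organisation in gemspec metadata rather than the name.
# The 'organization' key is hypothetical; RubyGems attaches no meaning to it.
spec = Gem::Specification.new do |s|
  s.name     = 'aws-sdk-s3'
  s.version  = '1.0.0'
  s.summary  = 'Example spec for illustration'
  s.authors  = ['Example Author']
  s.files    = []
  s.metadata = { 'organization' => 'aws' }
end

# A hypothetical plugin hook: read the attribute and apply its own policy.
def organization_of(spec)
  spec.metadata['organization']
end

def trusted?(spec, allowed_orgs)
  allowed_orgs.include?(organization_of(spec))
end

organization_of(spec)            # => "aws"
trusted?(spec, ['aws', 'rails']) # => true
```

A first-class Specification attribute would look the same to a plugin, but would be more discoverable than a free-form metadata key.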

Fryguy commented 2 years ago

With respect to gem@scope vs scope@gem, they are both awkward in different ways, and I think we are arguing about the order without also pointing out that the @ symbol itself is part of the awkwardness. For me, when I "read" the @ symbol, I say "at", which implies the thing following it is a larger container (e.g. location, domain, scope, etc). So gem@scope makes much more sense than scope@gem when only considering the @. That being said, the argument made by @zarqman of bigger->smaller being more consistent in Ruby is a really good one...if we go with bigger->smaller, then I think @ is the wrong symbol choice. Unfortunately, I'm not sure what a good symbol choice would be, considering that the other expected choices, / and :, have downsides, as @halostatue mentioned.
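One reason the separator question is not purely aesthetic: some candidates floated in this thread are already legal in ordinary gem names. The sketch below approximates today's naming rule as "letters, digits, '.', '-' and '_'" (an assumption, not RubyGems' own constant) and checks which candidate forms that rule would already accept.

```ruby
# Assumption: today's gem-name rule is approximated as "letters, digits,
# '.', '-' and '_' only". This is a simplification, not RubyGems' constant.
CURRENT_NAME_RULE = /\A[a-zA-Z0-9._-]+\z/

# True if the string is already acceptable as an ordinary, unscoped name.
def legal_today?(name)
  CURRENT_NAME_RULE.match?(name)
end

%w[http@async @async/http async/http async:http async.http async--http].each do |name|
  verdict = legal_today?(name) ? 'already a valid unscoped name' : 'needs a naming-rule change'
  puts format('%-12s %s', name, verdict)
end
```

Under that assumption, async.http and async--http pass as plain unscoped names, so they could not unambiguously mark a scope, while the forms using @, / or : would all require widening the naming rule.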

halostatue commented 2 years ago

Similarly, I would prefer for Bundler gem 'core@mime-types' over gem 'core', organization: 'mime-types'.

But if organisation was part of the gem metadata instead of the gem name, then there would be no ambiguity in the gem name (due to the uniqueness constraint), and therefore specifying the organisation would be unnecessary, no?

I’m not sure what you’re saying here, @cyclotron3k. I think that there is potential value in having scoping be part of a name to expand the "universe" of available package names. The points raised about how namespacing works in Ruby are fair: when you use mime-types, you can generally expect that the top-level namespace is MIME, Mime, or MimeTypes (it’s MIME::Types), and @mime-types/core might offer either MIME::Types::Core or just Core. But I think that this is a social problem to be handled.
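The point about widening the universe of names can be shown with two toy registries: one keyed by name alone (today's global uniqueness) and one keyed by [scope, name] pairs. Both are hypothetical in-memory sets for illustration, not how rubygems.org stores gems.

```ruby
require 'set'

# Toy registries, purely illustrative:
# flat   - names are globally unique (today's situation)
# scoped - [scope, name] pairs are unique, so short names can repeat
flat   = Set.new
scoped = Set.new

# Set#add? returns nil when the key is already present.
def claim(registry, key)
  registry.add?(key) ? :ok : :taken
end

claim(flat, 'core')                   # => :ok
claim(flat, 'core')                   # => :taken, a second org is out of luck
claim(scoped, ['mime-types', 'core']) # => :ok
claim(scoped, ['other-org', 'core'])  # => :ok, same short name, different scope
```

Putting the organisation only in metadata keeps the flat registry, so a second "core" would still be rejected.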

While I don’t think that Rubygems should be enforcing any names inside the gems (after all, there can be value in publishing a gem that monkey patches another gem to fix a bug in the original gem that hasn’t been updated…), I think that we (the Ruby community) could consider how scoped gems should expose their namespaces.

I know that I haven’t really been part of any of the previous discussions on this topic (mostly because I haven’t been aware of them), but I think that they’re fascinating. @mullermp, thank you for writing this RFC and getting the discussion going to a wider viewpoint.

Even though this is really a proposed Rubygems RFC, I wonder if this might not be something that should be raised on ruby-core. It’s not part of MRI implementation as such, but Rubygems and Bundler are so deeply part of the overall Ruby development experience at this point, it probably needs additional eyes on it.

qrush commented 2 years ago

Hi all-

I'd like to leave an idea here that perhaps should be a separate RFC: if/when rubygems.org starts to add scoped gems support, it feels like that should be offered as an annual subscription service whose fees go directly to supporting the costs of running and maintaining rubygems.org. RubyCentral and many sponsors (see the footer on rubygems.org) cover these costs now, but this feels like a great way for the community to actively buy in and support the central piece of the ecosystem we rely on, especially with a feature that is geared towards organizations that have the ability to pay for services they use.

I don't know what the right amount to charge is; maybe a community survey should be held to determine that dollar amount. Alternatives to gem scoping would still work fine (dashes in gem names, or a company hosting its own private gem server), so I am hoping those will be seen as reasonable workarounds if an organization cannot afford the charge or refuses to pay.

❤️

zarqman commented 2 years ago

Under Alternative 3, the original proposal hints at "implementation and legacy complications" arising from the use of / as the scope/name separator.

While I like the aesthetic of /, I do think that these complications are substantial and worth outlining more extensively.

Let's use async-http as our example. Further, let's assume that it would become async/http or @async/http. (I will use ~ to indicate the root directory/git repo of the gem itself.)

async-http presently:

  1. Assumes the gemspec is located at ~/async-http.gemspec.

  2. Installs:
    GEM_HOME/cache/async-http-1.2.3.gem
    GEM_HOME/gems/async-http-1.2.3/
    GEM_HOME/specifications/async-http-1.2.3.gemspec
    GEM_HOME/extensions/x86_64/3.0.0/async-http-1.2.3/ (if applicable)

  3. Inside Ruby: Adds GEM_HOME/gems/async-http-1.2.3/lib to $:. require 'async-http' looks for GEM_HOME/gems/async-http-1.2.3/lib/async-http.rb by convention.

  4. gem build outputs ~/pkg/async-http-1.2.3.gem
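The layout listed above can be reproduced with plain string joins. install_paths is a hypothetical helper, and the 'x86_64' and '3.0.0' defaults only mirror the example rather than RubyGems' real platform directory names. Passing a '/'-scoped name through the same logic shows how nested directories appear without anyone explicitly deciding they should, which is where the complications below start.

```ruby
# Sketch of the on-disk layout listed above, built with plain string joins.
# install_paths is hypothetical; platform/abi defaults mirror the example.
def install_paths(gem_home, name, version, platform: 'x86_64', abi: '3.0.0')
  full_name = "#{name}-#{version}"
  {
    cache:      File.join(gem_home, 'cache', "#{full_name}.gem"),
    gem_dir:    File.join(gem_home, 'gems', full_name),
    spec:       File.join(gem_home, 'specifications', "#{full_name}.gemspec"),
    extensions: File.join(gem_home, 'extensions', platform, abi, full_name),
    load_path:  File.join(gem_home, 'gems', full_name, 'lib') # added to $:
  }
end

flat = install_paths('/gems', 'async-http', '1.2.3')
flat[:gem_dir]   # => "/gems/gems/async-http-1.2.3"

# The same string logic fed a "/"-scoped name silently nests a directory:
scoped = install_paths('/gems', 'async/http', '1.2.3')
scoped[:gem_dir] # => "/gems/gems/async/http-1.2.3"
```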

Now, comparing to async/http and creating directories for each scope (comparable to node_modules/):

  1. Should the .gemspec now be stored in ~/http.gemspec or ~/async/http.gemspec? If the latter, gem build would no longer be able to just search for *.gemspec, but would also have to include */*.gemspec.

    How does either of these choices work with Gemfile commands like:

      gem 'async/http', github: 'async/http'
      gem 'async/http', path: '/some/arbitrary/path'

    It would seem that :git or :github would need to parse the scope from the .gemspec during install and then deliberately install into gems/scope/gemname-x.y.z. This might create a chicken-and-egg problem of needing to know gem.name before checking out the tree, but needing to check out the tree before reading the .gemspec (I'm unsure how Rubygems handles this now).

    However, for :path, it seems likely that the scope has to be somewhat internally discarded as the express path has already been provided. See (3) below for implications of this.

  2. Now installs:
    GEM_HOME/cache/async/http-1.2.3.gem
    GEM_HOME/gems/async/http-1.2.3/
    GEM_HOME/specifications/async/http-1.2.3.gemspec
    GEM_HOME/extensions/x86_64/3.0.0/async/http-1.2.3/ (if applicable)

  3. $: now contains GEM_HOME/gems/async/http-1.2.3/lib. This makes require 'async/http' ambiguous. Should it be looking for async/http.rb inside the async/http gem? Or should it be looking for async/http.rb (yes, identical) inside the async gem? What happens when both exist?

    Perhaps require is modified to recognize scopes by prefixing the scope with @ (the same reason npm does it, I believe). require 'async/http' has no @, so it filters $: to only unscoped paths and looks for ~/lib/async/http.rb. require '@async/http' parses out the @async, filters $: to only GEM_HOME/gems/async/*, then treats the rest like require 'http' and looks for ~/lib/http.rb.

    But, as noted in (1), use of gem '@async/http', path: '...' won't reliably have scope/ as part of the pathname in $:, so filtering on $: won't work. This suggests that $: itself would need to be reworked, possibly as a hash: {'@async'=>[..], nil=>[..unscoped paths..]}. This creates backward compatibility concerns.

  4. Do gem build and related commands output pkg/async/http-1.2.3.gem now? This seemingly changes how rubygems.org handles uploads, file storage, routes, etc. (as already noted in Alternative 3.)
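The require ambiguity in (3) can be simulated against a fake file list. This is a sketch under stated assumptions: the FILES set and every method name are hypothetical, $: is modelled as a plain array, and filtered_require follows the '@scope/name' idea from (3) while leaving unscoped lookups with ordinary semantics. The last lines show the failure described for path: gems, and how a scope-keyed $: (the hash form from (3)) would avoid it.

```ruby
require 'set'

# Hypothetical files on disk, following the async / @async/http example.
FILES = Set[
  '/gems/gems/async-1.0.0/lib/async/http.rb', # unscoped "async" gem
  '/gems/gems/async/http-1.2.3/lib/http.rb',  # scoped "@async/http", installed
  '/src/async-http/lib/http.rb'               # same scoped gem via path: '...'
].freeze

# Ordinary require semantics: the first $: entry holding "<feature>.rb" wins.
def plain_require(load_path, feature)
  load_path.map { |dir| File.join(dir, "#{feature}.rb") }.find { |f| FILES.include?(f) }
end

SCOPED_FEATURE = %r{\A@([^/]+)/(.+)\z}

# Idea from (3): "@scope/name" filters $: down to GEM_HOME/gems/scope/*,
# then looks for name.rb. Unscoped features keep plain semantics here.
def filtered_require(load_path, gem_home, feature)
  m = SCOPED_FEATURE.match(feature)
  return plain_require(load_path, feature) unless m

  prefix = File.join(gem_home, 'gems', m[1]) + '/'
  plain_require(load_path.select { |dir| dir.start_with?(prefix) }, m[2])
end

# Alternative from (3): $: keyed by scope, so path: gems can be tagged too.
def hashed_require(scoped_load_path, feature)
  m = SCOPED_FEATURE.match(feature)
  return plain_require(scoped_load_path.fetch(nil, []), feature) unless m

  plain_require(scoped_load_path.fetch("@#{m[1]}", []), m[2])
end

installed = ['/gems/gems/async-1.0.0/lib', '/gems/gems/async/http-1.2.3/lib']
via_path  = ['/gems/gems/async-1.0.0/lib', '/src/async-http/lib']

plain_require(installed, 'async/http')              # finds the async gem's async/http.rb
filtered_require(installed, '/gems', '@async/http') # finds .../async/http-1.2.3/lib/http.rb
filtered_require(via_path, '/gems', '@async/http')  # => nil: path: entry has no scope dir

hashed = { '@async' => ['/src/async-http/lib'], nil => ['/gems/gems/async-1.0.0/lib'] }
hashed_require(hashed, '@async/http')               # => "/src/async-http/lib/http.rb"
```

The hash form works only because whoever builds $: records the scope explicitly, which is exactly the backward-compatibility cost noted in (3).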

Alternative 3 also mentions escaping /. I suggest that while this preserves a flat namespace, it makes a mess trying to ensure every tool gets the escaping/unescaping correct.

Further, it still doesn't resolve ambiguity with require 'async/http'. Should the / be escaped or not? If escaped, Ruby looks in $: for http.rb and hopefully finds it at GEM_HOME/gems/async%2Fhttp-1.2.3/lib. If not escaped, Ruby looks for async/http.rb and perhaps finds it in GEM_HOME/gems/async-1.2.3/lib.
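A sketch of the escaping alternative, using Ruby's URI.encode_www_form_component to percent-encode the scoped name into a single directory name. escaped_dir and unescaped_name are hypothetical helpers, and splitting on the last '-' assumes the version carries no platform suffix. The final line shows the hazard mentioned above: any tool that skips the decode step sees async%2Fhttp as the gem's name.

```ruby
require 'uri'

# Sketch of the "escape /" alternative: one flat directory per gem, with the
# scoped name percent-encoded. Both helper names are hypothetical.
def escaped_dir(name, version)
  "#{URI.encode_www_form_component(name)}-#{version}"
end

# Assumes the version is everything after the last '-' (no platform suffix).
def unescaped_name(dir_name)
  encoded, _sep, _version = dir_name.rpartition('-')
  URI.decode_www_form_component(encoded)
end

dir = escaped_dir('async/http', '1.2.3') # => "async%2Fhttp-1.2.3"
unescaped_name(dir)                      # => "async/http"

# A tool that splits the directory name but forgets to decode it:
dir.rpartition('-').first                # => "async%2Fhttp"
```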

That's a lot of change and seems to me like it would require bumping Rubygems to 4.x as it's pointing towards breaking backward compatibility. If it requires changing the behavior of $:, then it's also a major change within Ruby itself.