rubygems / rfcs

RubyGems + Bundler RFCs

Gem ownership custodians and a process for dealing with abandoned gems. #33

Closed ioquatix closed 2 years ago

ioquatix commented 3 years ago

cc @sonalkr132

ioquatix commented 3 years ago

As part of this, we might also want to consider the inverse case: blocking ownership transfers that look suspicious.

https://mjtsai.com/blog/2018/11/27/popular-npm-package-compromised/

In some cases, authors of popular packages might transfer ownership in a way which produces poor outcomes for the community. We may want to introduce a gem custodian review of gem ownership transfers which could potentially fall into this category.

mensfeld commented 3 years ago

A couple of remarks here:

Unresponsive Owners: Not all gem owners provide their contact details, nor is there any guarantee that their contact details will work. This makes it difficult for informal gem ownership transfer to occur.

Not every owner wants to be contacted. I've run into this, and it's their right.

Uncooperative Owners: Unfortunately, even if contactable, some gem owners are uncooperative. This is understandable, as the ownership request might come from someone who is unknown to the original owner, and given the various issues surrounding gem ownership (including hijacking important gems, etc.).

"Forceful" transfer of ownership is always a high-risk procedure. This can be a potential security vulnerability in itself.

The gem custodian can attempt to communicate with the current owner on behalf of the ownership request.

There should be a way to "opt out" of this type of communication, either per account or per gem.

A gem is considered "in use" if the gem itself, or any of its direct reverse dependencies (version specific) have more than 10,000 downloads in the past year.

There are semi-private gems that will never reach that threshold. Download count is not the best way to assess usage.

A gem is considered "maintained" if the author has logged into https://RubyGems.org within the past 12 months.

Also insufficient. I've seen gems that have not had a release for 7 years but still get small updates on GitHub.

A gem is considered to be "valid" if the homepage and other related metadata is valid (e.g. URLs return relevant pages).

There are literally hundreds of gems with no homepage, with rubygems.org set as the homepage, or with an invalid GitHub link (typo, account, etc.).

A gem is considered to be "working" if a user can check out and run the unit tests on a supported, non-EOL Ruby implementation.

This would mean we should encourage people to ship tests in their releases, which in the majority of cases does not make any sense. Also, there are many gems that have external dependencies (DBs, queues, etc.) and won't "just run".

A gem is considered to be "stable" if it has a published release >= 1.0.0.

Also not a standard.

A gem is considered "abandoned" if the most recent release of the gem exceeds the "adoption threshold".

Having such a clear definition means I could farm this data, create many RubyGems accounts, and slowly start taking over many packages to build up a network of packages I could then use.
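As a rough illustration only, three of the criteria quoted above can be written as predicates. This is a sketch under stated assumptions, not part of the RFC or of any RubyGems.org API: `GemRecord` and its field names are hypothetical, "12 months" is approximated as 365 days, and the "version specific" qualifier on reverse dependencies is ignored. The "adoption threshold" is not defined in this thread, so "abandoned" is left out.

```ruby
require "rubygems" # Gem::Version (loaded by default in most Ruby setups)
require "date"

# Hypothetical record; the field names are illustrative, not the RubyGems.org schema.
GemRecord = Struct.new(
  :name,
  :downloads_past_year,
  :reverse_dependency_downloads, # one yearly count per direct reverse dependency
  :owner_last_login,
  :latest_version,
  keyword_init: true
)

IN_USE_DOWNLOADS = 10_000 # threshold quoted from the proposal
MAINTAINED_DAYS  = 365    # "past 12 months", approximated as 365 days

# "In use": the gem or any direct reverse dependency exceeds the threshold.
def in_use?(record)
  ([record.downloads_past_year] + record.reverse_dependency_downloads)
    .any? { |count| count > IN_USE_DOWNLOADS }
end

# "Maintained": the owner logged in within the window ending at `today`.
def maintained?(record, today)
  (today - record.owner_last_login) <= MAINTAINED_DAYS
end

# "Stable": the latest published release is at least 1.0.0.
def stable?(record)
  Gem::Version.new(record.latest_version) >= Gem::Version.new("1.0.0")
end

record = GemRecord.new(
  name: "example",
  downloads_past_year: 500,
  reverse_dependency_downloads: [20_000],
  owner_last_login: Date.new(2020, 6, 1),
  latest_version: "0.9.2"
)
today = Date.new(2022, 1, 1)

puts in_use?(record)            # true (via the reverse dependency)
puts maintained?(record, today) # false
puts stable?(record)            # false
```

Every input here is a coarse proxy (download counts, login dates, version numbers), which is the substance of the objections above: a semi-private gem or one with quiet GitHub activity would be misclassified by checks of this shape.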

I would love to come up with a process here, but I feel that we need to have a deeper discussion.

I would start by asking: how often does this happen? I recall two cases in at least the past year. I do agree we need policies around this, as well as a systematic approach, but I'm not sure it should be "popularity" based.

ioquatix commented 3 years ago

I don’t know the number off the top of my head but even the “async” gem was due to private ownership transfer. If it wasn’t for someone’s generosity to give me a name that made sense I might have just given up. A large number of the gems I created and maintain are due to ownership transfer. I feel that because of the lack of name availability, and the lack of transparent process, many developers could give up due to the friction. That’s just my guess though - how would you measure it? The ownership transfer process is not very easy for either party.

I have currently sent two emails to the owners of gems which have not been updated for many years, to try to reuse them. After one week I have not received any reply. I don’t think it’s fair to use a shared resource and refuse to have reasonable communications. Even DNS has requirements about contact details which you have to update on a yearly basis.

I agree with all your points, but that doesn’t mean we don’t need a more transparent process. I think the motivations I provided are real problems. You’d definitely be in a more knowledgeable position to suggest some kind of solution.

Regarding people hosting small gems which are effectively private: is RubyGems, which is a shared community resource, really the right place for them?

I would even argue that, taken to its extreme, RubyGems is not sustainable. Over time it seems like it would just get more and more saturated with unmaintained gems. Without any “corrective pressure”, the problems that I’ve outlined seem like they will only get worse in the future.

I don’t know what the solution is, but I feel a transparent process for dealing with the situations I’ve outlined would be helpful. Having expectations around communication (e.g. similar to yearly DNS updates/confirmations) might be another. My goal is to empower people who want to take unmaintained namespace and turn it into maintained namespace.

Ultimately I would like gems which are effectively unmaintained and unused to be more easily open to reuse by the community. But maybe this has unacceptable security risks. That being said, maybe we can define a process where the security risk is acceptable and which also allows us to avoid these problems, i.e. the bar for any kind of “ownership transfer” should be proportional to the perceived risk, which is what I’ve tried to capture in my proposal.

How often does this happen? I recall two cases for at least a year now

I quickly scanned through the list of gems on rubygems.org that I help maintain (144), for a total of 1.6 billion downloads.

Here are some of the ones which had successful private transfers. For every success there was probably about 1-2x failures (no contact, unwilling to transfer, etc). It's a slow and difficult process.

falcon - private transfer
async - private transfer
db - private transfer
console - private transfer
bake - private transfer
build - private transfer
event - private transfer
docs - private transfer (currently I'm a squatter)
migrate - private transfer
variant - private transfer
live - private transfer
data - private transfer (unusable in Ruby currently)
trace - private transfer
memory - private transfer

I'm not sure if this list is exhaustive since I just eyeballed it.

I would say I run into this issue several times a year, probably once every couple of months. I've deliberately tried to choose namespaces which are less likely to generate conflicts, so in many cases my internal naming convention mitigates the problem.

The names in the list above are very nice names (subjectively objective :). Most of them now have actively maintained projects which are valuable to the community at large. Before my effort to do this, they were all unmaintained. Yes, there is value to me personally, but there are others like me who may not have the time or energy to go through this process, and we are potentially missing out on their passion and enthusiasm. In essence, the bar to entry has been raised quite a bit because of the volume of unmaintained gems using up namespace. The question is then: "What kind of place do we want RubyGems.org to be?" There is obviously no single answer: security and reliability are important, but community building and software contribution are also important. Every time I run into this issue, I think that people in the late 2000s had a much easier time just choosing whatever name they wanted; it wasn't as big a barrier back then as it is now. Maybe I'm technically wrong, but that's how it feels.

Finally, one important point about the above is that multipart-post was transferred to me by the original author. I would like to believe they did their due diligence. But gems with multi-million downloads probably deserve more community scrutiny (i.e. the event-stream problem). Frankly, some of these problems are really hard, but this is where the curator idea comes into play: real humans who are then required to make a real-world decision, taking into account all the messy edge cases and circumstances. I personally think ANY ownership change to a gem with 1 million+ downloads should go through a formal process. I would personally welcome it.
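The "1 million+ downloads" suggestion amounts to a simple gate on ownership changes. A minimal sketch, assuming total downloads is the only input; the function name and tier names are hypothetical, and only the 1,000,000 figure comes from the comment above:

```ruby
# Only the 1,000,000 figure comes from the comment above; the tier names are hypothetical.
FORMAL_REVIEW_DOWNLOADS = 1_000_000

# Decide how much scrutiny an ownership change should get, based on total downloads.
def transfer_review_tier(total_downloads)
  if total_downloads >= FORMAL_REVIEW_DOWNLOADS
    :formal_review
  else
    :owner_discretion
  end
end

puts transfer_review_tier(2_500_000) # formal_review
puts transfer_review_tier(12_000)    # owner_discretion
```

A real process would presumably weigh more than one number, since the curator idea is precisely about human judgement on messy edge cases.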

ioquatix commented 3 years ago

Thanks to @nateberkopec who linked me to https://docs.npmjs.com/cli/v6/using-npm/disputes which outlines how NPM solves this problem. Well, they are certainly more direct in their approach.

Fryguy commented 2 years ago

It feels like a lot of the points that were made can be resolved by just having a simple namespace solution. Right now, we are all effectively putting gems into a global namespace. With a namespace, you can use the same name in your own namespace, and then there's not necessarily a need to "transfer" unless there is an official namespace. However, in that case of an official namespace, I'd expect that there is usually some sort of team in place that can take over. Also, with a namespace, you solve the "private gems" problem.

Of course that doesn't solve all of the problems, particularly with dispute resolution, but... baby steps.

Now, how to introduce a namespace is the interesting part. I feel like there could be a transition where the current set of gems are moved into a "global" namespace; gem calls to a non-namespaced version automatically go to the global namespace; perhaps new gems can only go into namespaces; maybe have a way for gem maintainers of global gems to move them into a namespace with a redirect from the global namespace. It will probably take years to fully transition, and then perhaps eventually unmaintained global gems can ultimately move to an "abandoned" namespace.
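The transition described above (bare names falling back to a "global" namespace, with optional redirects for gems that move) could look roughly like this. It is a sketch under assumptions, not an existing RubyGems feature: the `namespace/name` separator, the `resolve` function, and the redirect table are all hypothetical.

```ruby
GLOBAL_NAMESPACE = "global" # name taken from the comment; the "/" separator is an assumption

# Resolve a requested gem name to a fully qualified "namespace/name".
# Bare names fall back to the global namespace; `redirects` maps a
# qualified name to the namespace its maintainers moved it to.
def resolve(gem_name, redirects = {})
  namespace, name =
    gem_name.include?("/") ? gem_name.split("/", 2) : [GLOBAL_NAMESPACE, gem_name]
  qualified = "#{namespace}/#{name}"
  redirects.fetch(qualified, qualified)
end

redirects = { "global/async" => "example-org/async" } # hypothetical move

puts resolve("async")              # global/async
puts resolve("async", redirects)   # example-org/async
puts resolve("example-org/async")  # example-org/async
```

The redirect table is what would let existing `gem` calls keep working during the multi-year transition the comment anticipates.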

EDIT: Just noticed this idea is also in https://github.com/rubygems/rfcs/issues/31

sonalkr132 commented 2 years ago

The npm disputes policy is a good reference. I also want to highlight PEP 541: https://www.python.org/dev/peps/pep-0541

Fryguy commented 2 years ago

EDIT: Just noticed this idea is also in #31

Not to derail this thread but

indirect closed this 6 days ago

😕

bogdanRada commented 2 years ago

I have a small question about this RFC. I would appreciate it if anyone could help me.

Is this RFC only for public repositories on GitHub? I am assuming yes. Also, does it apply only to gems hosted on rubygems.org? I am assuming yes. My other question: if I move everything I ever published on rubygems.org to a private gem server (possibly using geminabox), and also move all repositories somewhere else, possibly to Bitbucket or GitLab, does this RFC still apply? I am hoping it will NOT apply in this case. Or perhaps I need to host everything on a private server?

I don't like several things in this RFC, so I am just thinking of an alternative way to host my gems and repositories. Or is there perhaps a way to opt out of this RFC?

Thank you very much.

simi commented 2 years ago

@bogdanRada from my understanding this RFC applies to all gems hosted at rubygems.org. There is no connection between GitHub and this RFC.

I don't like several things in this RFC, so I am just thinking of an alternative way to host my gems and repositories. Or is there perhaps a way to opt out of this RFC?

Can you be more specific? Which parts don't you like?

nateberkopec commented 2 years ago

Can this be closed since https://blog.rubygems.org/2022/01/19/rubygems-adoptions.html has been adopted?

sonalkr132 commented 2 years ago

I am not sure we should close this just because adoptions was released. The adoptions flow solves the gem transfer/abandoned issue only if the existing owner is responsive. We do need a policy for actions to take if the owner is unresponsive. IMHO, we should update this RFC and its title to address unresponsive owners.

ioquatix commented 2 years ago

I think a couple of things could be useful to define based on my experience.

svoop commented 2 years ago

@ioquatix I'd add a third one: "Owners who have passed away". While technically covered by "unresponsive", a sudden death (accident, etc.) doesn't give the owner the opportunity to organize an adoption in time. It's therefore a different enough case to be handled in its own way – e.g. using successor settings on GitHub or on other services which may implement a similar feature in the future.

ioquatix commented 2 years ago

@svoop it was already covered in the original proposal. :)

mullermp commented 2 years ago

@Fryguy @ioquatix Both of you may be interested in this (mentioned #31), given current discussions here and in other issues. https://github.com/rubygems/rfcs/pull/40

ioquatix commented 2 years ago

@mullermp thanks. How can I help?