How do you define "popular gems"?
I personally analyze all the gems and their versions. But for the community, maybe something like downloads > 10K or 100K?
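For concreteness, here is a minimal sketch of the download-count definition using the public RubyGems.org API (the `downloads` field of `/api/v1/gems/<name>.json`). The 100K threshold is just an assumption from the numbers above, not an agreed-upon cutoff:

```python
# Sketch only: one possible "popular gem" check against the public
# RubyGems.org API; the 100K threshold is an assumption, not a standard.
import requests

POPULARITY_THRESHOLD = 100_000  # assumed cutoff (10K or 100K per the discussion)

def is_popular(gem_name: str) -> bool:
    """True if the gem's total download count exceeds the threshold."""
    resp = requests.get(f"https://rubygems.org/api/v1/gems/{gem_name}.json",
                        timeout=10)
    resp.raise_for_status()
    return resp.json()["downloads"] > POPULARITY_THRESHOLD

print(is_popular("rest-client"))
```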
Attackers can fake the download counts themselves.
The current attacks can be classified into three categories: typosquatting, account hijacking, and social engineering (injecting malicious code/dependencies). For typosquatting, attackers can do whatever they want (which is the case you are talking about). For the latter two, the idea is that attackers are trying to exploit the existing users of those packages. The proposed approach is trying to prevent the latter two categories.
This seems way too difficult a task. The Ruby philosophy is to enable people to do things; gems were created to distribute code.
Malicious people will always try to find ways to harm others and benefit from it. I do not think you can use automatic tools to discover all possibilities; it took real people to discover the abusive gems, and none of these have (yet?) affected any large gem either. Yet restrictions, e.g. such as those added against typosquatting, have affected and hindered real people before (see another issue here on GitHub).
Plus ... I am not even sure how to take it ... Python tools that should check for malicious code in the RubyGems ecosystem? I mean ... come on now ...
What is even meant by "permissions"? Does this actually mean restrictions? How do you know which code is "suspicious", either? I mean, we have to talk about specifics here. Typosquatting is somewhat simple to detect, more or less (a sketch below); account takeover is harder to do something about. (It may be useful to gather some statistics about the top 50 gems within the next 6 months, to see whether any of them are affected by malicious code or attacks.)
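To illustrate the "somewhat simple to detect" claim, one common heuristic flags any new gem name within edit distance 1 of a popular gem name. A hedged sketch, where the popular-gem list is a tiny stand-in rather than a real dataset:

```python
# Hedged sketch of typosquat detection: flag new gem names within edit
# distance 1 of a popular gem. POPULAR_GEMS is a stand-in, not real data.

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]

POPULAR_GEMS = {"rest-client", "bootstrap-sass", "strong_password"}  # stand-in

def looks_like_typosquat(candidate: str) -> bool:
    return any(0 < edit_distance(candidate, name) <= 1 for name in POPULAR_GEMS)

print(looks_like_typosquat("rest_client"))  # True: one character off
```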
As for "social" engineering (nothing "social" about when you use another human being as honeypot to become a passive distributor of malicious code) - there have been prior discussions about ownership transfer etc... and notifying downstream users. If you look at rubygems.org, there are changes such as "pushed by xyz" and so forth, which may be useful if there is more than one author. This can be extended, perhaps use some karma system (although don't take me too literally here; it depends a lot on the design how useful it is. I don't consider the StackOverflow karmasystem to be very good, mostly because others can too easily impede others negatively, e. g. closing questions and so forth, but you get the general idea. I dare claim that malicious people are almost always those who have no proven track record of anything useful in a language community, but making things harder for genuine newcomers is also not a great idea, since you put down hurdles into adoption of a language. There are simple trade offs to be considered here, how open gems should be, the gems ecosystem etc... ).
I have not followed all discussions to the letter, but as far as I have understood it, the rubygems.org folks, as well as the gem + bundler folks, are thinking about making things better; and while Matz has not commented on any of this (I think, at least not in English issue trackers), I am quite certain he reads stuff and considers things too. But at the end of the day, there will never be 100% safety as long as trust has to be involved; one has to be suspicious about any code, even non-malicious code that is simply buggy and may do unwanted things.
I agree with Ruian and think this research has merit.
@lingfennan I've reached out to you privately via your gatech.edu e-mail. I would love to collaborate on this. I've made some progress identifying malicious gem candidates via metadata analysis.
Just because something is difficult doesn't mean we shouldn't attempt it. Regarding the "popular gems" definition: why would an attacker want to make a gem "more popular" by inflating its download count, when that would trigger this extra machinery/security scrutiny?
Regarding potential implementations, I feel like this is already partially implemented by @mensfeld; maybe they can comment. If it's already implemented, maybe this can at least be closed, or we can narrow down the scope to something achievable.
@ioquatix to answer your question: we do this in a couple of ways and constantly work to improve the analysis. We also monitor ownership-change friction (ref https://github.com/rubygems/rfcs/pull/33#issuecomment-915671818) along with some other risk-management modeling. Will it cover all the cases and find all the issues? I cannot promise that, but what I can say is that over the past year more than 60 packages (and many more versions) were yanked due to our reporting.
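As a rough illustration of one such ownership signal, the public RubyGems.org owners endpoint (`/api/v1/gems/<name>/owners.json`) can be polled and diffed against a stored snapshot. This is not the actual monitoring implementation, just a sketch; the snapshot file layout is an assumption made for illustration:

```python
# Sketch of one ownership-monitoring signal: poll the public RubyGems.org
# owners endpoint and diff against a local snapshot. The snapshot file
# layout below is an assumption, not the real system's storage.
import json
import pathlib
import requests

def current_owners(gem_name: str) -> set:
    resp = requests.get(f"https://rubygems.org/api/v1/gems/{gem_name}/owners.json",
                        timeout=10)
    resp.raise_for_status()
    return {owner["handle"] for owner in resp.json()}

def ownership_diff(gem_name: str, snapshot_dir: str = "snapshots"):
    """Return (added, removed) owner handles since the previous poll."""
    path = pathlib.Path(snapshot_dir) / f"{gem_name}.json"
    previous = set(json.loads(path.read_text())) if path.exists() else set()
    now = current_owners(gem_name)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(sorted(now)))
    return now - previous, previous - now

added, removed = ownership_diff("rest-client")
if added or removed:
    print(f"owner change on rest-client: +{sorted(added)} -{sorted(removed)}")
```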
Security of owners of popular gems was improved by enforcing MFA on their accounts, and rubygems.org is (thanks to @mensfeld) also getting reports of potentially dangerous gems and actively blocking/yanking them.
Recently, popular libraries such as `bootstrap-sass`, `strong_password` and `rest-client` have been compromised to distribute malware. I notice that existing discussions mainly focus on improving account security via MFA or notifications, and there have been few proposals for improving the security of the ecosystem using code analysis tools. Therefore, I would like to share some ideas and thoughts on this.

Among the compromised popular libraries, the malicious versions used a couple of interesting APIs: `Base64.urlsafe_decode64`, `Net::HTTP.get`, `Thread.new` and `eval`. These APIs can be used to perform suspicious activities such as code obfuscation, networking, process creation and code generation. The idea is that we can identify a list of sensitive APIs provided by the Ruby interpreter and assign permissions to them. When new versions of popular gems are pushed, we can identify the permissions they require and compare them against the older versions. If the permissions increase, we can take further action, such as notifying other maintainers or dependents.

The idea can be roughly separated into the following steps (a sketch follows the list):

1. Identify a list of sensitive APIs provided by the Ruby interpreter and assign permissions to them.
2. Analyze a gem version's source code to detect which sensitive APIs it uses and derive the permissions it requires.
3. When a new version of a popular gem is pushed, compare its required permissions against those of the previous version.
4. If the required permissions increase, take further action, such as notifying other maintainers or dependents.
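A minimal sketch of these steps, under two assumptions: the API-to-permission map is only the small excerpt named above, and gem sources are scanned with a naive substring match rather than a real Ruby parser (the actual analysis would need proper parsing to avoid matching comments or strings):

```python
# Illustrative sketch of the proposed pipeline (steps 1-4). Assumptions:
# the API->permission map is a tiny excerpt, and sources are scanned with
# a naive substring match instead of a real Ruby parser.
import pathlib

# Step 1: sensitive Ruby interpreter APIs, each assigned a permission.
SENSITIVE_APIS = {
    "Base64.urlsafe_decode64": "code_obfuscation",
    "Net::HTTP.get": "networking",
    "Thread.new": "process_creation",
    "eval": "code_generation",
}

# Step 2: derive the permission set a gem version requires.
def required_permissions(unpacked_gem_dir: str) -> set:
    perms = set()
    for rb_file in pathlib.Path(unpacked_gem_dir).rglob("*.rb"):
        source = rb_file.read_text(errors="ignore")
        perms |= {perm for api, perm in SENSITIVE_APIS.items() if api in source}
    return perms

# Step 3: compare a newly pushed version against the previous one.
def permission_increase(old_dir: str, new_dir: str) -> set:
    return required_permissions(new_dir) - required_permissions(old_dir)

# Step 4: act on an increase, e.g. notify maintainers and dependents.
# ("old/" and "new/" are placeholder paths to unpacked gem versions.)
escalated = permission_increase("old/", "new/")
if escalated:
    print(f"new permissions required: {sorted(escalated)}; flag for review")
```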
Please share your comments and ideas! I have done Steps 1, 2 and 4, but the code is mainly written in Python. I would be happy to share it if it would be useful.