Open aoberoi opened 5 years ago
I like some of these ideas! My personal heuristic is the last-updated date and the number of downloads. It's a poor standard, but it answers two questions: is it being maintained, and do other people trust it?
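As a rough illustration of that two-signal heuristic, here is a minimal sketch. The `PackageStats` type, the `quick_signals` function, and both thresholds are hypothetical and chosen only for the example; nothing here reflects an agreed definition from this group or real registry data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PackageStats:
    """Hypothetical container for the two data points mentioned above."""
    name: str
    last_published: date
    weekly_downloads: int


def quick_signals(pkg: PackageStats, today: date,
                  max_age_days: int = 365,
                  min_downloads: int = 1000) -> dict:
    """Answer the two questions separately instead of merging them into one label.

    The thresholds are arbitrary assumptions, not recommendations.
    """
    age_days = (today - pkg.last_published).days
    return {
        "recently_updated": age_days <= max_age_days,   # is it being maintained?
        "widely_used": pkg.weekly_downloads >= min_downloads,  # do others trust it?
    }


# Made-up package: popular but not published in four years.
example = PackageStats("some-lib", date(2020, 1, 1), 50_000)
signals = quick_signals(example, today=date(2024, 1, 1))
```

Keeping the two answers as separate fields, rather than collapsing them into a single verdict, avoids implying more than the data says: a stable, finished package can be widely used and rarely updated.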
That being said, I think there are two issues here: can we depend on this package for some length of time, and can we be sure the package is not introducing vulnerabilities?
My initial thought about your first bullet point is that it doesn't actually address either issue, and it may even have the side effect of hurting the credibility of popular packages by evaluating them on the number of issues filed against them. Having stale issues is definitely a concern, but there is currently no way to describe the importance of an issue, so a lot of low-priority stale issues could skew the results.
I like bullets 2 and 3, though.
I think most problems arise because time is limited and some issues are just deemed low impact. What we really need to know when deciding to depend on a package is whether we can trust it to be maintained, for how long, and, when it stops being maintained, how the changing of the guard will happen.
tl;dr: 1) we need to be careful to avoid negative labeling 2) not following what we deem "best practices" does not mean a package is "unmaintained"
This suggestion is making me think... Is this group really in a position to decide to label some package as "at risk" or in "bad shape"? Adding such a label, even if it is in an automated list of thousands of packages, and publicising it on some website is going to hurt feelings. Not only will it hurt feelings, but it will do so incorrectly, unless the "detection tuning" is very, very careful.
It is very hard to come up with some metrics which can't be interpreted in multiple ways:
That said, before talking about heuristics, do we need to define what "bad shape" or "at risk" even means? What are these risks (that are in the scope of this group) that we're trying to minimize?
We need a better term that has no negative connotation. How about “highly depended packages” that might need some help?
Best term I can come up with on the spot is "unclear status", but terms like that, once you start applying them in a specific context, start to grow their own meaning. So in a couple of years people may just start reading "unclear status" as "crap", and we'd still be applying a negative label, even if inadvertently and with good initial intentions. We shouldn't do that.
I'd still like to question the need for this heuristic or labeling or even the category (as a single dimension) itself. What do we want to achieve/prevent?
One of the issues did mention that some packages may break in newer nodes - that's a very clear and unambiguous indicator. It can be coupled with "breakage date detected" and a link to an open/resolved/ignored+closed issue. Sure, one can work around it by having a `true` as `npm test`, but that's beside the point.
Then there's the security aspect, which is also pretty unambiguous - a package either has or hasn't unresolved security issues.
Anything more than that ("this package might have a security/upgrade problem and that problem might not get resolved if and when it occurs") is unfair and likely offensive to a maintainer no matter how you phrase it?
We shouldn't do that.
Agreed. I think any labels we create will inevitably have issues, and I second the "we shouldn't do that" sentiment.
one of the issues did mention that some packages may break in newer nodes
While this is an issue, I am not sure that doing anything other than providing CITGM for module usage is a good idea. Again, it is a huge ask to try to solve the problem ourselves; it is more reasonable to provide tooling so users can solve the issues on their own.
Then there's the security aspect, which is also pretty unambiguous - a package either has or hasn't unresolved security issues.
This is not "unambiguous". There are many reports which are either false positives or just not applicable. For example, the `slug` package had an issue filed against it for a ReDoS vuln. If the end user is using it in a way where untrusted input is passed to it on a web server, it can cause a perf issue. But the tooling also reported that `migrate` was vulnerable because it uses `slug`. Migrate is a CLI tool for managing database migrations, so if you are giving it untrusted user input you have a whole host of other problems that have nothing to do with a ReDoS report. This is anything but unambiguous. https://github.com/tj/node-migrate/pull/77#issuecomment-335911777
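To make the false-positive mechanism concrete, here is a small sketch of how a purely transitive scan could flag everything that reaches an advisory. It is not how any particular audit tool is implemented. The `naive_flags` function, the `web-app` package, and the `handles_untrusted_input` set are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Set


def naive_flags(deps: Dict[str, List[str]], advisories: Set[str]) -> Dict[str, List[str]]:
    """Map each package to the advisory-carrying packages it can reach (itself included)."""
    result: Dict[str, List[str]] = {}
    for root in deps:
        hits: List[str] = []
        stack = [root]
        seen: Set[str] = set()
        while stack:
            pkg = stack.pop()
            if pkg in seen:
                continue  # guard against dependency cycles
            seen.add(pkg)
            if pkg in advisories:
                hits.append(pkg)
            stack.extend(deps.get(pkg, []))
        if hits:
            result[root] = sorted(hits)
    return result


# Toy dependency graph; "web-app" is a made-up consumer of slug.
deps = {"migrate": ["slug"], "slug": [], "web-app": ["slug"]}
advisories = {"slug"}

flags = naive_flags(deps, advisories)

# Assumption for illustration: only packages that pass untrusted input
# to the affected code are actually exposed to the ReDoS.
handles_untrusted_input = {"web-app"}
applicable = {pkg for pkg in flags if pkg in handles_untrusted_input}
```

The scan flags `migrate` and `web-app` identically, and only the context about how input reaches `slug` separates them. That context is not in the dependency graph, which is why a "has unresolved security issues" flag is harder to compute fairly than it looks.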
My point with all of this is that we should focus on real and attainable goals before we attempt to "label" or "categorize" packages. It is fun when a solution for people problems is also a good recommendation for a software problem :)
Is this something we are still interested in defining? I don't think I've seen discussion around this topic lately, so perhaps we have moved past identifying at-risk packages and are working on an "I know it when I see it" basis.
I've labeled this stale? and am inviting the group to revisit it.
Although this is more of a push model than a pull model, the group here did some interviews and does invite maintainers and collaborators to reach out, to help raise the visibility of self-identified projects in this space, mainly to connect packages in need of some maintenance with people who may have time. Given the proximity this group has to some high-profile members of the Node.js / npm / MS ecosystem, I think organic outreach to this team, which can then be amplified through social channels, would allow this group to advocate without having to formally label anything.
Maybe it's just a matter of evangelizing / sharing more of what this group can do via social networks / channels? https://github.com/nodejs/package-maintenance#for-maintainers
We've definitely done this kind of outreach triage before, and I think it is a good use of our collective network to help amplify such requests if we can (but not really doing more than that).
A recurring topic I'm seeing in the open issues right now is how we want to make an impact across the ecosystem by identifying packages that are themselves in bad shape or that depend (transitively) on packages that are in bad shape.
I believe there is an opportunity for this group to work on a set of heuristics to (either manually or programmatically) identify what it means for a project to be "at risk".
Here's a rather simplistic example of what such a set of heuristics might look like:
Once it can be "tuned" and we feel confident in it, we can begin surfacing the results - projects which are classified as at-risk - in many places. We may be able to work with npm, Inc. to utilize this in the CLI or on the website. We might publish our own website. We might publish guidelines or a tool that application and/or package authors can use to analyze their own dependency tree. We could supply a README badge service. The possibilities are endless, but I think it starts with creating a common definition of what we think at-risk looks like.
Refs:
`npm deprecate` as an opt-in way to signal this sort of information isn't working.