:+1: right on right on. it should have been this way from the start.
+10000000
\o/ thank you!
Oh god, about time.
I've used `npm publish -f` before when a network connectivity problem or npm error corrupted the original `npm publish`. Is there another recommended way to repair snafus of that sort? Not changing the bits, per se, but making sure they're all there, and correct.
How does this work with completely removing a package from npm? Is the package name burned forever?
@benjamn My first go-to in this situation is to use the `npm cache` command to fetch and unpack a module to see if it's valid. `npm cache clean foo` will remove all `foo` package entries from the cache. Then `npm cache add foo@$version` will download stuff into `~/.npm/foo/$version/...` and you can inspect at will. (Or it'll error out, or yell at you, or whatever.)
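Concretely, the round trip looks something like this (package name and version are placeholders, assuming the npm-1.x cache layout described above):

```
npm cache clean foo        # evict every cached copy of foo
npm cache add foo@1.2.3    # re-download; this errors out if the tarball is corrupt
ls ~/.npm/foo/1.2.3/       # unpacked contents land here; inspect at will
```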
@jkrems Fully-removed packages are whittled down to a state where anyone can publish, but the old version numbers are still kept around as permanent tombstones that can never be reused.
Note that server admins can still `DELETE` docs directly, so there's still an escape hatch for when such a thing is needed.
UPDATE 2015: PLEASE SEE COMMENT AT THE BOTTOM OF THIS THREAD.
:+1: Sounds great!
I think @jdalton uses this feature to republish previous versions of Lo-Dash when needed. He might want to add something to this discussion.
I was literally just about to do this to republish an old version of a module with a different dist-tag. Turned out I wanted to fix the readme, so I bumped the version anyway, but it still seems like a valid reason to `publish -f`.
@danmactough You can use `npm tag pkg@$oldversion some-tag` to do that without a re-publish. Yeah, the documentation update option is where it's annoying. Maybe it'd be nice to add a client command to just update the readme? Seems like that's a common enough use-case to just bless in a nicer way without forceful re-publishing.
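For example (package name and tag here are hypothetical):

```
npm tag my-package@1.0.3 oldstable   # point dist-tag "oldstable" at 1.0.3, no re-publish
```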
> You can use `npm tag pkg@$oldversion some-tag` to do that without a re-publish.
Ah. I thought that was the case and tried it, but I forgot to add `some-tag`, so it didn't work. Not something I do frequently, and it wasn't worth another thought as I wanted to update the readme anyway (which is a good enough reason for a patch version bump to me).
> Maybe it'd be nice to add a client command to just update the readme?
Tough call. I can see an argument that what's done is done, even if it's only the readme. Wouldn't bother me, though. But, I never, ever look at the readme that's sitting in the `node_modules` directory; I just assume there's a more up-to-date readme on github.
Yeah, it's mostly about what goes into the doc itself, since that shows up on the website, can be searched, etc.
> shows up on the website, can be searched
Good points. Still kind of hinky for that shasum to change, though.
Right, so, I add a new command, `npm doc-publish`, that just reads in your README.md or whatever, and updates that to the root package doc, without changing any tarballs. Actually, that's a really good idea, and easy.
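As a very rough sketch of what such a command could do, under the assumption that the website renders the root doc's `readme` field (the registry URL, auth handling, and `request` usage here are all illustrative, not real client code):

```
// Hypothetical readme-only republish: fetch the root package doc,
// swap its readme, and PUT it back. No tarballs are touched.
var fs = require('fs')
var request = require('request') // assumption: any plain HTTP client would do

var REGISTRY = 'https://registry.example.com' // placeholder URL

function docPublish (name, cb) {
  request.get({ url: REGISTRY + '/' + name, json: true }, function (er, res, doc) {
    if (er) return cb(er)
    doc.readme = fs.readFileSync('README.md', 'utf8')
    // doc still carries its _rev, so CouchDB accepts this as a normal update
    request.put({ url: REGISTRY + '/' + name, json: doc }, cb)
  })
}
```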
:+1: Some secondary thoughts around

> Deleting documents entirely will be forbidden

Assuming this is designed to prevent the naive work-around where a user would unpublish the entire package and then republish all versions (including the one single version they wanted to force push), e.g.:

```
npm unpublish my-package --force // removes all versions (including latest)
npm publish my-package           // Publishes latest again
```

This is a good thing, as preventing deletes altogether is the only way to prevent this behavior (or at least to make it auditable, which is really what most package consumers want). There are a couple of side-effects though:

1. **Squatting:** Although package name squatting has never really been a serious problem, it will be even harder to transfer ownership now if we can never delete documents.
2. **Accidental publishes:** We dealt with this a few times when users would contact us to ensure that their private code was actually removed when it was accidentally published. The existing CouchDB "tombstones" made them uncomfortable enough; having living, breathing doc records I imagine would be even more so.
3. **Force publish still possible:** Reading the delete update function and the changes to `validate_doc_update`, the registry now makes some effort to ensure that the person publishing the package isn't the same person that unpublished it. Wouldn't the following work though?

```
npm unpublish pkg --force                          // removes all versions (including latest)
npm publish pkg --userconfig second-user           // Intermediary publish to fool the registry
npm unpublish pkg --force --userconfig second-user // Remove intermediary package
npm publish                                        // Now able to republish all versions however I want
```

Although (3) feels like a nit, given that developers (like @jesusabdullah) see `npm publish -f` as the moral equivalent to `git push --force` (even though it clearly isn't), we have to assume that any work-around will be used heavily.
> I think @jdalton uses this feature to republish previous versions of Lo-Dash when needed. He might want to add something to this discussion.
Yap, I've used it before. It was handy to backport patches when things like a Node compat issue cropped up, or an external resource changed under our feet, or I had a non-functional doc/package typo. It was a win for me at least :)
I've shot myself in the foot before, but recovered. It was useful.
> @jkrems Fully-removed packages are whittled down to a state where anyone can publish, but the old version numbers are still kept around as permanent tombstones that can never be reused.
Yikes, so devs using a package name, unaware it was previously owned and deleted, may run into issues with their versioning?
Expanding on @benjamn's point a bit, it seems that `npm publish` should prepare a checksum for the package on the client, and the server should fail the publish if the checksums don't match. That's probably a separate issue though.
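Something like the following, perhaps. `dist.shasum` is the per-version field the registry already stores; the tarball name and the flow around it are illustrative:

```
// Hash the local tarball (e.g. the output of `npm pack`) and compare it
// against what the registry recorded for that version.
var crypto = require('crypto')
var fs = require('fs')

function shasum (file) {
  return crypto.createHash('sha1')
               .update(fs.readFileSync(file))
               .digest('hex')
}

// After publishing foo@1.2.3, fetch the package doc and check:
//   doc.versions['1.2.3'].dist.shasum === shasum('foo-1.2.3.tgz')
// A mismatch would mean the publish was corrupted in transit and
// should be treated as failed.
```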
> updates that to the root package doc, without changing any tarballs
:+1:
> Yikes, so devs using a package name, unaware it was previously owned and deleted, may run into issues with their versioning?
Per @indexzero's comment above, someone new coming along would be able to publish any version they want. But that's probably by mistake…
The more I think about it, there doesn't seem to be any good way to prevent force publish without preventing unpublish of entire packages. Any checking we do around users could easily be circumvented with multiple accounts by anyone who cares enough to do so.
@isaacs what are your thoughts on removing this API altogether? i.e. only unpublish of specific versions is allowed. It makes the name squatting case more prolific, but I'm pretty sure it's the only way to be certain.
@indexzero

> **Squatting:** Although package name squatting has never really been a serious problem, it will be even harder to transfer ownership now if we can never delete documents.

Yeah, that sucks. Since an admin usually has to step in there anyway, they can just delete the doc entirely.

> **Accidental publishes:** We dealt with this a few times when users would contact us to ensure that their private code was actually removed when it was accidentally published. The existing CouchDB "tombstones" made them uncomfortable enough; having living, breathing doc records I imagine would be even more so.

Either way, it's still in the db until a compaction, and previous revs are just as gone as `_deleted` revs after a compaction, so nothing changes.

> **Force publish still possible:** Reading the delete update function and the changes to `validate_doc_update`, the registry now makes some effort to ensure that the person publishing the package isn't the same person that unpublished it. Wouldn't the following work though?

No, actually, but thanks for making me walk through this. The error message was completely useless there :) Fixed in eff893e21696bcf0d30916cca9bafaafffa0f921
@isaacs Can you explain a little bit why force publish is being removed (I know you had a summary at the top, but I'm wondering if there was a long history of abuse/misuse to point to) and why it was prioritized over something like getting npm stats back up? I found force publish useful. It wasn't something to abuse, but I was glad it existed when I needed it.
@isaacs Ok I see, very clever. For the other 90% of people who don't understand CouchApps, this is how it works:

When a package (e.g. `winston`) is "unpublished", its document becomes:

```
{
  _id: "winston",
  _rev: "42-somerev",
  name: "winston",
  time: {
    // All the old time stuff
    "unpublished": {
      "name": "indexzero",
      "time": 1392186915505
    }
  }
}
```

Then when the new document is run through `validate_doc_update`, which gets the new document and the old document, it has to pass this bit of code:

```
for (var v in oldTime) {
  if (v === "modified" || v === "unpublished") continue
  assert(doc.time[v] === oldTime[v],
         "Cannot replace previously published version: " + v)
}
```

Because both the old document and the new document have the same `unpublished.time` value it is forbidden.

@isaacs ramp up that support team on social engineering sir, because I suspect they're going to get a fair amount of requests for this. Any thoughts on say ... only doing this for packages with more than 10 versions? I know lots of folks who publish to the same test package name and then unpublish. Preventing that is going to get pretty ... frustrating.
@indexzero You could use a few things to determine if a package shouldn't be able to force push (dependents, number of installs, how long it has existed, is it 1.0+ etc.).
IMO I think removing it completely is the best way to go, and documenting this properly is going to be really crucial to prevent frustration. `npm publish -f` should link to something useful instead of ceasing to exist.
> The more I think about it, there doesn't seem to be any good way to prevent force publish without preventing unpublish of entire packages. Any checking we do around users could easily be circumvented with multiple accounts by anyone who cares enough to do so.
I don't think that's the case. But I'd love a patch that adds a test. Try checking out this repository and running whatever tests against it that you'd like.
> I know lots of folks who publish to the same test package name and then unpublish. Preventing that is going to get pretty ... frustrating.
Are any of them on this thread? If not, can you please @-mention them so that they can be a part of this discussion? Thanks. (The npm tests themselves do this, but clearly I'm ok with this, so we can refactor those tests.)
> Because both the old document and the new document have the same `unpublished.time` value it is forbidden.

Why is that? `doc.time.unpublished.time` is never checked. I'm confused?
> @isaacs Can you explain a little bit why force publish is being removed, and why it was prioritized over something like getting npm stats restored?
@jdalton force publish changes the bits that are returned for a specific version number. This means that pinning your version is not a reliable way to ensure that you have a specific version of a package. It reduces data integrity, and can lead to difficult-to-resolve conflicts in the database. Many users have asked for this over the years.
This wasn't prioritized "over download stats". I worked a bunch on the npm stats system recently, in fact, and @seldo is researching time-series databases for us to use to efficiently deliver this information more up to date and more on-demand than it was before. The raw data is still all backed up, and we will be providing the counters again soon, and better. Hooray for multiple brains doing multiple things in parallel!
We're not going to do anything where some arbitrary limit means you get to start doing forced publishes. Changed bits in a package with a single version is still changed bits that can break your program in mysterious ways that are hard to detect.
> force publish changes the bits that are returned for a specific version number. This means that pinning your version is not a reliable way to ensure that you have a specific version of a package. It reduces data integrity, and can lead to difficult-to-resolve conflicts in the database. Many users have asked for this over the years.
Yaaa buuut... it was pretty useful, & I'm glad I had it available when it was needed.
> This wasn't prioritized "over download stats".
The reason I ask is because there are a few issues open for the download stats and it seems like a more pressing issue than removing working/useful features, like force publish, from npm.
Was there an open issue to remove this feature or was it on some roadmap? This kind of looks like you're dropping it on your users, :confetti_ball:, and using the issue tracker as a way to broadcast the change. Seems a bit backwards (usually goes the other way, an issue is opened, discussed, PRs made, & then merged).
To explain the checking algorithm:

For `old` (existing document) and `doc` (new PUT body):

1. If `doc._deleted`, and not an admin, then just fail. (No outright `DELETE` reqs, except by server admins.)
2. If `old.time[v]` exists, and `doc.time[v]` is different from `old.time[v]`, then fail.
3. If `old.versions[v]` and `doc.versions[v]` are (deeply) different, then `doc.time[v]` must be different from `old.time[v]`. (Along with the previous rule, this means `old.time[v]` must be unset.)
4. If you send a `DELETE` to `registry/_design/app/_update/delete/pkg`, then it'll set the write body to something like `{..., "time":{..., "unpublished":{"name":"some-user","time":"2014-02-12T06:51:45.827Z"}}}`. The contents of the `time.unpublished` object are mostly irrelevant; strictly there so that a human can one day have an extra clue about who deleted it. (Booleans are a waste of information density ;)
5. A doc with a `time.unpublished` entry may not have versions.
6. If you `PUT` to a doc that has a `time.unpublished` entry, then you must remove that entry, and also you can do that even if you're not a "maintainer" (since the `maintainers` list was removed in the unpublish!)
7. If `doc.time.unpublished` and `old.time.unpublished`, then fail (cannot unpublish an already unpublished doc, obscuring the human-aiding clue).
8. If `doc.time.unpublished`, then the `doc.time.unpublished.user` must be the currently logged-in user.
9. If `doc.time.unpublished`, then the currently logged-in user must be in `old.maintainers`.

There might still be a hole in here somewhere, so if anyone has a suggestion or patch, I'd be happy to review it. AFAICT, this ensures that unpublishes are always accountable, and that versions can never be re-used, causing weird surprises for other users of those modules.
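Condensed into `validate_doc_update` style, the rules read roughly like this. This is a paraphrase, not the actual couchapp code; note it checks the `name` field from the JSON example earlier in the thread, where rule 8 above says `user`:

```
function validate (doc, old, userCtx) {
  function assert (ok, msg) { if (!ok) throw { forbidden: msg } }
  var isAdmin = userCtx.roles.indexOf('_admin') !== -1

  // 1. No outright deletes except by server admins.
  assert(!doc._deleted || isAdmin, 'deletes are admin-only')

  var oldTime = (old && old.time) || {}
  for (var v in oldTime) {
    if (v === 'modified' || v === 'unpublished') continue
    // 2 & 3: time entries for published versions are immutable, so
    // changed bits require a version string that has never had one.
    assert(doc.time && doc.time[v] === oldTime[v],
           'Cannot replace previously published version: ' + v)
  }

  if (doc.time && doc.time.unpublished) {
    // 5. An unpublished doc carries no versions.
    assert(Object.keys(doc.versions || {}).length === 0,
           'unpublished doc may not have versions')
    // 7. Cannot re-unpublish an already-unpublished doc.
    assert(!(old && old.time && old.time.unpublished), 'already unpublished')
    // 8. The tombstone must name the logged-in user...
    assert(doc.time.unpublished.name === userCtx.name, 'wrong user')
    // 9. ...who must have been a maintainer of the old doc.
    assert(old && old.maintainers.some(function (m) {
      return m.name === userCtx.name
    }), 'not a maintainer')
  }
  // 4 and 6 concern the DELETE update handler and clearing the
  // tombstone on a later PUT, which happen outside this function.
}
```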
Sounds great. I've seen a few module authors abuse republishing in the past. +1
> Was there an open issue to remove this feature or was it on some roadmap? This kind of looks like you're dropping it on your users, :confetti_ball:, and using the issue tracker as a way to broadcast the change. Seems a bit backwards.
I am using the issue tracker as a way to solicit feedback, yes. It is backwards, but I've found issue trackers to be most useful as two-way information conduits :)
I don't know of an open issue to remove the feature. If there was one, it was probably in npm, not here, but here is where the change needs to be, for security reasons, since the client is just sugar around an HTTP endpoint.
However, I have been literally begged for this, in person, via email, and over twitter, numerous times, from friends and colleagues at several different companies. And on a personal level, as one of the people that people go to when their npm stuff is confusing and weird, I've shaken my own fists at myself for ever allowing publishes over pre-existing versions in the first place.
> I've shaken my own fists at myself for ever allowing publishes over pre-existing versions in the first place.
In cases where force publish might be needed, would a solution be to contact the registry admins for assistance? TBH I'm cool if that's an option.
> In cases where force publish might be needed, would a solution be to contact the registry admins for assistance? TBH I'm cool if that's an option.
Yeah, anyone with the admin bit can still delete whatever they want. If you `npm publish` something where your social security number is the version, or something, a human can manually intervene. But then, there's an email thread about it, some process, etc. Not just "Whoa, how come that line number in the error is different for you than it is for me?"
> But then, there's an email thread about it, some process, etc.
I can dig it :+1:
It might be useful to allow an un-un-publish, where the exact same version can be restored.
Even if there's no UI for it, I think it makes sense to keep the tarball shasum in the unpublished doc, unless this info exists elsewhere.
What about packages that have, say, zero downloads? I fail to see any value in preventing force pushes to versions that have never been used, and that would cover the "accidental publish" use case pretty well, if it was caught quickly. Are there difficulties I'm unaware of wrt npm mirrors?
It's a great decision.
Still, if it's possible, it might be good to allow republish for the first 5 minutes after publish, so we can clear obvious mistakes (e.g. inclusion of files that should not be there). At least that was the only use case where I used the forced publish feature.
My use of force publish is consistent with @medikoo's: I screwed up a publish and quickly resolved it.
One of my more recent force publish uses was like @ljharb, @medikoo, & @terinjokes but multiplied by ~150 as it was when I first published lodash functions as individual modules.
For the 5 minute issue that @medikoo, @ljharb, @jdalton, & @terinjokes mention, maybe we should have an `npm publish --dry` that allows you to verify a publish before running it.
I'm not quite sure what you want to verify before doing it, though. Maybe a dry run should generate a tarball you can inspect.
I'd imagine it's definitely useful for @jdalton's case, where he wants to script publishing of ~150 modules and needs to test the script.
When I was testing the deployment of hundreds of modules, I just mirrored the registry & published them to my registry to test it with.
I think you can also do an `npm pack` to generate a tarball.
In my case I had typoed the dependency version ranges (they were fixed instead of a range). So I forced published shortly after to correct the issue. Not sure a dry run would help there as it was a doh moment and not an issue with the package contents.
What if `--force` only worked on the latest version, once, and only within a few minutes (like 2)?
We have something similar at Yahoo, but we call it quarantine. A package is published there and verified before promoting it out.
There are technical complications around having a time based limitation.
Remember, couchdb is a sequence-based rather than time-based updating paradigm. If you do time-based stuff, in a way that the functional semantics of the system depend on (a) the clock time being the same on all servers and/or (b) the clock time when a PUT comes in being near-in-time to the clock time when a PUT was initiated, then that starts to fall apart in systems where you have distributed architectures (including re-playing writes in down-stream replicas, which we do quite a lot of).
So, we won't be doing that.
Only allowing a `--force` publish of the latest module might be ok, but it's tricky to implement in a secure way. And without the problematic time sensitivity, it's really still the same disruptive problem. What's to stop you from just force publishing over the same module a thousand times?
@jdalton's use case of messing up 150 modules, and then being able to clean them up, is a valid use-case. But it's rare, and probably fine in those cases to reach out to a server admin for help.
@raynos Re `npm publish --dry`, isn't that exactly `npm pack`? Your point is a good one, though: more people should know about `pack` and use it to inspect artifacts locally and test scripts. Write a blog post about it! (Or, you know, nudge me to :)
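For reference, the `pack`-based dry run is as simple as this (the tarball name depends on your package.json; `foo-1.2.3.tgz` is a placeholder):

```
npm pack                      # builds the exact tarball a publish would upload
tar -tzf foo-1.2.3.tgz        # list every file that would be published
npm install ./foo-1.2.3.tgz   # optional: smoke-test the artifact itself
```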
In the cases where you genuinely have published something you shouldn't've, unpublishing, bumping the version, and publishing a new thing seems like it's basically always the right choice, no?
> In the cases where you genuinely have published something you shouldn't've, unpublishing, bumping the version, and publishing a new thing seems like it's basically always the right choice
That's the best approach. Patch versions are cheap! Create lots of them.
Unless you are a library author that has standard releases in several places, like Lo-Dash, jQuery, or YUI: then you don't really want the npm versions to be different from the others.
But, if the server admins are responsive to these requests, like @izs usually is, then that's a good option.
What about `--force` only working once on the latest? That will help the "oh shit" moments & keep them from doing it over & over again.
@ljharb Tying it to download counts would be really tricky. A lot of new moving parts to make that work, since downloads aren't tracked in the same place.
@davglass Allowing a single re-publish (or `n` republishes) of a version would be doable. It complicates the data structure a little bit, but not very much, and doesn't rely on anything external (like download counts or clock drift).
Fundamentally, my goal here was just to figure out if there are actually use-cases for re-publishing that cannot for whatever reason be addressed via admin intervention. What I've learned (which isn't surprising, really) is that there ARE use-cases, but they're rare, can be easily addressed by human intervention, and that human intervention is probably a good idea in those cases anyway.
I'm going to go ahead and make the change, and if it leads to a flood of administrative overhead, we can explore a max-republish count, or some other reasonable limit.
Ow ow ow ow ... okay. Yes, this is the right answer.
I can live with human intervention :)
Thanks for adding this!
> I'm going to go ahead and make the change, and if it leads to a flood of administrative overhead, we can explore a max-republish count, or some other reasonable limit.
Rock! @phated scared me straight dishing up worst case scenarios. [mental note: I must always lock computer around @phated :grinning:]
In b054e4866c1121a5157b7306ac07fe6d753b1ddc and 516262d4f39949ab1f3d55b609ad3bb30c0c36b4, the ability to publish over a previously published version is removed. This hasn't been published live yet, but it has been tested extensively, and I think it's a Good Thing.
With this change, if you publish `foo@1.2.3`, you can still un-publish `foo@1.2.3`. But then, you will not be able to publish something else to that same package identifier. Not now, not never. Deleting documents entirely will be forbidden (though of course old revs still get compacted out) and we'll have a lot more visibility into what each change in the db does.

I wouldn't characterize it as a security fix, per se, but it does reduce a significant gotcha that users have run into over the years, which is especially annoying when deploying Node servers to production. If the bits are gone, ok, it's an error, you notice and deal with it. But if the bits are changed, then this can be really hazardous.
I'm about 85% sure that this is the right approach. It enforces a best-practice that is pretty much universally agreed upon. But, since it is reducing functionality in a way, I think it'd be good for people to speak up if they have a use case that'll be impacted.
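To make the new contract concrete, here's the lifecycle this enforces, with a hypothetical `foo` package (the comments paraphrase the behavior, not the actual error text):

```
npm publish                        # publishes foo@1.2.3
npm unpublish foo@1.2.3            # still allowed: the bits go away
npm publish                        # rejected: 1.2.3 is a permanent tombstone
npm version patch && npm publish   # the sanctioned fix: ship foo@1.2.4 instead
```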