npm / npm-registry-couchapp

couchapp bits of registry.npmjs.org
https://www.npmjs.org/

No more `npm publish -f` #148

Closed isaacs closed 10 years ago

isaacs commented 10 years ago

In b054e4866c1121a5157b7306ac07fe6d753b1ddc and 516262d4f39949ab1f3d55b609ad3bb30c0c36b4, the ability to publish over a previously published version is removed. This hasn't been published live yet, but it has been tested extensively, and I think it's a Good Thing.

With this change, if you publish foo@1.2.3, you can still un-publish foo@1.2.3. But then, you will not be able to publish something else to that same package identifier. Not now, not never. Deleting documents entirely will be forbidden (though of course old revs still get compacted out) and we'll have a lot more visibility into what each change in the db does.

I wouldn't characterize it as a security fix, per se, but it does reduce a significant gotcha that users have run into over the years, which is especially annoying when deploying Node servers to production. If the bits are gone, ok, it's an error, you notice and deal with it. But if the bits are changed, then this can be really hazardous.

I'm about 85% sure that this is the right approach. It enforces a best-practice that is pretty much universally agreed upon. But, since it is reducing functionality in a way, I think it'd be good for people to speak up if they have a use case that'll be impacted.

ded commented 10 years ago

:+1: right on right on. it should have been this way from the start.

davglass commented 10 years ago

+10000000

davemo commented 10 years ago

\o/ thank you!

robashton commented 10 years ago

Oh god, about time.

benjamn commented 10 years ago

I've used npm publish -f before when a network connectivity problem or NPM error corrupted the original npm publish. Is there another recommended way to repair snafus of that sort? Not changing the bits, per se, but making sure they're all there, and correct.

jkrems commented 10 years ago

How does this work with completely removing a package from npm? Is the package name burned forever?

isaacs commented 10 years ago

@benjamn My first go-to in this situation is to use the `npm cache` command to fetch and unpack a module to see if it's valid. `npm cache clean foo` will remove all foo package entries from the cache. Then `npm cache add foo@$version` will download stuff into `~/.npm/foo/$version/...` and you can inspect at will. (Or it'll error out, or yell at you, or whatever.)

@jkrems Fully-removed packages are whittled down to a state where anyone can publish, but the old version numbers are still kept around as permanent tombstones that can never be reused.

isaacs commented 10 years ago

Note that server admins can still DELETE docs directly, so there's still an escape hatch for when such a thing is needed.

UPDATE 2015: PLEASE SEE COMMENT AT THE BOTTOM OF THIS THREAD.

jkrems commented 10 years ago

:+1: Sounds great!

terinjokes commented 10 years ago

I think @jdalton uses this feature to republish previous versions of Lo-Dash when needed. He might want to add something to this discussion.

danmactough commented 10 years ago

I was literally just about to do this to republish an old version of a module with a different dist-tag. Turned out I wanted to fix the readme, so I bumped the version anyway, but it still seems like a valid reason to publish -f.

isaacs commented 10 years ago

@danmactough You can use `npm tag pkg@$oldversion some-tag` to do that without a re-publish. Yeah, the documentation update option is where it's annoying. Maybe it'd be nice to add a client command to just update the readme? Seems like that's a common enough use case to just bless in a nicer way without forceful re-publishing.

danmactough commented 10 years ago

You can use `npm tag pkg@$oldversion some-tag` to do that without a re-publish.

Ah. I thought that was the case and tried it, but I forgot to add some-tag, so it didn't work. Not something I do frequently, and it wasn't worth another thought as I wanted to update the readme anyway (which is a good enough reason for a patch version bump to me).

Maybe it'd be nice to add a client command to just update the readme?

Tough call. I can see an argument that what's done is done -- even if it's only the readme. Wouldn't bother me, though. But, I never, ever look at the readme that's sitting in the node_modules directory; I just assume there's a more up-to-date readme on github.

isaacs commented 10 years ago

Yeah, it's mostly about what goes into the doc itself, since that shows up on the website, can be searched, etc.

danmactough commented 10 years ago

shows up on the website, can be searched

Good points. Still kind of hinky for that shasum to change, though.

isaacs commented 10 years ago

Right, so, I'll add a new command, `npm doc-publish`, that just reads in your README.md or whatever, and updates that in the root package doc, without changing any tarballs. Actually, that's a really good idea, and easy.

indexzero commented 10 years ago

:+1: Some secondary thoughts around

"Deleting documents entirely will be forbidden"

Assuming this is designed to prevent the naive workaround where a user would unpublish the entire package and then republish all versions (including the one single version they wanted to force push), e.g.:

```sh
npm unpublish my-package --force   # removes all versions (including latest)
npm publish my-package             # publishes latest again
```

This is a good thing, as preventing deletes altogether is the only way to prevent this behavior (or at least to make it auditable, which is really what most package consumers want). There are a couple of side effects, though:

  1. Squatting: Although package name squatting has never really been a serious problem, it will be even harder to transfer ownership now if we can never delete documents.
  2. Accidental publishes: We dealt with this a few times when users would contact us to ensure that their private code was actually removed when it was accidentally published. The existing CouchDB "tombstones" made them uncomfortable enough, having living breathing doc records I imagine would be even more so.
  3. Force publish still possible: Reading the delete update function and the changes to validate_doc_update the registry now makes some effort to ensure that the person publishing the package isn't the same person that unpublished it. Wouldn't the following work though?
```sh
npm unpublish pkg --force                           # removes all versions (including latest)
npm publish pkg --userconfig second-user            # intermediary publish to fool the registry
npm unpublish pkg --force --userconfig second-user  # remove intermediary package
npm publish                                         # now able to republish all versions however I want
```

Although (3) feels like a nit, given that developers (like @jesusabdullah) see `npm publish -f` as the moral equivalent of `git push --force` (even though it clearly isn't), we have to assume that any workaround will be used heavily.

jdalton commented 10 years ago

I think @jdalton uses this feature to republish previous versions of Lo-Dash when needed. He might want to add something to this discussion.

Yap, I've used it before. It was handy to backport patches when things like a Node compat issue cropped up, or an external resource changed under our feet, or I had a non-functional doc/package typo. It was a win for me at least :)

I've shot myself in the foot before, but recovered. It was useful.

@jkrems Fully-removed packages are whittled down to a state where anyone can publish, but the old version numbers are still kept around as permanent tombstones that can never be reused.

Yikes, so devs using a package name, unaware it was previously owned and deleted, may run into issues with their versioning?

grncdr commented 10 years ago

Expanding on @benjamn's point a bit, it seems that `npm publish` should prepare a checksum for the package on the client, and the server should fail the publish if the checksums don't match. That's probably a separate issue though.

danmactough commented 10 years ago

updates that to the root package doc, without changing any tarballs

:+1:

terinjokes commented 10 years ago

Yikes, so devs using a package name, unaware it was previously owned and deleted, may run into issues with their versioning?

Per @indexzero's comment above, someone new coming along would be able to publish any version they want. But that's probably by mistake…

indexzero commented 10 years ago

The more I think about it, there doesn't seem to be any good way to prevent force publish without preventing unpublish of entire packages. Any checking we do around users could easily be circumvented with multiple accounts by anyone who cares enough to do so.

@isaacs what are your thoughts on removing this API altogether? i.e. only unpublishing specific versions is allowed. It makes the name squatting case worse, but I'm pretty sure it's the only way to be certain.

isaacs commented 10 years ago

@indexzero

Squatting: Although package name squatting has never really been a serious problem, it will be even harder to transfer ownership now if we can never delete documents.

Yeah, that sucks. Since an admin usually has to step in there anyway, they can just delete the doc entirely.

Accidental publishes: We dealt with this a few times when users would contact us to ensure that their private code was actually removed when it was accidentally published. The existing CouchDB "tombstones" made them uncomfortable enough, having living breathing doc records I imagine would be even more so.

Either way, it's still in the db until a compaction, and previous revs are just as gone as _deleted revs after a compaction, so nothing changes.

Force publish still possible: Reading the delete update function and the changes to validate_doc_update the registry now makes some effort to ensure that the person publishing the package isn't the same person that unpublished it. Wouldn't the following work though?

No, actually, but thanks for making me walk through this. The error message was completely useless there :) Fixed on eff893e21696bcf0d30916cca9bafaafffa0f921

jdalton commented 10 years ago

@isaacs Can you explain a little bit why force publish is being removed (I know you had a summary at the top, but I'm wondering if there was a long history of abuse/misuse to point to) and why it was prioritized over something like getting npm stats back up? I found force publish useful. It wasn't something to abuse, but I was glad it existed when I needed it.

indexzero commented 10 years ago

@isaacs Ok I see, very clever. For the other 90% of people who don't understand CouchApps this is how it works:

```js
// Simplified package doc after an unpublish:
{
  _id: "winston",
  _rev: "42-somerev",
  name: "winston",
  time: {
    // All the old time stuff, plus:
    "unpublished": {
      "name": "indexzero",
      "time": 1392186915505
    }
  }
}
```

```js
// In validate_doc_update, where `oldTime` is the existing doc's `time`
// map and `doc` is the incoming PUT body:
for (var v in oldTime) {
  if (v === "modified" || v === "unpublished") continue
  assert(doc.time[v] === oldTime[v],
    "Cannot replace previously published version: "+v)
}
```

@isaacs ramp up that support team on social engineering sir because I suspect they're going to get a fair amount of requests for this. Any thoughts on say ... only doing this for packages with more than 10 versions? I know lots of folks who publish to the same test package name and then unpublish. Preventing that is going to get pretty ... frustrating.

yocontra commented 10 years ago

@indexzero You could use a few things to determine if a package shouldn't be able to force push (dependents, number of installs, how long it has existed, is it 1.0+ etc.).

IMO removing it completely is the best way to go, and documenting this properly is going to be really crucial to prevent frustration. `npm publish -f` should link to something useful instead of just ceasing to exist.

isaacs commented 10 years ago

The more I think about it, there doesn't seem to be any good way to prevent force publish without preventing unpublish of entire packages. Any checking we do around users could easily be circumvented with multiple accounts by anyone who cares enough to do so.

I don't think that's the case. But I'd love a patch that adds a test. Try checking out this repository and running whatever tests against it that you'd like.

I know lots of folks who publish to the same test package name and then unpublish. Preventing that is going to get pretty ... frustrating.

Are any of them on this thread? If not, can you please @-mention them so that they can be a part of this discussion? Thanks. (The npm tests themselves do this, but clearly I'm ok with this, so we can refactor those tests.)

Because both the old document and the new document have the same unpublished.time value it is forbidden.

Why is that? doc.time.unpublished.time is never checked. I'm confused?

@isaacs Can you explain a little bit why force publish is being removed and why it was prioritized over something like getting npm stats restored?

@jdalton force publish changes the bits that are returned for a specific version number. This means that pinning your version is not a reliable way to ensure that you have a specific version of a package. It reduces data integrity, and can lead to difficult-to-resolve conflicts in the database. Many users have asked for this over the years.

This wasn't prioritized "over download stats". I worked a bunch on the npm stats system recently, in fact, and @seldo is researching time-series databases for us to use to efficiently deliver this information more up to date and more on-demand than it was before. The raw data is still all backed up, and we will be providing the counters again soon, and better. Hooray for multiple brains doing multiple things in parallel!

We're not going to do anything where some arbitrary limit means you get to start doing forced publishes. Changed bits in a package with a single version is still changed bits that can break your program in mysterious ways that are hard to detect.

jdalton commented 10 years ago

force publish changes the bits that are returned for a specific version number. This means that pinning your version is not a reliable way to ensure that you have a specific version of a package. It reduces data integrity, and can lead to difficult-to-resolve conflicts in the database. Many users have asked for this over the years.

Yaaa buuut... it was pretty useful, & I'm glad I had it available when it was needed.

This wasn't prioritized "over download stats".

The reason I ask is because there are a few issues open for the download stats and it seems like a more pressing issue than removing working/useful features, like force publish, from npm.

Was there an open issue to remove this feature or was it on some roadmap? This kind of looks like you're dropping it on your users, :confetti_ball:, and using the issue tracker as a way to broadcast the change. Seems a bit backwards (usually goes the other way, an issue is opened, discussed, PRs made, & then merged).

isaacs commented 10 years ago

To explain the checking algorithm:

For old (existing document) and doc (new PUT body)

  1. If doc._deleted, and not an admin then just fail. (No outright DELETE reqs, except by server admins.)
  2. If old.time[v] exists, and doc.time[v] is different from old.time[v], then fail.
  3. If old.versions[v] and doc.versions[v] are (deeply) different, then doc.time[v] must be different from old.time[v]. (Along with previous rule, this means old.time[v] must be unset.)
  4. If you DELETE to registry/_design/app/_update/delete/pkg, then it'll set the doc body to something like {..., "time": {..., "unpublished": {"name":"some-user","time":"2014-02-12T06:51:45.827Z"}}}. The contents of the time.unpublished object are mostly irrelevant. Strictly there so that a human can one day have an extra clue about who deleted it. (Booleans are a waste of information density ;)
  5. Docs with a time.unpublished entry may not have versions.
  6. If you PUT to a doc that has a time.unpublished entry, then you must remove that entry, and also you can do that even if you're not a "maintainer" (since the maintainers list was removed in the unpublish!)
  7. If doc.time.unpublished and old.time.unpublished, then fail (cannot unpublish an already unpublished doc, obscuring the human-aiding clue)
  8. If adding a doc.time.unpublished then the doc.time.unpublished.user must be the currently logged-in user.
  9. If adding a doc.time.unpublished then the currently logged-in user must be in old.maintainers.

There might still be a hole in here somewhere, so if anyone has a suggestion or patch, I'd be happy to review it. AFAICT, this ensures that unpublishes are always accountable, and that versions can never be re-used, causing weird surprises for other users of those modules.

aheckmann commented 10 years ago

Sounds great. I've seen a few module authors abuse republishing in the past. +1

isaacs commented 10 years ago

Was there an open issue to remove this feature or was it on some roadmap? This kind of looks like you're dropping it on your users, :confetti_ball:, and using the issue tracker as a way to broadcast the change. Seems a bit backwards.

I am using the issue tracker as a way to solicit feedback, yes. It is backwards, but I've found issue trackers to be most useful as two-way information conduits :)

I don't know of an open issue to remove the feature. If there was one, it was probably in npm, not here, but here is where the change needs to be, for security reasons, since the client is just sugar around an HTTP endpoint.

However, I have been literally begged for this, in person, via email, and over twitter, numerous times, from friends and colleagues at several different companies. And on a personal level, as one of the people that people go to when their npm stuff is confusing and weird, I've shaken my own fists at myself for ever allowing publishes over pre-existing versions in the first place.

jdalton commented 10 years ago

I've shaken my own fists at myself for ever allowing publishes over pre-existing versions in the first place.

In cases where force publish might be needed, would a solution be to contact the registry admins for assistance? TBH I'm cool if that's an option.

isaacs commented 10 years ago

In cases where force publish might be needed, would a solution be to contact the registry admins for assistance? TBH I'm cool if that's an option.

Yeah, anyone with the admin bit can still delete whatever they want. If you npm publish something where your social security number is the version, or something, a human can manually intervene. But then, there's an email thread about it, some process, etc. Not just "Whoa, how come that line number in the error is different for you than it is for me?"

jdalton commented 10 years ago

But then, there's an email thread about it, some process, etc.

I can dig it :+1:

glenjamin commented 10 years ago

It might be useful to allow an un-un-publish, where the exact same version can be restored.

Even if there's no UI for it, I think it makes sense to keep the tarball shasum in the unpublished doc, unless this info exists elsewhere.

ljharb commented 10 years ago

What about packages that have, say, zero downloads? I fail to see any value in preventing force pushes to versions that have never been used, and that would cover the "accidental publish" use case pretty well, if it was caught quickly. Are there difficulties I'm unaware of wrt npm mirrors?

medikoo commented 10 years ago

It's a great decision.

Still, if it's possible, it might be good to allow republishing for the first 5 minutes after a publish, so we can clear up obvious mistakes (e.g. inclusion of files that should not be there). At least, that was the only use case where I used the forced publish feature.

terinjokes commented 10 years ago

My use of force publish is consistent with @medikoo's, I screwed up a publish and quickly resolved it.

jdalton commented 10 years ago

One of my more recent force publish uses was like @ljharb, @medikoo, & @terinjokes but multiplied by ~150 as it was when I first published lodash functions as individual modules.

Raynos commented 10 years ago

For the 5 minute issue that @medikoo, @ljharb, @jdalton, & @terinjokes mention, maybe we should have an `npm publish --dry` that allows you to verify a publish before running it.

I'm not quite sure what you want to verify before doing it, though. Maybe a dry run should generate a tarball you can inspect.

I'd imagine it's definitely useful for @jdalton's case, where he wants to script publishing of 150 modules and needs to test the script.

davglass commented 10 years ago

When I was testing the deployment of hundreds of modules, I just mirrored the registry & published them to my registry to test it with.

I think you can also do an npm pack to generate a tarball.

jdalton commented 10 years ago

In my case I had typoed the dependency version ranges (they were fixed instead of a range). So I force published shortly after to correct the issue. Not sure a dry run would help there, as it was a doh moment and not an issue with the package contents.

davglass commented 10 years ago

What if --force only worked on the latest version, once and only within a few minutes (like 2)?

We have something similar at Yahoo, but we call it quarantine. A package is published there and verified before being promoted out.

isaacs commented 10 years ago

There are technical complications around having a time based limitation.

Remember, couchdb uses a sequence-based rather than a time-based updating paradigm. If you do time-based stuff, in a way that the functional semantics of the system depend on (a) the clock time being the same on all servers and/or (b) the clock time when a PUT comes in being near-in-time to the clock time when the PUT was initiated, then that starts to fall apart in distributed architectures (including re-playing writes in down-stream replicas, which we do quite a lot of).

So, we won't be doing that.

Only allowing a --force publish of the latest module might be ok, but it's tricky to implement in a secure way. And without the problematic time sensitivity, it's really still the same disruptive problem. What's to stop you from just force publishing over the same module a thousand times?

@jdalton's use case of messing up 150 modules, and then being able to clean them up, is a valid use-case. But it's rare, and probably fine in those cases to reach out to a server admin for help.

@raynos Re npm publish --dry, isn't that exactly npm pack? Your point is a good one, though: more people should know about pack and use it to inspect artifacts locally and test scripts. Write a blog post about it! (Or, you know, nudge me to :)

In the cases where you genuinely have published something you shouldn't've, unpublishing, bumping the version, and publishing a new thing seems like it's basically always the right choice, no?
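For that "inspect before you publish" workflow, here's a quick sketch using npm pack; the package name is made up for the example:

```shell
# Build exactly the tarball `npm publish` would upload, without
# publishing anything, then list its contents. "pack-demo" is a
# made-up example package.
mkdir -p /tmp/pack-demo && cd /tmp/pack-demo
printf '{ "name": "pack-demo", "version": "1.0.0" }' > package.json
npm pack                      # writes pack-demo-1.0.0.tgz
tar -tzf pack-demo-1.0.0.tgz  # inspect the file list before publishing
```

If something shouldn't be in the tarball, fix `.npmignore` (or the `files` field) and re-pack before you ever touch the registry.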

Raynos commented 10 years ago

In the cases where you genuinely have published something you shouldn't've, unpublishing, bumping the version, and publishing a new thing seems like it's basically always the right choice

That's the best approach. Patch versions are cheap! Create lots of them.

davglass commented 10 years ago

Unless you are a library author with standard releases in several places, like LoDash, jquery, or yui, you don't really want npm versions to differ from the others.

But, if the server admins are responsive to these requests, like @izs usually is, then that's a good option.

What about --force only working once on the latest? That will help the "oh shit" moments & keep them from doing it over & over again.

Sent from Gmail Mobile

isaacs commented 10 years ago

@ljharb Tying it to download counts would be really tricky. A lot of new moving parts to make that work, since downloads aren't tracked in the same place.

@davglass Allowing a single re-publish (or n republishes) of a version would be doable. It complicates the data structure a little bit, but not very much, and doesn't rely on anything external (like download counts or clock drift).

Fundamentally, my goal here was just to figure out if there are actually use-cases for re-publishing that cannot for whatever reason be addressed via admin intervention. What I've learned (which isn't surprising, really) is that there ARE use-cases, but they're rare, can be easily addressed by human intervention, and that human intervention is probably a good idea in those cases anyway.

I'm going to go ahead and make the change, and if it leads to a flood of administrative overhead, we can explore a max-republish count, or some other reasonable limit.

aredridel commented 10 years ago

Ow ow ow ow ... okay. Yes, this is the right answer.

davglass commented 10 years ago

I can live with human intervention :)

Thanks for adding this!

jdalton commented 10 years ago

I'm going to go ahead and make the change, and if it leads to a flood of administrative overhead, we can explore a max-republish count, or some other reasonable limit.

Rock! @phated scared me straight dishing up worst case scenarios. [mental note: I must always lock computer around @phated :grinning:]