pypa / packaging-problems

An issue tracker for the problems in packaging

In pypi, it is impossible to reupload a removed file. #74

Open Natim opened 9 years ago

Natim commented 9 years ago

HTTPError: 400 Client Error: This filename has previously been used, you should use a different version.

Natim commented 9 years ago

Also the previous version has been removed and is impossible to find.

daenney commented 9 years ago

It's probably still available in the Fastly caches, which is why you need to use a new filename. The old filename will have been marked to be cached indefinitely, so even if you could upload a file with the same name, anyone who had already fetched the old version would never get the new one.

Natim commented 9 years ago

In my case it isn't a problem because it is the exact same file.

hickford commented 9 years ago

See Donald's email at http://comments.gmane.org/gmane.comp.python.distutils.devel/22739

I've pushed changes to PyPI where it is no longer possible to reuse a filename and attempting to do it will give a 400 error "This filename has previously been used, you should use a different version."

hickford commented 9 years ago

Npm did the same in 2014. See http://blog.npmjs.org/post/77758351673/no-more-npm-publish-f

While it is annoying to have to bump the version number for typos or documentation changes, I believe in the long run, the benefits of greater reliability and data integrity are well worth it.

I presume the justification is the same for PyPI. It's an FAQ, so should probably go in documentation somewhere.

Natim commented 9 years ago

Then we shouldn't allow people to remove their files if they cannot put them back.

Natim commented 9 years ago

I think we should allow re-uploading the same removed file.

tylerdave commented 9 years ago

There are very good reasons for the current behavior. Authors should be able to delete for any number of reasons (legal, security, etc.) Users of the package should be able to rely on getting the exact same thing every time they install a package of a specific version.

If you delete a package that someone relies on, they know the version is gone and they need to make a change to fix it. If you could delete a package and replace it with something different but with the same version, it could break their program in any number of subtle ways and it would be very hard to determine the cause of the problem.

Allowing this would break the entire version number contract. You may have what seems to be a good reason to replace a version but allowing it is not worth making versions unreliable.

hickford commented 9 years ago

Absolutely.

Natim commented 9 years ago

If you delete a package that someone relies on

You broke their package and you cannot put it back.

Natim commented 9 years ago

If you could delete a package and replace it with something different but with the same version, it can break their program

That's not what I am asking for.

I am asking for putting back the package I removed.

Natim commented 9 years ago

Allowing this would break the entire version number contract.

Allowing authors to put back the version they removed doesn't break any contract. Also, you already have the previous package hash, so you can check that the file didn't change and that the author really is re-uploading the file that was removed.

tylerdave commented 9 years ago

There I agree. If it can be ensured via the hash that only the exact same package is uploaded to the same version then I don't see this being a problem in concept.
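The hash check discussed here could look roughly like the sketch below. It assumes the index keeps the SHA-256 digest of a removed file; `sha256_hex`, `may_reupload` and `recorded` are hypothetical names for illustration, not anything PyPI is known to expose.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def may_reupload(recorded_digest: str, candidate: bytes) -> bool:
    """Allow a re-upload only when the candidate is byte-identical.

    `recorded_digest` stands for the digest the index kept for the
    removed file (an assumption made for this sketch).
    """
    return sha256_hex(candidate) == recorded_digest


original = b"contents of the removed wheel"
recorded = sha256_hex(original)

print(may_reupload(recorded, original))          # same bytes
print(may_reupload(recorded, original + b"\n"))  # any change at all
```

Only the byte-identical file passes; a rebuilt file with the same version number does not.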

hickford commented 9 years ago

So long as the documentation and confirmation makes it clear that unpublishing is permanent, then I think it's reasonable and prudent.

It is generally considered bad behavior to remove versions of a library that others are depending on! Even if a package version is unpublished, that specific name and version combination can never be reused. In order to publish the package again, a new version number must be used.

https://docs.npmjs.com/cli/unpublish

hickford commented 9 years ago

To prevent malicious abuse, perhaps the policy should be strengthened to 'no uploads to old versions' https://github.com/pypa/packaging-problems/issues/75

dstufft commented 9 years ago

Unless you still have the physical file lying around, it's unlikely you're going to have something that matches the same hash. The setup.py sdist command does not have deterministic output: each run produces a different file even if the code hasn't changed. This also means you can't use setup.py to upload the file, since setup.py will only let you upload a file that it has created in the currently executing command, not an already created file. That doesn't make it impossible to upload a file with the same hash, but it makes it tricky, which suggests it's a bad UX to expect authors to have to navigate.
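One well-known way an archive can differ between builds is an embedded timestamp. The sketch below illustrates only that: it compresses the same bytes twice with different gzip header timestamps and compares the hashes. Whether this is the only source of non-determinism in setup.py sdist is not established in this thread; `archive_digest` and the timestamps are made up for illustration.

```python
import gzip
import hashlib


def archive_digest(payload: bytes, build_time: int) -> str:
    """Compress `payload` as gzip with the given header timestamp and hash it."""
    compressed = gzip.compress(payload, mtime=build_time)
    return hashlib.sha256(compressed).hexdigest()


source = b"print('hello')\n"

first_build = archive_digest(source, build_time=1_400_000_000)
second_build = archive_digest(source, build_time=1_400_000_060)  # one minute later

# The payload is identical, but the archives (and so their hashes) are not.
print(first_build == second_build)
```

So rebuilding from unchanged code is not enough to reproduce the removed file; you need the original bytes.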

Most likely the eventual solution to this is that "delete" won't actually be a full out absolute deletion, it'll be more like a soft delete where it just acts as if it's deleted without actually deleting it (so it won't show up in the API, won't appear anywhere, etc) but there will be a list of these deleted things when the author logs in and a button that says "Restore" that allows them to restore a file they've previously deleted. Possibly this would have a periodic cleanup where if something was soft deleted for some period of time (a month? 6 months? a year?) we'll go through and clean it up and actually hard delete it then. Perhaps we'd also enable it for authors to trigger an immediate hard delete of something they've soft deleted, but there would be plenty of big warnings that if they press that button there is no recovery possible.
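A minimal sketch of that soft-delete idea, under stated assumptions: `Index`, its method names and the 180-day `RETENTION` are all hypothetical (the period is left open above), and this is not how PyPI or Warehouse is actually implemented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Set

# Hypothetical retention period; the comment above leaves the value open.
RETENTION = timedelta(days=180)


@dataclass
class Index:
    # filename -> time of soft deletion (None while the file is live)
    files: Dict[str, Optional[datetime]] = field(default_factory=dict)
    # every filename ever uploaded, so names are never reused
    used_names: Set[str] = field(default_factory=set)

    def upload(self, filename: str) -> None:
        if filename in self.used_names:
            raise ValueError("This filename has previously been used")
        self.used_names.add(filename)
        self.files[filename] = None

    def soft_delete(self, filename: str, now: datetime) -> None:
        if filename not in self.files:
            raise KeyError(filename)
        self.files[filename] = now  # hidden, but still stored

    def restore(self, filename: str) -> None:
        if filename not in self.files:
            raise KeyError("file was hard deleted; no recovery possible")
        self.files[filename] = None

    def visible(self) -> List[str]:
        return sorted(name for name, deleted in self.files.items() if deleted is None)

    def purge_expired(self, now: datetime) -> List[str]:
        """Hard delete everything soft deleted more than RETENTION ago."""
        expired = sorted(
            name for name, deleted in self.files.items()
            if deleted is not None and now - deleted > RETENTION
        )
        for name in expired:
            del self.files[name]
        return expired


index = Index()
index.upload("demo-1.0.tar.gz")
index.soft_delete("demo-1.0.tar.gz", now=datetime(2015, 6, 1))
print(index.visible())  # the file no longer appears
index.restore("demo-1.0.tar.gz")
print(index.visible())  # "Restore" brings back the very same file
```

Because restore never accepts new bytes, the filename rule stays intact: the only thing that can come back is the file that was there before.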

Natim commented 9 years ago

That doesn't make it impossible to upload a file with the same hash, but it makes it tricky which suggests it's a bad UX to expect authors to have to navigate.

With twine it is as simple as:

twine upload cliquet-2.5.0-py2.py3-none.whl

dstufft commented 9 years ago

Right, I wrote twine, but not everyone uses it, so you have to explain to them that they need to use twine to be able to re-upload, not setup.py upload. In addition, you have to explain to them that they need the exact same file, not one created the same way. It's fiddly and people will get confused.

Natim commented 9 years ago

People are not dumb, if they need to do something complicated they will eventually succeed. The fact is even if they know all the things, they won't be able to do it.

But yeah, #75 is a workaround for now (using .zip instead of .tar.gz, for instance).

domibarton commented 8 years ago

As I already wrote in #75:

I think that behaviour is quite OK for the live repo.

Though, to be honest, it's a huge PITA for the test repo. I support integrity and all that stuff on "production" systems. However, developers need to have their code / packages checked somewhere and it's a PITA if you can't upload the same version twice while testing a new release.

There's no other way than the test repo to test your package. With git (or any other SCM) you can easily create a new branch and test it until you're sure everything works. Or if you have a look at PHP Packagist (Composer), there's a -dev version for each development branch. It's the same on Docker: you can test your feature/release branches before tagging and going "live".

With the new policy you basically say: You've ONE SINGLE TRY and that one SHOULD WORK. No chance for a 2nd try. IMHO this isn't the purpose of a testing system and breaks the whole "we've a testing repo" idea. To be honest, I think this only leads to annoyed developers and a lot of "crippled versions" because developers couldn't properly test their versions before going live.

tl;dr: I suppose you do that on the live system but not on the test system.

brianmay commented 8 years ago

In my case, I forgot to sign the upload. It appears once you have uploaded the package it is impossible to fix any problems you made with the upload without making a new release. Even if you just want to upload the exact same version again.

daenney commented 8 years ago

But how do you know it is "the exact same version"? Unless it checks the uploads are binary identical it would allow you to upload a totally different release with the same version which can cause any amount of problems.

torarnv commented 8 years ago

Just hit what @domibarton is describing. What's the point of a test repo if you can't make mistakes?

Natim commented 8 years ago

Just hit what @domibarton is describing. What's the point of a test repo if you can't make mistakes?

Why can't you upload package x.y.z.dev0 and then package x.y.z.dev1?
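Bumping the dev number by hand on every test upload gets tedious, so it can be scripted. A minimal sketch, where `next_dev_version` is a hypothetical helper that only understands a trailing `.devN` suffix and nothing else from the version scheme:

```python
import re

# Matches versions that already end in ".devN".
_DEV_SUFFIX = re.compile(r"^(?P<base>.+)\.dev(?P<n>\d+)$")


def next_dev_version(version: str) -> str:
    """Return the next `.devN` development release for a version string.

    A version without a dev suffix gets ".dev0"; otherwise N is incremented.
    """
    match = _DEV_SUFFIX.match(version)
    if match is None:
        return f"{version}.dev0"
    return f"{match['base']}.dev{int(match['n']) + 1}"


print(next_dev_version("1.2.3"))       # first test upload
print(next_dev_version("1.2.3.dev0"))  # second attempt after a mistake
```

The version used for test uploads still has to be reset before the real release.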

torarnv commented 8 years ago

I could, but then I'd have to remember to wipe those temp changes from my working tree before pushing to the live PyPI repo.

snare commented 8 years ago

I uploaded a new version of Voltron yesterday and the server threw a 500 error during the upload. This resulted in a partial file being hosted as the current wheel for this package. The file size was smaller than my local one, and the hash differed.

This operation needs to be atomic. If the upload fails, you have no opportunity to try again. The only option is to use a different version number, which is not an appropriate solution.

IMO it should be a requirement that the hash of the upload is verified by the author before it is marked as "published".

Natim commented 8 years ago

Yes I have the same problem with my last uploaded packages.

niedakh commented 8 years ago

My files were uploaded broken, due to a connection error, why can't I replace them?!

pstch commented 8 years ago

I also suffer from this issue because of PyPI's recurrent HTTP 500 errors, which lead to incomplete uploads: the client errors out, but the file is created, which then makes it impossible to handle the intermittent upload error by simply retrying.

Having to bump version numbers just because of connection errors is quite annoying. Maybe the upload operation could be made atomic, so that it does not create a file unless everything completes normally (possibly by waiting for user confirmation).
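A minimal sketch of what an atomic upload could mean on the server side: write to a temporary file, verify the digest the client sent, and only then rename into place. `publish_atomically`, `chunks` and `expected_sha256` are hypothetical names; nothing here reflects how PyPI actually stores files.

```python
import hashlib
import os
import tempfile


def publish_atomically(chunks, expected_sha256: str, dest_path: str) -> None:
    """Write an upload to a temp file and publish it only if the hash matches.

    `chunks` is any iterable of bytes (the request body) and
    `expected_sha256` is the digest claimed by the client.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    digest = hashlib.sha256()
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
                digest.update(chunk)
        if digest.hexdigest() != expected_sha256:
            raise ValueError("incomplete or corrupted upload; nothing published")
        # Rename is atomic when source and destination share a filesystem.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Covers a dropped connection (the iterator raising) and a bad hash.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this shape a failed or truncated upload leaves no file behind, so the filename is never consumed and the author can simply retry.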

dstufft commented 8 years ago

This might be helpful: https://mail.python.org/pipermail/distutils-sig/2016-June/029083.html

mikofski commented 8 years ago

Just use the manual interface by going to your package pypi site and select files. Change the names of the wheels slightly and add a comment if desired. Then you can upload the replacements for the removed files. This worked fine for me.

brechtm commented 8 years ago

Even after removing the project on TestPyPI, it is not possible to upload the same version.

It's probably a good idea to use devpi instead of (or in addition to) the TestPyPI server.

bittner commented 8 years ago

Will this behavior stay the same forever? It's a bit painful.

I had a syntax error in the long description (so the reStructuredText didn't get converted), and instead of fixing it manually on PyPI I deleted the complete project and started from scratch. Should be the cleanest approach for a new package, shouldn't it?

Instead, now I look like an idiot: There is a version 0.1.0 of my package in the index, but I can't upload an installation package for it ever again. :confused:

BTW, here is the same question/problem/answer on StackOverflow.

Natim commented 8 years ago

@bittner My bet is that you should consider this to be permanent and find another solution, yes. This policy greatly simplifies package caching and replication.

For the long description you can change it in the pypi admin. It doesn't fix it in the package but at least it fixes your problem.

You can also just create a new 0.1.1 patch release with it fixed.

goanpeca commented 7 years ago

This is silly. I made a mistake in the upload and now I can't correct it?!

Natim commented 7 years ago

You can do a new patch release to correct it. The impact is fairly low.

goanpeca commented 7 years ago

You can do a new patch release to correct it.

I uploaded the dev version (0.1.2dev0) because I forgot to check out the tag I was uploading (0.1.1).

But now I have to make a new release on PyPI and conda (yes, it uses conda) just because of this restriction? It makes no sense...

goanpeca commented 7 years ago

Furthermore this is the first release, no one is using it and I was able to delete the files to fix the problem, but now I cannot upload

Natim commented 7 years ago

You can probably release 0.1.2, can't you?

Natim commented 7 years ago

This limitation makes it a lot easier to handle the CDN caching policy.

goanpeca commented 7 years ago

Sure, I can do as many releases as I want, but that still does not change the fact this is a pretty annoying behavior :-(

ncoghlan commented 7 years ago

Publishers (or an attacker that has compromised their upload system) silently replacing previously audited software with compromised software is a security vulnerability previously affecting large parts of the internet.

devpi is relatively easy to run locally and allows folks to test their releases before publishing them officially.

dstufft commented 7 years ago

It's a small restriction that makes a lot of things a lot simpler. Not only for end users who don't want the meaning of "foobar 1.0" to suddenly change, but also for the tooling itself to be able to make assumptions that simplify the implementation. It can be annoying for authors, but luckily it's also something that only rarely is an issue and for which the workaround (issue a new version number) is not particularly onerous.

atagar commented 7 years ago

If this isn't gonna be fixed then please have the 'remove' button issue a warning that the file cannot be re-uploaded. Like brianmay, I made a release via twine that didn't have a sig. The site doesn't allow you to upload one so I removed the file and tried to re-upload it with the exact same tarball and the signature.

Authors expect this to work. Making removal easy but then making it impossible to re-upload is simply awful.

atagar commented 7 years ago

Going with a version bump because I cannot continue to block my release on this. You cannot legitimately claim this isn't a bug when it so easily leaves packages in an unfixable state. This isn't to say you need to allow re-use of file names, but there are multiple issues with the site that make this simply broken...

I should be able to create a release, confirm it looks right in PyPI, then publish it. As it is I now need to create a scrap PyPI project so I can safely try twine commands without screwing up my releases.

I sincerely hope the new PyPI site behaves better.

mikofski commented 7 years ago

@atagar see my comment above

Just use the manual interface by going to your package pypi site and select files. Change the names of the wheels slightly and add a comment if desired. Then you can upload the replacements for the removed files. This worked fine for me.

EG: if your file was newpackage-0.X-py2-none-any.whl then change it to newpackage-0.X-CORRECTED-py2-none-any.whl and then upload it manually. I did this for SolarUtils-0.2.2, I added an "a" and it worked fine.

Hope this works for you too!

atagar commented 7 years ago

In the past I did that too, creating a 'Stem-1.4.2b.tar.gz' file to work around this. A couple of folks who package my project for distributions (don't recall if it was Debian, Gentoo, Arch, or BSD) contacted me to say this created problems for them.

mikofski commented 7 years ago

I like your idea of a warning button, or maybe an alert pop up on the remove button to confirm and warn about this feature/issue. That should be part of Warehouse - maybe make an issue there?

Also the idea of an optional dry run could also be a Warehouse feature. IE:

$ python setup.py upload sdist --dry-run

Then when you go to Warehouse your release would not yet be visible, but would require a confirmation before becoming permanent. Once it is permanent, removing it would make it impossible to upload another release of the same version, but the remove button would warn about this first.

merwok commented 7 years ago

There is TestPyPI if you want to do dry-run releases and aren’t running a local devpi server: https://wiki.python.org/moin/TestPyPI

jianli commented 7 years ago

It seems that the current restrictions on re-uploading the exact same version (as onerous as they are) do not actually prevent a malicious user from uploading multiple files with semantically equivalent versions. The malicious user can upload distinct packages for each of

python setup.py sdist upload 1.0
python setup.py sdist upload 1.0.0
python setup.py sdist upload 1.0.0.0

Then, the end-user who has previously verified 1.0 and runs pip install 1.0 would potentially get an unverified file?
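To make the "semantically equivalent" point concrete: PEP 440 pads the shorter release segment with zeros when comparing, so those three spellings are one version. The sketch below is a simplified stand-in that handles only plain numeric releases (no pre, post, dev, epoch or local parts); `release_key` is a hypothetical helper, and whether PyPI enforced this at upload time is not established in this thread.

```python
from typing import Tuple


def release_key(version: str) -> Tuple[int, ...]:
    """Comparison key for a purely numeric release such as "1.0.0".

    Trailing zeros are dropped, which is equivalent to padding the
    shorter release with zeros before comparing.
    """
    parts = [int(p) for p in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


uploads = ["1.0", "1.0.0", "1.0.0.0"]

# All three spellings collapse to a single key.
print({release_key(v) for v in uploads})
```

An index that compares this kind of key rather than the raw string could reject the second and third uploads as duplicates of the first.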