pypi / warehouse

The Python Package Index
https://pypi.org
Apache License 2.0

Block package names that conflict with core libraries #2151

Closed GadgetSteve closed 7 years ago

GadgetSteve commented 7 years ago

It has been pointed out online, on Hacker Noon, that PyPI currently allows people to register and upload packages with the same names as core Python libraries. This presents a potential attack vector: pip install -U will "upgrade" the core library to the uploaded package, which may be listed as a dependency of some other package.

Any attempt to do this, with the possible exception of attempts by the core Python developers, should definitely fail with an error message and possibly be flagged as suspicious activity.

I have tried to suggest blocking any upgrades to core packages at the pip level, in https://github.com/pypa/pip/issues/4527, but the consensus there is that this is really a problem at the PyPI/Warehouse end.

jonemo commented 7 years ago

What would be the correct/best way to compile the list of standard library modules that should be blocked? I am aware of the standard library module index at https://docs.python.org/3/py-modindex.html. However, that only covers the CPython 3.6 standard library. Other Python implementations have additional modules (e.g. IronPython has clr). Occasionally, module names change between versions (e.g. xmlrpclib vs xmlrpc and copy_reg vs copyreg from 2.7 to 3.0, see https://docs.python.org/3.6/whatsnew/3.0.html#library-changes).

In summary: The first step to dealing with this is to compile an authoritative list of package names.

It seems like the only place where the name of the uploaded package is checked is here: https://github.com/pypa/warehouse/blob/e24f0d62a78c9bd8df725164aa121c4ecb4b34b4/warehouse/forklift/legacy.py#L616-L622. If that's true, the only blocked package names are requirements.txt and rrequirements.txt. Note that I'm very new to this codebase, so this is definitely worth double-checking.
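To make the idea concrete, here is a minimal sketch of an upload-time name check against a list of standard library names. It is not Warehouse's actual code: `normalize`, `EXTRA_NAMES`, `BLOCKED` and `is_blocked` are hypothetical names, and it assumes Python 3.10+ for `sys.stdlib_module_names` (older interpreters fall back to the much shorter `sys.builtin_module_names`). Names are compared after PEP 503-style normalization so that `Copy_Reg` and `copy-reg` are treated alike.

```python
import re
import sys


def normalize(name: str) -> str:
    """PEP 503-style normalization: lowercase, runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


# sys.stdlib_module_names only exists on Python 3.10+; the fallback is an
# assumption made so the sketch still runs on older interpreters.
_STDLIB = getattr(sys, "stdlib_module_names", frozenset(sys.builtin_module_names))

# Names that one interpreter's list cannot know about (renamed modules,
# other implementations). Hand-maintained and purely illustrative.
EXTRA_NAMES = {"xmlrpclib", "copy_reg", "clr"}

BLOCKED = {normalize(n) for n in _STDLIB if not n.startswith("_")}
BLOCKED |= {normalize(n) for n in EXTRA_NAMES}


def is_blocked(project_name: str) -> bool:
    """Return True if the normalized project name shadows a listed module."""
    return normalize(project_name) in BLOCKED


if __name__ == "__main__":
    for candidate in ("sys", "Copy_Reg", "requests"):
        print(candidate, is_blocked(candidate))
```

A single interpreter only knows its own module list, which is why the sketch keeps a separate hand-maintained set for renamed or implementation-specific modules.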

GadgetSteve commented 7 years ago

The obvious question is: what are the criteria for deciding to block a specific package?

-- Steve (Gadget) Barnes. Any opinions in this message are my personal opinions and do not reflect those of my employer.

jonemo commented 7 years ago

I am quite curious about this issue and would be willing to help move it forward, but after another half hour of background reading, I am not certain whether there is community/maintainer support for this proposal.

A few observations and thoughts (please correct me if I'm wrong):

jonemo commented 7 years ago

Related PR: https://github.com/pypa/warehouse/pull/2396/

dstufft commented 7 years ago

One problem to sort out here: what should happen when a new standard library module is added whose name collides with an existing project on PyPI? And what if someone wants to backport a new module to older versions of Python?

jonemo commented 7 years ago

List of Python 3.6 standard library packages as a text file: https://gist.github.com/jonemo/57c0eeff88ac5495592d4a4f9d60a96b

Script I used to check for existence and author/maintainer of each on PyPI: https://gist.github.com/jonemo/a1c0f4768f2c0aa25e31388c0fd6e377

Output of said script shortly before the timestamp of this comment: https://docs.google.com/spreadsheets/d/15WoAkoaUW1BRSVt9yAOcObHgkWhfQOqUY0_xNbkTwL8/edit?usp=sharing

Stats:

My (relatively uninformed newbie/bystander) suggestion is to:

Possible next steps after this:

GadgetSteve commented 7 years ago

@jonemo Nice report, but please note that I don't have a single package registered in my name on PyPI; the above sounds like I have 13. Those 13 names were registered by @stestagg, another Steve I know, who specifically stated in https://github.com/pypa/pypi-legacy/issues/585 that "As the owner of these packages, I don't mind them being taken off me, or access to them disabled as part of any fix."

I did raise an enhancement proposal to build filtering into pip (https://github.com/pypa/pip/issues/4527), but that was felt not to be worth pursuing at the pip end, as it would not treat the root cause and would not cover any other package installer. Hence this ticket.

ewdurbin commented 7 years ago

https://pypi.org/project/stdlib-list is maintained and appears to be kept up to date. It looks like it could be helpful; thanks to @jackmaney.

ewdurbin commented 7 years ago

with #2409 shipped here's what I see as remaining items to wrap this up:

Anything else?

I think that

Also block obvious cases of "typo-squatting" (either manually or automatically via a string-similarity metric) to avoid the problem described here

is another issue, as that will be a more difficult problem to get right.

ewdurbin commented 7 years ago

2410 addresses messaging/documentation

jackmaney commented 7 years ago

Thank you for using my library (stdlib-list)! I update it after every minor version release (i.e. the next one will be 3.7). Please let me know if you find something that's missing in any of the lists.

floer32 commented 7 years ago

Regarding this point

[Blocking obvious cases of typo-squatting] is another issue, as that will be a more difficult problem to get right.

I understand this hesitation, but perfect is the enemy of good, no? It seems like it could be gotten right enough for the top N most popular downloads. If there is a possibility of going down this path, I would be glad to help.

jonemo commented 7 years ago

Now that new uploads of stdlib-shadowing names are no longer possible, can someone with the power to do so please remove the dummy packages that have been placed there by @stestagg? See @GadgetSteve's comment for context and https://github.com/pypa/pypi-legacy/issues/585 for a list of these dummy packages.

@GadgetSteve: Apologies for confusing you with @stestagg. Who could have known that one Steve would report an issue previously blogged about by another Steve? 😬

GadgetSteve commented 7 years ago

@jonemo No problem on the confusion. It is not exactly new: at work we have, in a different division, another person with the same first name and surname, and one in the same office with a surname that sounds similar.

@hangtwenty Just to point out that there are two types of typo-squatting. One is things like duplicated and transposed letters (e.g. urlllib or urlilb). The other, increasingly popular, is Unicode look-alike mimicry: a package called аррӏе (actually u"\u0430\u0440\u0440\u04cf\u0435") could spoof one called apple. One approach to the latter would be to require all package names to use 7-bit ASCII or similar, but that has obvious limitations and may not be desirable.
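A small sketch of the look-alike case, using the exact code points quoted above. The helper name `is_ascii_name` is made up for illustration, and `str.isascii()` assumes Python 3.7 or newer.

```python
# The spoofed name from the comment above: five Cyrillic code points that
# render much like the Latin word "apple".
spoof = "\u0430\u0440\u0440\u04cf\u0435"
real = "apple"


def is_ascii_name(name: str) -> bool:
    """Return True only if every character is 7-bit ASCII (Python 3.7+)."""
    return name.isascii()


if __name__ == "__main__":
    print(spoof == real)            # the strings differ despite looking alike
    print(is_ascii_name(real))      # plain Latin letters pass
    print(is_ascii_name(spoof))     # Cyrillic look-alikes are rejected
```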

ncoghlan commented 7 years ago

@GadgetSteve We do indeed restrict PyPI name registrations to 7-bit ASCII: https://www.python.org/dev/peps/pep-0508/#names

While we don't spell out the reasoning there, the vast array of Unicode confusables is indeed the reason we have that restriction - with ASCII, it's mainly only l1 and O0 that you need to worry about.
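As a rough sketch of both points: the pattern below is the name rule given in PEP 508, and the folding step treats the two ASCII look-alike pairs mentioned above as equal. The helper names `is_valid_name` and `confusable_key` are hypothetical, and the folding is only an illustration, not a description of what PyPI does. `re.ASCII` is added so that case-insensitive matching cannot let non-ASCII letters through.

```python
import re

# Name rule from PEP 508; re.ASCII keeps IGNORECASE from matching
# non-ASCII characters that case-fold to ASCII letters.
_NAME_RE = re.compile(
    r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$",
    re.IGNORECASE | re.ASCII,
)

# Fold the two ASCII look-alike pairs: digit one -> letter l, zero -> letter o.
_FOLD = str.maketrans({"1": "l", "0": "o"})


def is_valid_name(name: str) -> bool:
    """Return True if the name satisfies the PEP 508 name pattern."""
    return _NAME_RE.match(name) is not None


def confusable_key(name: str) -> str:
    """Map names that differ only by l/1 or O/0 to the same key."""
    return name.lower().translate(_FOLD)


if __name__ == "__main__":
    print(is_valid_name("zope.interface"))
    print(confusable_key("ur1lib") == confusable_key("urllib"))
```

Comparing `confusable_key` values of a new name against existing names would flag pairs such as `djang0` and `django` for review.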

As far as the actual typosquatting problem goes, my proposal in https://github.com/pypa/warehouse/issues/2268 is to distribute the review workload by notifying the maintainers of the projects with similar names, rather than always notifying the PyPI admins (since admin time and attention is a very limited resource). The PyPI admins would then only get direct notifications when registered project names are close to ones on the already prohibited list.

GadgetSteve commented 7 years ago

Hopefully such notifications will include a link for requesting registration blocking or audit.

floer32 commented 7 years ago

This might be obvious to people but for calculating the similarity we could use Levenshtein distance.
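A minimal sketch of that idea, using a textbook dynamic-programming Levenshtein distance. The helper `near_misses` and its `max_distance` threshold are assumptions for illustration only. Note that plain Levenshtein counts a transposition such as urlilb as two edits, so a Damerau-Levenshtein variant may suit typo detection better.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def near_misses(candidate, protected, max_distance=1):
    """Protected names within max_distance edits of candidate (exact matches excluded)."""
    return sorted(
        name for name in protected
        if 0 < levenshtein(candidate, name) <= max_distance
    )


if __name__ == "__main__":
    protected = {"urllib", "json", "requests"}
    print(near_misses("urlllib", protected))
    print(levenshtein("urllib", "urlilb"))
```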

Relevant blog post by the way:

stestagg commented 7 years ago

Please only remove my packages if the name blocking is applied to pypi as well as warehouse!

ewdurbin commented 7 years ago

@stestagg blocking of names only occurs on upload of a new package name and all such uploads must now be via warehouse, so we’re good here!

stestagg commented 7 years ago

ok, cool, I wasn't aware that had happened :)

GadgetSteve commented 7 years ago

Very happy with the outcome.

Apologies to @stestagg for not CCing on the original submission of this ticket.

ewdurbin commented 5 years ago

Thanks to a helpful nudge from @brainwane... An audit of existing projects in conflict follows. I was able to quickly assess some modules based on authorship/ownership. I also quickly removed some that did not have any files or download links.

abc                  deleted
argparse             valid
ast                  needs inspection
asyncio              valid
buildtools           needs inspection
calendar             needs inspection
cd                   needs inspection
chunk                needs inspection
code                 deleted
colorpicker          needs inspection
commands             deleted
compiler             needs inspection
configparser         valid
contextvars          valid
csv                  valid
ctypes               needs inspection
dataclasses          valid
datetime             valid
device               needs inspection
dis                  needs inspection
distutils            needs inspection
dl                   needs inspection
email                needs inspection
enum                 valid
exceptions           needs inspection
faulthandler         valid
formatter            needs inspection
framework            needs inspection
functools            needs inspection
gl                   needs inspection
hashlib              needs inspection
hmac                 needs inspection
html                 needs inspection
html-parser          needs inspection
htmlparser           needs inspection
http                 needs inspection
http-client          needs inspection
imp                  deleted
importlib            valid
importlib-resources  valid
io                   needs inspection
ipaddress            needs inspection
jpeg                 needs inspection
logging              needs inspection
logging-config       needs inspection
mailbox              needs inspection
modulefinder         needs inspection
multiprocessing      valid
nav                  needs inspection
new                  needs inspection
numbers              needs inspection
parser               needs inspection
pathlib              valid
pipes                deleted
pprint               needs inspection
queue                deleted
readline             needs inspection
repr                 needs inspection
resource             needs inspection
secrets              needs inspection
select               needs inspection
selectors            needs inspection
sets                 needs inspection
shelve               needs inspection
signal               needs inspection
ssl                  valid
statistics           needs inspection
test                 needs inspection
time                 needs inspection
token                needs inspection
trace                needs inspection
turtle               needs inspection
typing               valid
unittest             needs inspection
uuid                 needs inspection
w                    needs inspection
wave                 needs inspection
wsgiref              valid
xmlrpclib            needs inspection

brainwane commented 5 years ago

Thanks @ewdurbin - so the owners of those packages need to consider the name conflict? OK for me to open a new packaging-problems issue with that list?

ewdurbin commented 5 years ago

@brainwane The next step would be for someone to take a closer look at all of those listed above as "needs inspection". Some will be valid and can/should remain on PyPI. I'm not sure that any action is needed from the owners of existing projects with conflicting names.

mertzjames commented 4 years ago

@brainwane @ewdurbin what's the status of this effort? Do you need any help?

brainwane commented 4 years ago

Heads-up @xmunoz in case you want to skim this issue for background

isidentical commented 3 years ago

I can't say anything about the other packages @ewdurbin listed, but the AST package seems pretty inactive and stale.

I don't know the procedure for getting that package blocked, but maybe we can kindly ask its maintainer to move the package to some other name first, and then block the name?

ewdurbin commented 3 years ago

@isidentical is the goal for the AST project to be replaced with something from Python Core or just to be blocked from re-upload?

isidentical commented 3 years ago

Be blocked.