Closed: GadgetSteve closed this issue 7 years ago.
What would be the correct/best way to compile the list of standard library modules that should be blocked? I am aware of the standard library module index at https://docs.python.org/3/py-modindex.html; however, that only covers the CPython 3.6 standard library. Other Python implementations have additional modules (e.g. IronPython has clr). Occasionally, module names also change between versions (e.g. xmlrpclib vs xmlrpc and copy_reg vs copyreg from 2.7 to 3.0, see https://docs.python.org/3.6/whatsnew/3.0.html#library-changes).
In summary: the first step to dealing with this is to compile an authoritative list of package names.
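One possible starting point for such a list, sketched under the assumption that it runs on CPython 3.10 or later: sys.stdlib_module_names (added in 3.10, so it did not exist when this thread started) lists the top-level stdlib modules of the running interpreter, and sys.builtin_module_names lists the modules compiled into it. This only describes the interpreter it runs on, which is exactly the limitation raised above: IronPython's clr or the 2.7-only spellings will not show up, so PY2_ONLY below is a hypothetical hand-maintained addition.

```python
import sys


def interpreter_stdlib_names():
    """Top-level stdlib module names known to the running interpreter.

    sys.stdlib_module_names only exists on Python 3.10+; on older versions
    this falls back to sys.builtin_module_names, which covers only the
    modules compiled into the interpreter binary.
    """
    names = set(sys.builtin_module_names)
    names.update(getattr(sys, "stdlib_module_names", ()))
    # Private implementation modules such as "_abc" are left out here as an
    # assumption; a real blocklist may want to include them too.
    return {name for name in names if not name.startswith("_")}


# Hypothetical hand-maintained additions: Python 2.7 spellings (renamed in
# 3.0) that a Python 3 interpreter does not report.
PY2_ONLY = {"xmlrpclib", "copy_reg"}

blocklist = interpreter_stdlib_names() | PY2_ONLY
print(len(blocklist), sorted(blocklist)[:5])
```

Merging several such sources (one per interpreter and version) is still needed to get anything close to an authoritative list.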
It seems like the only place where the name of the uploaded package is checked is here: https://github.com/pypa/warehouse/blob/e24f0d62a78c9bd8df725164aa121c4ecb4b34b4/warehouse/forklift/legacy.py#L616-L622. If that's true, the only blocked package names are requirements.txt and rrequirements.txt. Note that I'm very new to this codebase, so this is definitely worth double-checking.
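A minimal sketch of what a broader check could look like, assuming the stdlib names are simply added to the same blocklist. PROHIBITED, normalize and is_prohibited are hypothetical names, not Warehouse's actual code. The normalization follows PEP 503, since PyPI treats names that differ only in case or in runs of "-", "_" and "." as the same project.

```python
import re

# Hypothetical prohibited set: the two names mentioned above plus, as an
# assumption, a few standard library module names.
PROHIBITED = {"requirements.txt", "rrequirements.txt", "asyncio", "copyreg", "xmlrpclib"}


def normalize(name):
    """PEP 503 normalization: lowercase, collapse runs of '-', '_', '.' to '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Compare normalized forms so "Requirements_TXT" cannot slip past "requirements.txt".
_PROHIBITED_NORMALIZED = {normalize(n) for n in PROHIBITED}


def is_prohibited(name):
    return normalize(name) in _PROHIBITED_NORMALIZED


for candidate in ("Requirements_TXT", "AsyncIO", "requests"):
    print(candidate, is_prohibited(candidate))
```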
Reply by email to Jonas Neubert's comment above (15/09/2017 06:56):
The obvious question is: what are the criteria for deciding to block a specific package?
-- Steve (Gadget) Barnes
Any opinions in this message are my personal opinions and do not reflect those of my employer.
This email has been checked for viruses by AVG. http://www.avg.com
I am quite curious about this issue and would be willing to help move it forward, but after another half hour of background reading, I am not certain whether there is community/maintainer support for this proposal.
A few observations and thoughts (please correct me if I'm wrong):
- pip should not be responsible for preventing users from installing (potentially) malicious code.
- ssl is in the standard library starting in 2.6, but the PyPI ssl package provides a useful shim for versions <= 2.5.
- clr is in the standard library for IronPython, but the PyPI clr package could be a useful package (although it no longer is, because the package name has been changed to styles).
Related PR: https://github.com/pypa/warehouse/pull/2396/
One problem to sort out here: when a new standard library module is added whose name already collides with an existing project on PyPI, what should happen? And what if someone wants to backport a new module to older versions of Python?
List of Python 3.6 standard library packages as a text file: https://gist.github.com/jonemo/57c0eeff88ac5495592d4a4f9d60a96b
Script I used to check for the existence and author/maintainer of each on PyPI: https://gist.github.com/jonemo/a1c0f4768f2c0aa25e31388c0fd6e377
Output of said script shortly before the timestamp of this comment: https://docs.google.com/spreadsheets/d/15WoAkoaUW1BRSVt9yAOcObHgkWhfQOqUY0_xNbkTwL8/edit?usp=sharing
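The linked script is not reproduced here; the following is a separate minimal sketch of the same kind of check, assuming PyPI's JSON API at https://pypi.org/pypi/<name>/json, which answers 404 for unregistered names and otherwise carries author and maintainer fields under "info". fetch_project and survey are hypothetical helper names.

```python
import json
import urllib.error
import urllib.request

PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def fetch_project(name, opener=urllib.request.urlopen):
    """Return PyPI's JSON metadata for name, or None if PyPI answers 404."""
    try:
        with opener(PYPI_JSON_URL.format(name=name)) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def survey(names, fetch=fetch_project):
    """Map each name to None (unregistered) or an (author, maintainer) pair."""
    results = {}
    for name in names:
        data = fetch(name)
        if data is None:
            results[name] = None
        else:
            info = data.get("info", {})
            results[name] = (info.get("author"), info.get("maintainer"))
    return results
```

Calling survey(["asyncio", "abc"]) would hit the live API; passing a different fetch function (a dict's get method, for instance) keeps it testable offline.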
Stats:
My (relatively uninformed newbie/bystander) suggestion is to:
Possible next steps after this:
(… xmlrpclib)
(… clr from IronPython)
@jonemo Nice report, but please note that I don't have a single package registered in my name on PyPI. The above makes it sound like I have 13; the registration of those 13 names was performed by @stestagg, another Steve, who did specifically state in https://github.com/pypa/pypi-legacy/issues/585 that "As the owner of these packages, I don't mind them being taken off me, or access to them disabled as part of any fix." I did raise an enhancement proposal to build filtering into pip (https://github.com/pypa/pip/issues/4527), but that was felt not to be worth pursuing at the pip end, as it would not treat the root cause and would not address any other package installer; hence this ticket.
https://pypi.org/project/stdlib-list is maintained and appears to be kept up to date. It looks like it could be helpful; thanks to @jackmaney.
With #2409 shipped, here's what I see as the remaining items to wrap this up:
Anything else?
I think that "Also block obvious cases of 'type-squatting' (either manually or automatically via string-similarity metric) to avoid the problem described here" is another issue, as that will be a more difficult problem to get right.
Thank you for using my library (stdlib-list)! I update it after every minor version release (i.e. the next one will be 3.7). Please let me know if you find something that's missing in any of the lists.
Regarding this point:
"[Blocking obvious cases of typo-squatting] is another issue, as that will be a more difficult problem to get right."
I understand this hesitation, but perfect is the enemy of good, no? It seems like it could be gotten right enough for the top N most popular downloads. If there is a possibility of going down this path, I would be glad to sign up to help.
Now that new uploads of stdlib-shadowing names are no longer possible, can someone with the power to do so please remove the dummy packages that have been placed there by @stestagg? See @GadgetSteve's comment for context and https://github.com/pypa/pypi-legacy/issues/585 for a list of these dummy packages.
@GadgetSteve: Apologies for confusing you with @stestagg, who could have known that one Steve reports an issue previously blogged about by another Steve? 😬
@jonemo No problem on the confusion; it is not exactly new. At work we have another person with the same first name and surname in a different division, and one in the same office with a surname that sounds similar.
@hangtwenty Just to point out that there are two types of typo-squatting. One is things like duplicated and transposed letters (e.g. urlllib or urlilb); the other, increasingly popular, is Unicode look-alike (homoglyph) mimicry, e.g. a package called аррӏе (actually u"\u0430\u0440\u0440\u04cf\u0435") could spoof one called apple. One approach to the latter would be to require all packages to be named with 7-bit ASCII or similar, but that has obvious limitations and may not be desirable.
@GadgetSteve We do indeed restrict PyPI name registrations to 7-bit ASCII: https://www.python.org/dev/peps/pep-0508/#names
While we don't spell out the reasoning there, the vast array of Unicode confusables is indeed the reason we have that restriction: with ASCII, it's mainly only l/1 and O/0 that you need to worry about.
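A small sketch of the ASCII restriction, using the name pattern given in PEP 508 (matched case-insensitively). The re.ASCII flag is an addition worth noting: with IGNORECASE alone, Python's Unicode matching lets [A-Z] accept a few non-ASCII letters such as "ſ" and the Kelvin sign. The ascii_skeleton helper is a hypothetical illustration of folding the remaining l/1 and O/0 look-alikes, not something PyPI is stated to do.

```python
import re

# PEP 508 name pattern; re.ASCII keeps IGNORECASE from matching non-ASCII letters.
VALID_NAME = re.compile(
    r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE | re.ASCII
)


def is_valid_project_name(name):
    return VALID_NAME.fullmatch(name) is not None


# Hypothetical: fold the ASCII look-alikes 1 -> l and 0 -> o so that two
# names can be compared for visual similarity.
_FOLD = str.maketrans({"1": "l", "0": "o"})


def ascii_skeleton(name):
    return name.lower().translate(_FOLD)


print(is_valid_project_name("apple"))                           # True
print(is_valid_project_name("\u0430\u0440\u0440\u04cf\u0435"))  # False
print(ascii_skeleton("App1e") == ascii_skeleton("apple"))       # True
```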
As far as the actual typosquatting problem goes, my proposal in https://github.com/pypa/warehouse/issues/2268 is to distribute the review workload by notifying the maintainers of the projects with similar names, rather than always notifying the PyPI admins (since admin time and attention is a very limited resource). The PyPI admins would then only get direct notifications when registered project names are close to ones on the already prohibited list.
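A rough sketch of that routing, assuming a hypothetical mapping from existing project names to maintainer usernames. It uses difflib's SequenceMatcher-based similarity from the standard library purely as a stand-in metric, with an arbitrary cutoff of 0.8; the actual proposal does not specify either.

```python
import difflib


def maintainers_to_notify(new_name, projects, cutoff=0.8):
    """(project, maintainers) pairs for existing names similar to new_name.

    projects maps project name -> list of maintainer usernames
    (hypothetical data shape).
    """
    close = difflib.get_close_matches(new_name, list(projects), n=5, cutoff=cutoff)
    return [(name, projects[name]) for name in close if name != new_name]


def needs_admin_review(new_name, prohibited, cutoff=0.8):
    """Escalate to PyPI admins only when the name is close to a prohibited one."""
    return bool(difflib.get_close_matches(new_name, list(prohibited), n=1, cutoff=cutoff))


projects = {"requests": ["alice"], "urllib3": ["bob"], "flask": ["carol"]}
print(maintainers_to_notify("reqeusts", projects))
print(needs_admin_review("typng", {"typing", "asyncio"}))
```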
Hopefully such notifications will include a link for requesting registration blocking or audit.
This might be obvious to people, but for calculating the similarity we could use Levenshtein distance.
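For illustration, a plain-Python sketch of Levenshtein distance (the classic two-row dynamic programme), plus a hypothetical flag_similar helper with an arbitrary threshold of 2 edits. Note that plain Levenshtein counts a transposition such as urlilb as two edits; the Damerau-Levenshtein variant would count it as one.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string as the row
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def flag_similar(name, protected, max_distance=2):
    """Protected names within max_distance edits of name (exact matches excluded)."""
    return sorted(p for p in protected if 0 < levenshtein(name, p) <= max_distance)


print(levenshtein("urllib", "urlllib"))  # 1 (duplicated letter)
print(levenshtein("urllib", "urlilb"))   # 2 (transposition)
print(flag_similar("urlilb", {"urllib", "uuid", "time"}))
```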
Relevant blog post by the way:
Please only remove my packages if the name blocking is applied to pypi as well as warehouse!
@stestagg Blocking of names only occurs on upload of a new package name, and all such uploads must now go via Warehouse, so we're good here!
ok, cool, I wasn't aware that had happened :)
Very happy with the outcome.
Apologies to @stestagg for not CCing on the original submission of this ticket.
Thanks to a helpful nudge from @brainwane, an audit of existing projects in conflict follows. I was able to quickly assess some modules based on authorship/ownership, and also to quickly remove some of them that did not have any files or download links.
abc deleted
argparse valid
ast needs inspection
asyncio valid
buildtools needs inspection
calendar needs inspection
cd needs inspection
chunk needs inspection
code deleted
colorpicker needs inspection
commands deleted
compiler needs inspection
configparser valid
contextvars valid
csv valid
ctypes needs inspection
dataclasses valid
datetime valid
device needs inspection
dis needs inspection
distutils needs inspection
dl needs inspection
email needs inspection
enum valid
exceptions needs inspection
faulthandler valid
formatter needs inspection
framework needs inspection
functools needs inspection
gl needs inspection
hashlib needs inspection
hmac needs inspection
html needs inspection
html-parser needs inspection
htmlparser needs inspection
http needs inspection
http-client needs inspection
imp deleted
importlib valid
importlib-resources valid
io needs inspection
ipaddress needs inspection
jpeg needs inspection
logging needs inspection
logging-config needs inspection
mailbox needs inspection
modulefinder needs inspection
multiprocessing valid
nav needs inspection
new needs inspection
numbers needs inspection
parser needs inspection
pathlib valid
pipes deleted
pprint needs inspection
queue deleted
readline needs inspection
repr needs inspection
resource needs inspection
secrets needs inspection
select needs inspection
selectors needs inspection
sets needs inspection
shelve needs inspection
signal needs inspection
ssl valid
statistics needs inspection
test needs inspection
time needs inspection
token needs inspection
trace needs inspection
turtle needs inspection
typing valid
unittest needs inspection
uuid needs inspection
w needs inspection
wave needs inspection
wsgiref valid
xmlrpclib needs inspection
Thanks @ewdurbin, so the owners of those packages need to consider the name conflict? Is it OK for me to open a new packaging-problems issue with that list?
@brainwane The next step would be for someone to take a closer look at all of those listed above as "needs inspection". Some will be valid and can/should remain on PyPI. I'm not sure that any action is needed from the owners of projects with existing conflicting names.
@brainwane @ewdurbin what's the status of this effort? Do you need any help?
Heads-up @xmunoz in case you want to skim this issue for background
I can't say anything about the other packages @ewdurbin listed, but the AST package seems pretty inactive and stale.
I don't know the procedure to get that package blocked, but maybe we could kindly ask its maintainer to move their package to some other name first and block the name afterwards?
@isidentical is the goal for the AST project to be replaced with something from Python Core or just to be blocked from re-upload?
Be blocked.
It has been pointed out online, on Hacker Noon, that the current PyPI allows people to register and upload packages with the same names as core Python libraries. This presents a potential attack vector, as pip -U will "upgrade" the core library to the uploaded package, which may be given as a dependency of some other package.
Anybody, with the possible exception of the core python developers, trying to do this should definitely fail with an error message and possibly be flagged as suspicious activity.
I have tried to suggest blocking any upgrades to core packages at the pip level, in pip issue 4527, but the consensus there is that this is really a problem at the PyPI/Warehouse end.