python / cpython

The Python programming language
https://www.python.org
Other
63.24k stars 30.28k forks source link

Format parser is too permissive #54230

Open abalkin opened 14 years ago

abalkin commented 14 years ago
BPO 10021
Nosy @terryjreedy, @mdickinson, @abalkin, @ericvsmith, @benjaminp, @merwok

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['type-bug', 'docs'] title = 'Format parser is too permissive' updated_at = user = 'https://github.com/abalkin' ``` bugs.python.org fields: ```python activity = actor = 'belopolsky' assignee = 'docs@python' closed = False closed_date = None closer = None components = ['Documentation'] creation = creator = 'belopolsky' dependencies = [] files = [] hgrepos = [] issue_num = 10021 keywords = [] message_count = 11.0 messages = ['117961', '117964', '117965', '117966', '117967', '117969', '117971', '117992', '118009', '118011', '118232'] nosy_count = 7.0 nosy_names = ['terry.reedy', 'mark.dickinson', 'belopolsky', 'eric.smith', 'benjamin.peterson', 'eric.araujo', 'docs@python'] pr_nums = [] priority = 'normal' resolution = None stage = 'needs patch' status = 'open' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue10021' versions = ['Python 3.6'] ```

abalkin commented 14 years ago

According to the Format String Syntax section [1], attribute_name must be an identifier. However, the parser does not catch a violation of this rule and happily passes non-indentifier strings to getattribute:

>>> class X:
...    def __getattribute__(self, a): return 'foo'
... 
>>> '{.$#@}'.format(X())
'foo'

If this is a desirable feature, I think it should be clearly documented because in some cases, for example when formatted objects are proxies to  database entries, passing arbitrary strings to __getattribute__ may be wasteful at best and a security hole at worst.

[1] http://docs.python.org/dev/py3k/library/string.html#format-string-syntax

abalkin commented 14 years ago

PEP-3101 has the following

""" Implementation note: The implementation of this proposal is not required to enforce the rule about a simple or dotted name being a valid Python identifier. Instead, it will rely on the getattr function of the underlying object to throw an exception if the identifier is not legal. The str.format() function will have a minimalist parser which only attempts to figure out when it is "done" with an identifier (by finding a '.' or a ']', or '}', etc.). """

Apparently CPython takes advantage of this note in its implementation. Thus this is not a bug, but I think this implementation note should be added to CPython documentation.

ericvsmith commented 14 years ago

Right. It seemed like a hassle to have the str.format parser try to figure out what a valid identifier is, so it just passes it through.

I don't see this as any different from:

>>> class X:
...    def __getattribute__(self, a): return 'foo'
... 
>>> getattr(X(), '$#@')
'foo'
benjaminp commented 14 years ago

2010/10/4 Eric Smith \report@bugs.python.org\:

Eric Smith \eric@trueblade.com\ added the comment:

Right. It seemed like a hassle to have the str.format parser try to figure out what a valid identifier is, so it just passes it through.

You can always use "str.isidentifier()" (I don't remember if there's a capi).

ericvsmith commented 14 years ago

Ah, but I don't need to in order to comply with the PEP!

abalkin commented 14 years ago

On Mon, Oct 4, 2010 at 1:02 PM, Eric Smith \report@bugs.python.org\ wrote: ..

Ah, but I don't need to in order to comply with the PEP!

This is true and this is the reason I changed this issue from bug to doc. I seem to remember this having been discussed before, but I cannot find the right thread. There are at least two reasons cpython docs should mention this:

  1. From current documentation, users are likely to expect a value error from format(".$#@", ..) rather than an attribute error.
  2. Naive proxy objects may implement __getattribute__ that blindly inserts attribute name into database queries leading to all kinds of undesired behaviors.
ericvsmith commented 14 years ago

I agree it should be documented as a CPython specific behavior. I should also add a CPython specific test for it, modeled on your code (if one doesn't already exist). I'll look into it.

mdickinson commented 14 years ago

I seem to remember this having been discussed before, but I cannot find the right thread.

It came up in the bpo-7951 discussion, I think.

benjaminp commented 14 years ago

This should not be classified as an "implementation detail". Either we should document it and cause other implementations to support it or check it ourselves.

ericvsmith commented 14 years ago

I agree that it being an implementation detail is not a good thing. I think we should just document the current CPython behavior as the language standard: once parsed, any string after a dot is passed to getattr. I can't see why we should pay the penalty of validating it as an identifier when the behavior is well defined and matches my getattr example in msg 117965.

terryjreedy commented 14 years ago

This is a bug report in that there is a discrepancy between the grammar in the doc and the behavior. Laxiness can lead to portability problems if CPython is lax compared to a normal reading of the spec and another implementation takes the spec seriously.

I agree that implementation details that lead to an exception here and not there, or vice versa, are best avoided.

For getattr: ''' getattr(object, name[, default]) Return the value of the named attributed of object. name must be a string. ''' the doc is careful to just say that name must be a string, not specifically an identifier. Given that, I suppose "attribute_name ::= identifier" should be changed to match so that string formats can always (all implementations) also access non-identifier attributes.