python / cpython


Weird `inspect.getsource` behavior with generators #121331

Open MarkRotchell opened 4 days ago

MarkRotchell commented 4 days ago

Bug report

Bug description:

If a function returns two generators (one from calling a nested generator function, the other from a generator expression) and you call getsource on each generator's code object, you get the source of the generator function both times.

from inspect import getsource

def get_2_generators():
    def my_generator_1():
        for x in range(5):
            yield x*2 
    my_generator_2 = (x*3 for x in range(6))
    return my_generator_1(), my_generator_2

g1, g2 = get_2_generators()

getsource(g1.gi_code)
# returns: '    def my_generator_1():\n        for x in range(5):\n            yield x*2 \n'

getsource(g2.gi_code)
# returns: '    def my_generator_1():\n        for x in range(5):\n            yield x*2 \n'
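One way to narrow this down is to look at the code objects themselves. The sketch below compiles the example from a string so that it runs without a source file. The filename "<issue>" is arbitrary. It then prints each generator's co_name and co_firstlineno. If the two code objects are distinct and point at different lines, the 3.12 mix-up probably happens when getsource maps a code object back to source lines, and not in the code objects.

```python
source = '''\
def get_2_generators():
    def my_generator_1():
        for x in range(5):
            yield x*2
    my_generator_2 = (x*3 for x in range(6))
    return my_generator_1(), my_generator_2
'''

# Compile from a string so the example needs no file on disk.
namespace = {}
exec(compile(source, "<issue>", "exec"), namespace)
g1, g2 = namespace["get_2_generators"]()

# Each generator carries its own code object with its own name and start line.
for gen in (g1, g2):
    code = gen.gi_code
    print(code.co_name, code.co_firstlineno)
# my_generator_1 2
# <genexpr> 5
```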

Further, and presumably a current limitation rather than a bug: calling getsource on a generator expression's code object generally returns the source of the function the expression was created in, not the generator expression itself. This is confusing when there are several such expressions, e.g.:

from inspect import getsource

def get_2_generators():
    my_generator_1 = (x*2 for x in range(5)) 
    my_generator_2 = (x*3 for x in range(6))
    return my_generator_1, my_generator_2

g1, g2 = get_2_generators()
getsource(g1.gi_code)
# returns: 'def get_2_generators():\n    my_generator_1 = (x*2 for x in range(5)) \n    my_generator_2 = (x*3 for x in range(6))\n    return my_generator_1, my_generator_2\n'

getsource(g2.gi_code)
# returns: 'def get_2_generators():\n    my_generator_1 = (x*2 for x in range(5)) \n    my_generator_2 = (x*3 for x in range(6))\n    return my_generator_1, my_generator_2\n'

Can we fix both of the above by making getsource return the code of the generator expression itself?

CPython versions tested on:

3.12

Operating systems tested on:

Windows

picnixz commented 3 days ago

I think the first example is a bug (I mean, you don't get anything related to your second generator). I'd be interested in trying to fix it though if neither Alex nor Jelle wants to do it. But I think fixing the first one would presumably fix the second one as well. I'll try to play a bit now to see what happens.

EDIT: I cannot reproduce it in 3.14.

MarkRotchell commented 2 days ago

It looks like there were quite a few changes to inspect in 3.13+ as part of other bug fixes, so I think this has been at least partly fixed in that version, if only by accident. I don't have 3.14 installed, but I've downloaded 3.13.0b and indeed the top issue is resolved:

from inspect import getsource

def get_2_generators():
    def my_generator_1():
        for x in range(5):
            yield x*2 
    my_generator_2 = (x*3 for x in range(6))
    return my_generator_1(), my_generator_2

g1, g2 = get_2_generators()

print(getsource(g2.gi_code))
# output:     my_generator_2 = (x*3 for x in range(6))

However, there are still some issues, e.g., only the first line is returned when the generator expression is split over multiple lines:

from inspect import getsource

g1 = (
    x*2 for x in range(3)
)

print(getsource(g1.gi_code))
# output: g1 = (

Also, perhaps not a bug, but I still think it would be nice to get just (x*3 for x in range(6)), especially because of the ambiguity in e.g.:

from inspect import getsource

g1, g2 = (x*2 for x in range(5)), (x*4 for x in range(6))

print(getsource(g1.gi_code))
# output: g1, g2 = (x*2 for x in range(5)), (x*4 for x in range(6))
print(getsource(g2.gi_code))
# output: g1, g2 = (x*2 for x in range(5)), (x*4 for x in range(6))

or

from inspect import getsource

g1 = ((x*2 for x in range(10)) for _ in range(20))
g2 = next(g1)

print(getsource(g1.gi_code))
# output: g1 = ((x*2 for x in range(10)) for _ in range(20))
print(getsource(g2.gi_code))
# output: g1 = ((x*2 for x in range(10)) for _ in range(20))

though I appreciate these cases may be a non-trivial parsing job, especially as they're not currently solved for lambdas either, e.g.:

from inspect import getsource

a, b = (lambda x: x*2), (lambda x: x+2)

print(getsource(a))
# output: a, b = (lambda x: x*2), (lambda x: x+2)
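These cases are likely hard for a line-based lookup. The generator expressions on a shared line produce code objects with the same co_name and the same co_firstlineno, and so do the lambdas. A lookup keyed on co_firstlineno therefore has nothing to tell them apart. The snippet below compiles the examples from a string to show this.

```python
source = (
    "g1, g2 = (x*2 for x in range(5)), (x*4 for x in range(6))\n"
    "a, b = (lambda x: x*2), (lambda x: x+2)\n"
)
namespace = {}
exec(compile(source, "<demo>", "exec"), namespace)

# Distinct code objects, but identical name and first line within each pair.
codes = {
    "g1": namespace["g1"].gi_code,
    "g2": namespace["g2"].gi_code,
    "a": namespace["a"].__code__,
    "b": namespace["b"].__code__,
}
for name, code in codes.items():
    print(name, code.co_name, code.co_firstlineno)
# g1 <genexpr> 1
# g2 <genexpr> 1
# a <lambda> 2
# b <lambda> 2
```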

I'm not fully familiar with the way these things are ordered - would there be a possibility of a bug fix in an impending 3.12.5 by bringing over some code from 3.13+, or is that not the way it works?

MarkRotchell commented 2 days ago

If the problem is too difficult to solve before 3.13+, perhaps we should raise NotImplementedError for 3.12.5, as I think it's clear at the moment inspect.getsource hasn't been implemented with generator expressions in mind.

This should be fairly simple to do by adding the following to inspect:

def isgeneratorexpressioncode(object):
    return iscode(object) and object.co_name == '<genexpr>'

and then adding the following to inspect.getsourcelines:

if isgeneratorexpressioncode(object):
    raise NotImplementedError('Code objects for generator expressions not currently supported')
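On 3.12, the same check can also be applied from user code without patching the standard library. This is only a sketch. getsource_strict is a hypothetical wrapper name, and the predicate mirrors the isgeneratorexpressioncode proposed above. Unlike the proposal, it wraps getsource rather than changing getsourcelines.

```python
import inspect


def isgeneratorexpressioncode(obj) -> bool:
    """True if obj is a generator expression's code object (predicate from the proposal)."""
    return inspect.iscode(obj) and obj.co_name == "<genexpr>"


def getsource_strict(obj) -> str:
    """Hypothetical wrapper: refuse genexpr code objects instead of returning misleading source."""
    if isgeneratorexpressioncode(obj):
        raise NotImplementedError(
            "Code objects for generator expressions not currently supported"
        )
    return inspect.getsource(obj)


def gen_function():
    yield 1


gen = (x for x in range(3))
try:
    getsource_strict(gen.gi_code)
except NotImplementedError as exc:
    print(exc)  # Code objects for generator expressions not currently supported
```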

Also, as an aside: per PEP 8, shouldn't the is{something} functions in inspect use underscores, i.e. is_code() rather than iscode()?