Compile time textwrap.dedent() equivalent for str or bytes literals

gpshead commented 5 years ago

BPO	36906
Nosy	@rhettinger, @terryjreedy, @gpshead, @stevendaprano, @methane, @serhiy-storchaka, @MojoVampire, @Carreau, @pablogsal, @remilapeyre, @Marco-Sulla, @iforapsy, @jtojnar
PRs	python/cpython#13445

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

```python
assignee = 'https://github.com/gpshead'
closed_at = None
created_at =
labels = ['interpreter-core', 'type-feature', '3.10']
title = 'Compile time textwrap.dedent() equivalent for str or bytes literals'
updated_at =
user = 'https://github.com/gpshead'
```

bugs.python.org fields:

```python
activity =
actor = 'iforapsy'
assignee = 'gregory.p.smith'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation =
creator = 'gregory.p.smith'
dependencies = []
files = []
hgrepos = []
issue_num = 36906
keywords = ['patch']
message_count = 46.0
messages = ['342373', '342407', '342420', '342429', '342477', '342488', '342569', '342600', '342909', '342914', '342915', '342916', '342917', '342918', '342927', '342928', '342931', '342938', '342962', '342965', '342968', '342972', '343961', '343991', '350162', '356153', '356154', '356155', '356160', '356162', '356182', '356193', '356198', '356203', '356204', '365688', '381540', '381544', '381545', '381546', '381584', '381599', '381602', '381605', '382456', '398270']
nosy_count = 14.0
nosy_names = ['rhettinger', 'terry.reedy', 'gregory.p.smith', 'steven.daprano', 'methane', 'serhiy.storchaka', 'josh.r', 'mbussonn', 'pablogsal', 'remi.lapeyre', 'Marco Sulla', 'iforapsy', 'Miguel Amaral', 'jtojnar']
pr_nums = ['13445']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue36906'
versions = ['Python 3.10']
```

gpshead commented 1 year ago

That question is valid but seems a mere implementation detail. A PR implementing the optimization would help motivate making that decision.

I'd still love it to be automatic for docstrings as it'd save memory for everyone in the world, but that can be considered a separate follow-up feature.
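A rough sketch of where the memory saving would come from, using `inspect.cleandoc()` as a stand-in for whatever cleanup the compiler would apply (an assumption; the exact algorithm is not settled here). The string `raw` is a made-up example shaped like a docstring inside an indented method body.

```python
import inspect

# Hypothetical docstring text, as it would appear inside an indented method body.
raw = """Return the frobnicated value.

        Args:
            value: the input to process.
        """

# Stand-in for a compile-time cleanup step (an assumption, not the proposed implementation).
cleaned = inspect.cleandoc(raw)

# Characters that exist only to line the text up with the surrounding code.
saved = len(raw) - len(cleaned)
print(f"{len(raw)} -> {len(cleaned)} characters ({saved} saved)")
```

The per-docstring saving is small, but it would apply to every indented docstring that stays loaded in memory.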

stevendaprano commented 1 year ago

cleandoc and dedent are functionally different, so it's not just an implementation detail, it's a difference of semantics.

Looking at the output of the two, I think cleandoc is the more correct behaviour for docstrings:

s = """First
    Second
    Third
        Indented fourth
    Fifth
    """

inspect.cleandoc(s) # --> 'First\nSecond\nThird\n    Indented fourth\nFifth'
textwrap.dedent(s)  # --> 'First\n    Second\n    Third\n        Indented fourth\n    Fifth\n'

> I'd still love it to be automatic for docstrings as it'd save memory for everyone in the world,

What are the consequences of just automatically dedenting docstrings? Technically it's a breaking change, but is there code, or are there people, that rely on the current leading whitespace?

help() already reformats the docstring, as does inspect, so maybe we should seriously consider just changing the rules for docstrings to automatically dedent them at function build time. It would probably have to go through a future import first.
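For reference, the existing run-time cleanup can be observed with `inspect.getdoc()`, which passes `__doc__` through `inspect.cleandoc()`. A minimal sketch; the function `example` is made up for illustration.

```python
import inspect


def example():
    """Summary line.

    Details are indented in the source file to line up with the code.
        A deliberately nested line.
    """


# The stored form of the docstring; its leading whitespace may differ
# between Python versions, so no particular output is claimed here.
print(repr(example.__doc__))

# inspect.getdoc() removes the indentation that exists only for source
# layout, while keeping the relative indentation of the nested line.
cleaned = inspect.getdoc(example)
print(cleaned)
```

Any tool that reads `__doc__` directly instead of going through `inspect` is the kind of code that could notice a change in how docstrings are stored.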

methane commented 1 year ago

> Looking at the output of the two, I think cleandoc is the more correct behaviour for docstrings:

This issue is primarily about general multiline string dedenting. So please forget about docstrings for now and consider what the best dedent behavior is for general multiline strings like SQL.

s = """First
    Second
    Third
        Indented fourth
    Fifth
    """

inspect.cleandoc(s) # --> 'First\nSecond\nThird\n    Indented fourth\nFifth'
textwrap.dedent(s)  # --> 'First\n    Second\n    Third\n        Indented fourth\n    Fifth\n'

I don't think special-casing the first line is needed for general dedenting.
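A small sketch of the general (non-docstring) case, assuming the common idiom of opening the literal with a backslash-newline so that every content line carries the same indentation. The SQL text is made up for illustration.

```python
import textwrap

# Every content line starts on its own line, so a plain "remove the common
# margin" rule is enough and the first line needs no special treatment.
query = textwrap.dedent("""\
    SELECT name, email
      FROM users
     WHERE active = 1
    """)
print(query)

# When the first line sits on the opening quotes it has zero indentation,
# so the common margin is empty and the other lines keep their indentation.
inline_first = textwrap.dedent("""SELECT name, email
    FROM users
    """)
print(repr(inline_first))
```

In both cases the whitespace-only final line is reduced to an empty line, so the result ends with a single newline.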

stevendaprano commented 1 year ago

On Tue, Jan 10, 2023 at 10:02:47PM -0800, Inada Naoki wrote:

> > Looking at the output of the two, I think cleandoc is the more correct behaviour for docstrings:

> This issue is primarily about general multiline string dedenting.

But we already have a solution for that: the textwrap module.

The textwrap module is inconvenient to use with docstrings, which are another case where we put extra indents in the text block to make it look good in the .py file but need to remove that leading space before displaying the text.

I agree that we should also think about other uses of dedent, but we should not forget docstrings. They are an important use-case, and as far as I am concerned, they are the driving motivation for moving dedent into the str type as a method.

methane commented 1 year ago

> I agree that we should also think about other uses of dedent, but we should not forget docstrings. They are an important use-case, and as far as I am concerned, they are the driving motivation for moving dedent into the str type as a method.

As I wrote in this comment, new syntax can be different from inspect.cleandoc().

I prefer having the best multiline syntax by learning from Julia, Java, Swift, etc...

Please read this comment too. Compile time docstring cleandoc is tracked in #81283.

terryjreedy commented 1 year ago

@Carreau "Is this supposed to deprecate textwrap.dedent?" @stevendaprano "I don't think so, but eventually it might." I definitely feel that this should NOT be part of the initial PR without being discussed here.

One of the removed uses in the PR is a 16-line, roughly 500-character, """-quoted string literal in idlelib. At the moment I don't really like having 'dedent' moved from the starting line to the last. (I could use inspect.cleandoc instead, or literal concatenation as elsewhere in the file.)
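The alternatives mentioned above can be compared side by side. This is only a sketch with a short made-up text, not the actual idlelib literal.

```python
import inspect
import textwrap

# 1. dedent named on the starting line, wrapping the literal.
via_dedent = textwrap.dedent("""\
    First line
    Second line
    """)

# 2. inspect.cleandoc(): also strips leading and trailing blank lines,
#    so the result has no final newline.
via_cleandoc = inspect.cleandoc("""
    First line
    Second line
    """)

# 3. Implicit concatenation of adjacent string literals; nothing to dedent.
via_concat = ("First line\n"
              "Second line\n")

print(repr(via_dedent))
print(repr(via_cleandoc))
print(repr(via_concat))
```

The three spellings differ only in the trailing newline, which `inspect.cleandoc()` drops.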


Compile time textwrap.dedent() equivalent for str or bytes literals #81087