psf / black

The uncompromising Python code formatter
https://black.readthedocs.io/en/stable/
MIT License
38.82k stars 2.45k forks source link

Fluent interfaces #67

Closed sfermigier closed 6 years ago

sfermigier commented 6 years ago

Operating system: Mac OS Python version: 3.6.4 Black version: 18.3a3 Does also happen on master: yes


Black removes trailing backlashes, which can be used, for instance with yapf, to signal "don't join these line".

This is problematic, for instance, if one uses the "fluent interface" idiom, which can lead to long chains of attribute accesses in a given expression.

Here's an example:

-        return sa.sql \
-            .select([sa.sql.func.count(membership.c.user_id)]) \
-            .where(membership.c.group_id == cls.id) \
-            .group_by(membership.c.group_id) \
-            .label('members_count')
+        return sa.sql.select([sa.sql.func.count(membership.c.user_id)]).where(
+            membership.c.group_id == cls.id
+        ).group_by(
+            membership.c.group_id
+        ).label(
+            'members_count'
+        )

The easiest solution would be to keep the line break in the presence of a backslash.

sethmlarson commented 6 years ago

What is the result if you do this?

return (
    sa.sql
    .select([sa.sql.func.count(membership.c.user_id)])
    .where(membership.c.group_id == cls.id)
    .group_by(membership.c.group_id)
    .label('members_count')
)
ambv commented 6 years ago

Black doesn't support fluent interfaces or any other hand-formatted idioms. Backslashes are always removed, I responded as to why in https://github.com/ambv/black/issues/64.

This will be solved by # fmt: off which I am working on now and will be part of the next release on Monday.

ambv commented 6 years ago

And for the record, I personally find the Black-formatted version easier on the eyes. Leading dots are invalid syntax in Python without backslash magic.

carljm commented 6 years ago

No plans to change Black's formatting approach here; we already have #5 for # fmt: off escape hatch.

ambv commented 6 years ago

I will reconsider this. My current idea is as follows:

By fluent interface I understand treating "." as a delimiter that comes first on the line. Is there anything else?

ambv commented 6 years ago

@dstufft, @wbolster, note this is re-opened now.

dstufft commented 6 years ago

@ambv I had never heeard of it called fluent interface before, so I'm not sure :) I do stuff like this a lot in Warehouse though:

        release = (
            request.db.query(Release)
                      .filter(Release.project == project)
                      .order_by(
                          Release.is_prerelease.nullslast(),
                          Release._pypi_ordering.desc())
                      .limit(1)
                      .one()
        )
    project_names = [
        r[0] for r in (
            request.db.query(Project.name)
                   .order_by(Project.zscore.desc().nullslast(),
                             func.random())
                   .limit(5)
                   .all())
    ]
    release_a = aliased(
        Release,
        request.db.query(Release)
                  .distinct(Release.name)
                  .filter(Release.name.in_(project_names))
                  .order_by(Release.name,
                            Release.is_prerelease.nullslast(),
                            Release._pypi_ordering.desc())
                  .subquery(),
    )
    trending_projects = (
        request.db.query(release_a)
               .options(joinedload(release_a.project))
               .order_by(func.array_idx(project_names, release_a.name))
               .all()
    )

    latest_releases = (
        request.db.query(Release)
                  .options(joinedload(Release.project))
                  .order_by(Release.created.desc())
                  .limit(5)
                  .all()
    )

    counts = dict(
        request.db.query(RowCount.table_name, RowCount.count)
                  .filter(
                      RowCount.table_name.in_([
                          Project.__tablename__,
                          Release.__tablename__,
                          File.__tablename__,
                          User.__tablename__,
                      ]))
                  .all()
    )

I pretty much only do it with SQLAlchemy, but that's also the only thing I really use whose API is a long chain of methods like that. The reason I do it that way (and afaik prior to learning others did it, I didn't know it had a name!) was because it matches the way I would format the SQL the most and groups the important aspects together on one line (or a few lines), so like taking a look at:

        release = (
            request.db.query(Release)
                      # The "Filter by Release.project" is one logical "thought"
                      .filter(Release.project == project)
                      # The Order by is another logical though, but is too long
                      # for one line, so it has to get split.
                      .order_by(
                          Release.is_prerelease.nullslast(),
                          Release._pypi_ordering.desc())
                      # again, a single logical though, which isn't really modified at all by
                      # the previous lines, they could be deleted/added and this would
                      # stay the same.
                      .limit(1)
                     # and again, one logical thought.
                      .one()
        )

The black way of formatting that, which is:

if True:
    if True:
        release = (
            request.db.query(Release).filter(Release.project == project).order_by(
                Release.is_prerelease.nullslast(), Release._pypi_ordering.desc()
            ).limit(
                1
            ).one()
        )

The individual lines end up doing more than one logical thing, and you end up having to be much more careful about having to read through it to actually understand what's going on, because it each line ends up doing multiple things (the . are almost acting like ; here, conceptually).

Anyways, I'm glad you're reconsidering this. The conditions as described sounds like it would work fine for me.

dstufft commented 6 years ago

Oh just for completness sake, heres the Python code and the SQL that it's mimic'ing:

request.db.query(Release)
          .filter(Release.project == project)
          .order_by(
              Release.is_prerelease.nullslast(),
              Release._pypi_ordering.desc())
          .limit(1)
          .one()
SELECT * FROM releases
WHERE releases.project = 'foo'
ORDER BY
    releases.is_prerelease NULLS LAST,
    releases._pypi_ordering DESC
LIMIT 1

That SQL isn't completely valid, but you can get the idea?

sfermigier commented 6 years ago

@ambv Thanks.

@ambv & @dstufft : Fluent interfaces have been originally discussed by Martin Fowler in 2005 in: https://martinfowler.com/bliki/FluentInterface.html and Eric Evans in the "Domain Driven Design" (blue) book.

There are some discussions in https://en.wikipedia.org/wiki/Fluent_interface , more specifically: https://en.wikipedia.org/wiki/Fluent_interface#Python though not very deep.

In the Python context, I have seen them used mostly for creating DSLs (as stated in the top of the Wikipedia article), more specifically for ORM and ORM-like libraries and frameworks (SQLAlchemy, others).

My and @dstufft 's examples give a pretty good idea of what is considered "canonical" code when building queries with the SQLAlchemy DSL.

Regarding the Python standard library, the only cas I can think of strings substitutions. Here's an example where Black reformats Flask's code in a less than savoury way:

-    rv = dumps(obj, **kwargs) \
-        .replace(u'<', u'\\u003c') \
-        .replace(u'>', u'\\u003e') \
-        .replace(u'&', u'\\u0026') \
-        .replace(u"'", u'\\u0027')
+    rv = dumps(obj, **kwargs).replace(u'<', u'\\u003c').replace(
+        u'>', u'\\u003e'
+    ).replace(
+        u'&', u'\\u0026'
+    ).replace(
+        u"'", u'\\u0027'
+    )

Regarding how to deal with it: IMHO the best way would be to consider that code with trailing backslashes has already been crafted manually by the programmer, and don't change it. This is, more or less, the current behaviour of Yapf or PyCharm's fomatter.

@ambv It you still don't agree with this idea, what you propose seems to make sense.

I note, however, that programmers will have to explicitly add parenthesis about long fluent expressions (as well as non-fluent one, like complex boolean expressions with no function calls) in order for Black to split the lines and not create a pep8 violation.

ambv commented 6 years ago

Backslashes are bad form. Organizational parentheses are much preferred.

dstufft commented 6 years ago

I generally don't use backslashes for these, because multiple lines of backslashes always seemed off to me, parentheses seemed to always make the most sense to me.

wbolster commented 6 years ago

some thoughts.

the pattern is a.b.c.d(...).e(..., ...).

breaking it into pieces that should (preferably) end up on "logical lines":

each of these could be one line, or more than one if the line becomes too long. when wrapping, i think normal black rules could apply, with the exception that the closing paren should not go on a line of its own.

re. indenting of the whole statement, multiple-of-four indents should be required, just like elsewhere. that can.be easily achieved like this:

query = (
    a.b.c
    .d()
    .e(
        ...,
        ...))

the final closing paren is debatable, and could go on a new line.

ambv commented 6 years ago

Final closing parens are always dedented and on a separate line in Black.

dstufft commented 6 years ago

For the final closing paren, I've switched back and forth between new line and inline with the final line, I think either works fine.

I think the only tweak I'd make here is I'd align the . withe final . in the first line, so you end up with:

query = (
    a.b.c
       .d()
       .e(
           ...,
           ...
       )
)

This actually ends up having the dot leading lines at not a multiple of 4, but in my experience, these things tend to be "normal" python attribute access, until you get to the chain of methods portion, and I find it easier to read down the order of operations if they're all aligned. I think this is more obvious with the example I posted above:

release = (
    request.db.query(Release)
              .filter(Release.project == project)
              .order_by(
                Release.is_prerelease.nullslast(),
                Release._pypi_ordering.desc())
              .limit(1)
              .one()
)

Here request.db isn't really part of the fluent interface portion, it's just the leading portion you access to get to where the object implementing the fluent interface lives, but once you get to the fluent interface, you want it aligned so that you can generally read it straight down. I think that reads better than:

release = (
    request.db.query(Release)
    .filter(Release.project == project)
    .order_by(
        Release.is_prerelease.nullslast(),
        Release._pypi_ordering.desc())
    .limit(1)
    .one()
)

because it still provides some visual distinction that indicates there is a reason why these lines are starting with . and keeps the flow so you sort of scan it straight down in one logical block.

That being said, something like:

release = (
    request.db
    .query(Release)
    .filter(Release.project == project)
    .order_by(
        Release.is_prerelease.nullslast(),
        Release._pypi_ordering.desc())
    .limit(1)
    .one()
)

Isn't the worst thing ever, you just basically have to mentally ignore the first line completely when parsing the fluent interface portion. I think it's worse than aligning with the . as in my first "real" code sample in this post.

Although I guess we'd have the middle set of parens on it's own line too, so it would end up looking like:

release = (
    request.db.query(Release)
              .filter(Release.project == project)
              .order_by(
                Release.is_prerelease.nullslast(),
                Release._pypi_ordering.desc()
              )
              .limit(1)
              .one()
)

If you do the "align with the . and not try to make perfect 4 space indent", you have to decide how you indent sub clauses (the arguments to order_by( ... ) in my example). My editor when I hit tab just aligns them with the next 4 space indent, so you end up with (in my example) 14 spaces prior to the .order_by(, then 18 spaces prior to the Release.is_prerelease.nullslast(),, which "corrects" the indentation so that additional nested function calls end up with the "correct" indentation.

The downside to this, is you end with two "partial" indents until you're back on always multiples of 4. I think this is OK because I think it's far more readable.

Another option is:

release = (
    request.db.query(Release)
              .filter(Release.project == project)
              .order_by(
                    Release.is_prerelease.nullslast(),
                    Release._pypi_ordering.desc()
              )
              .limit(1)
              .one()
)

Which puts 20 spaces before the Release.is_prerelease.nullslast(),, which means that there is a little extra space between it and the .order_by, but it still reads perfectly fine, and then the only lines that don't have an exact multiple of 4 for indents comes the lines that start with . (assuming you do the align like I suggested).

wbolster commented 6 years ago

indents >4 are avoided by black, and personally i mostly agree with that (wastes space and ugly), but in any case i don't see why fluent apis should be an exception.

ambv commented 6 years ago

Black never aligns with anything while indenting. It will indent four characters within any block or bracket pair and that's it. This always gives us maximum space available for the line content. Also, I feel that being consistent is more important than hand-optimizing particular edge cases.

The simplest implementation will be to simply treat the dot as a delimiter (like "and", "or", etc.). I don't particularly like hand-optimizing leading attribute accesses because attribute accesses might just as well happen later, eg.:

result = (
    request
    .db
    .query(Release)
    .filter(Release.project == project)
    .some_attribute  # note
    .limit(1)
    .one()
    .release_name  # note
)

A reverse situation might also be possible where the first line is a call, too:

result = (
    Request(with='', some='', data='')
    .ditto
    .ditto()
    .one()
)

I'm inclined to think that if somebody really hates wasting lines on the leading attribute accesses, assigning those to a name is the way to go.

Oh, one more thing. If the bracket contents fit in one line, they will (like everywhere else in Black):

result = (
    request.db.query(Release).limit(1).one().release_name
)

And to be clear, if the entire thing fits in one line, it will.

result = request.db.query(Release).limit(1).one().release_name

Summing up, I want Black's implementation of fluent interfaces to be as trivial and consistent as the rest of the formatting it applies.

dstufft commented 6 years ago

I was playing around with it, and I think something like:

release = (
    request.db
        .query(Release)
        .filter(Release.project == project)
        .order_by(
            Release.is_prerelease.nullslast(),
            Release._pypi_ordering.desc(),
        )
        .limit(1)
        .one()
)

You still end up with an extra level of indentation, but I think that ends up more readable, 🤷‍♂️

That being said, I find all of these examples more readable than what black does now so any of them are an improvement. I'm also perfectly fine with fitting things onto one line when possible (which is what I generally do already).

wbolster commented 6 years ago

@ambv the only "special" (quite common in practice) case would be leading attribute accesses. preferring a single line for this seems worth it. when mixing calls and attr access, one per line indeed, as you suggest.

kalekseev commented 6 years ago

I would also vote for one level indented version

release = (
    request
        .db
        .query(Release)
        .filter(Release.project == project)
        .order_by(
            Release.is_prerelease.nullslast(),
            Release._pypi_ordering.desc(),
        )
        .limit(1)
        .one()
)

rather than

release = (
    request
    .db
    .query(Release)
    .filter(Release.project == project)
    .order_by(
        Release.is_prerelease.nullslast(),
        Release._pypi_ordering.desc(),
    )
    .limit(1)
    .one()
)
wbolster commented 6 years ago

anything but a 4-space indent would be inconsistent with how black works elsewhere, and would be in direct conflict with black's consistency goals. @ambv was very clear about that in an earlier comment:

Black never aligns with anything while indenting. It will indent four characters within any block or bracket pair and that's it. This always gives us maximum space available for the line content. Also, I feel that being consistent is more important than hand-optimizing particular edge cases.

that said, i do advocate for a sole exception which is not related to indentation, but to line breaking. prefer single lines for attribute accesses at the start of the chain (see my my earlier comment).

black's (upcoming) support for ‘fluent apis’ seems only about line breaks (not about indentation), and i think it's reasonable to expect black to acknowledge that both ‘continuation lines are special’ and ‘leading attribute accesses are special’ since that is the predominant (only?) style for fluent apis, e.g. sqlalchemy + web frameworks.

the example would then become:

release = (
    request.db
    .query(Release)
    .filter(Release.project == project)
    .order_by(
        Release.is_prerelease.nullslast(),
        Release._pypi_ordering.desc(),
    )
    .limit(1)
    .one()
)
kalekseev commented 6 years ago

@wbolster consistency is good but I believe that we should consider user experience too, otherwise we can consistently produce code that majority of people don't like and never get into wide adoption in the community.

wbolster commented 6 years ago

black always uses 4 spaces, and that's not considered a ‘user experience’ problem for normal code, so what exactly is the ‘user experience’ problem with 4 space indents for fluent apis, as shown here?

kalekseev commented 6 years ago

@wbolster for example I like to see entry level separated from the body because it helps quickly scan the beginning of chain and skip whole body if I'm not interested in it. Example

Release.is_prerelease.nullslast()
Release._pypi_ordering.desc()
(
    request
    .query(Release)
    .filter(Release.project == project)
    .order_by(
        Release.is_prerelease.nullslast(),
        Release._pypi_ordering.desc(),
    )
    .limit(1)
    .one()
)
Release.is_prerelease.nullslast()
Release._pypi_ordering.desc()

vs

Release.is_prerelease.nullslast()
Release._pypi_ordering.desc()
(
    request
        .query(Release)
        .filter(Release.project == project)
        .order_by(
            Release.is_prerelease.nullslast(),
            Release._pypi_ordering.desc(),
        )
        .limit(1)
        .one()
)
Release.is_prerelease.nullslast()
Release._pypi_ordering.desc()

Although I agree that in python we have parents and it helps to parse the chain as one sentence for me second version looks more readable but maybe I'm polluted by javascript code style :)

wbolster commented 6 years ago

tbh, i don't think that is very typical example. a ‘fluent api call by itself’ is not the usual way of using fluent apis. typically the result will be assigned to a name, or returned directly:

rows = (
    db.query(...)
    .filter()
    .all()
)
kalekseev commented 6 years ago

Can't say about how typical is that but examples are orm bulk_create/update/delete queries their output is rarely needed.

carljm commented 6 years ago

I've been talking with @ambv about this, and the problem is that the proposed feature here:

if the programmer explicitly wrapped a chain of at least two calls in parentheses (for return or assignment), then Black will leave those parentheses and do fluent interfaces; regular formatting happens in all other situations.

is incompatible with #4, which is an important feature to get reasonable formatting in lots of other situations (e.g. see #98 for a non-import case). If Black considers organizational parentheses to be "formatting" which Black will freely add or remove as needed (a reasonable choice, since organizational parens don't appear in the AST), we can't use their presence or absence as a signal for how to format chained method calls. Stable and deterministic formatting without reference to existing formatting is a core tenet of Black. TBH the very idea of the programmer "opting in" to break-before-period formatting by adding wrapping parens is a bit contrary to Black's ethos to begin with: Black is the uncompromising and non-configurable formatter.

That means that we have two options:

1) Always prefer breaking before period (and wrapping in organizational parens) for any chained method calls. 2) Never break before period; stick with current formatting for chained method calls.

The argument against (1) is that it is inconsistent with how Black formats everything else. In every other case, Black will break after open parens. This also means that it will result in spurious reformatting noise when the first chained call is added. E.g. if I have

important_rows = Model.objects.filter(
    foo='some long string',
    bar='another',
)

and I add a chained call to it, suddenly I will get

important_rows = (
    Model
    .objects
    .filter(foo='some long string', bar='another')
    .select_related('baz')
)

Where the entire existing and unchanged portion of the expression has now been reformatted and generates a bunch of meaningless noise in the diff. Whereas the current formatting,

important_rows = Model.objects.filter(
    foo='some long string',
    bar='another',
).select_related('baz')

by virtue of remaining consistent with how Black formats non-chained method calls, minimizes the size of the diff and highlights only the thing that is actually new.

On the other hand, I will acknowledge that the advantage of (1) is that for long method chains, it does a better job of putting connected things on the same line.

TBH I'm not totally clear where it leaves us, but I do find the simplicity and consistency of option (2) (stick with existing behavior) pretty attractive.

Anyone want to propose a way out of this dilemma that we haven't considered yet?

dstufft commented 6 years ago

by virtue of remaining consistent with how Black formats non-chained method calls, minimizes the size of the diff and highlights only the thing that is actually new.

So I don't think you're ever going to get something that doesn't make some case worse in some situations. I think that optimizing for the smallest possible diffs above all else is counter-productive (and I don't think that black actually does that anyways!). Keeping it in mind is a good idea, but you typically only read a diff once, during code review (and maybe during blames and such, but they're not something you're going to do for every diff etc) while you and other people are going to read the final result over and over again.

So it feels like to me, the first priority should be in maximizing the readability of the code not in minimizing the size of the diff.

On the other hand, I will acknowledge that the advantage of (1) is that for long method chains, it does a better job of putting connected things on the same line.

One option here might be to focus on the long here, and make it for chains above X length, where X is greater than one? I don't know.

All that being said, I find:

important_rows = (
    Model
    .objects
    .filter(foo='some long string', bar='another')
    .select_related('baz')
)

Way easier to read what's going on than I do

important_rows = Model.objects.filter(
    foo='some long string',
    bar='another',
).select_related('baz')

I know that personally the second pattern there has caused me to miss the cause of bugs more than once, and I generally try to avoid it to where if if I'm not doing fluent style formatting for that bit, my natural inclination is to do:

important_rows = Model.objects.filter(
    foo='some long string',
    bar='another',
)
important_rows = important_rows.select_related("baz")

just to try to break up the method chaining so that it's easier to read.

Here is a real snippet of code (I pasted it above in the fluent style) formatted in the current black style:

    project_names = [
        r[0]
        for r in (
            request.db.query(Project.name).order_by(
                Project.zscore.desc().nullslast(), func.random()
            ).limit(
                5
            ).all()
        )
    ]
    release_a = aliased(
        Release,
        request.db.query(Release).distinct(Release.name).filter(
            Release.name.in_(project_names)
        ).order_by(
            Release.name,
            Release.is_prerelease.nullslast(),
            Release._pypi_ordering.desc(),
        ).subquery(),
    )
    trending_projects = (
        request.db.query(release_a).options(joinedload(release_a.project)).order_by(
            func.array_idx(project_names, release_a.name)
        ).all()
    )

    latest_releases = (
        request.db.query(Release).options(joinedload(Release.project)).order_by(
            Release.created.desc()
        ).limit(
            5
        ).all()
    )

    counts = dict(
        request.db.query(RowCount.table_name, RowCount.count).filter(
            RowCount.table_name.in_(
                [
                    Project.__tablename__,
                    Release.__tablename__,
                    File.__tablename__,
                    User.__tablename__,
                ]
            )
        ).all()
    )

I find the entire code black pretty much universally worse to read except for bits that didn't really change between the two, and as someone with a code base that utilizes a lot of long method chains (SQLAlchemy) the more I think about the current output of black, the less I find myself being willing to have my code formatted that way because I find the result significantly harder to read (and since there are a lot of them, it's not weird one offs that make something like a comment to disable formatting an OK option, since I'd have the comment littered all over the code).

That is perfectly OK! Black doesn't have to be for everyone (and likely won't, since it's zero config and opinionated!), but I offer that anecdote up as a data point for y'all to hopefully help make this decision either way.

dstufft commented 6 years ago

Also only semi related, but this bit is particularly hard to read for me:

project_names = [
    r[0]
    for r in (
        request.db.query(Project.name).order_by(
            # Literally the only difference between Project.zscore.desc().nullslast().func.random()
            # and Project.zscore.desc().nullslast(), func.random() is the difference between
            # a period and a comma and a single space. Incredibly easy to miss the
            # difference between these two and it's part of an area where there are a
            # whole bunch of method chaining happening already... maybe it should
            # always force a new-line here? I don't know.
            Project.zscore.desc().nullslast(), func.random()
        ).limit(
            5
        ).all()
    )
]
kalekseev commented 6 years ago

I think that optimizing for the smallest possible diffs above all else is counter-productive (and I don't think that black actually does that anyways!)

strongly agree with that, I fear that mindset "smallest diffs first" may hurt the success of the black, I would place "wide adoption in the community while stay opinionated" on the first place.

jarshwah commented 6 years ago

I was thinking along the same lines as Donald, that if an expression has > N method calls, then format in a fluent style if it can't fit on a single line.

The main issue I have with the status quo is divorcing function arguments from the function call. Typically when a function call begins, it's close to the beginning of the line, and the arguments are directly below it. The closing paren is de-dented and it's all visually coherent. With fluent interfaces, the function call can be a lot further to the right, and the closing paren then continues into the next call. It makes it a lot harder to tie the function and its arguments together.

graingert commented 6 years ago

I don't particularly like hand-optimizing leading attribute accesses because attribute accesses might just as well happen later

+1

wbolster commented 6 years ago

I don't particularly like hand-optimizing leading attribute accesses because attribute accesses might just as well happen later

i am actually wondering why?

leading attribute accesses are a very common pattern (e.g. sqlalchemy and web frameworks) that may benefit significantly from ‘special’ (but not magic) treatment, e.g flask+sqlalchemy uses db.session.query(...), the pypi examples above use request.db.query(...).

any later attribute accesses, like .column_name at the end (or anything in the middle) would not get the ‘special’ treatment. that sounds like a very explicit, deterministic, and pragmatic approach.

graingert commented 6 years ago

Special cases like leading attributes don't sound very blackonic

On Tue, 15 May 2018, 17:14 wouter bolsterlee, notifications@github.com wrote:

I don't particularly like hand-optimizing leading attribute accesses because attribute accesses might just as well happen later

i am actually wondering why?

leading attribute accesses are a very common pattern (e.g. sqlalchemy and web frameworks) that may benefit significantly from ‘special’ (but not magic) treatment, e.g flask+sqlalchemy uses db.session.query(...), the pypi examples above use request.db.query(...).

any later attribute accesses, like .column_name at the end (or anything in the middle) would not get the ‘special’ treatment. that sounds like a very explicit, deterministic, and pragmatic approach.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ambv/black/issues/67#issuecomment-389225442, or mute the thread https://github.com/notifications/unsubscribe-auth/AAZQTNDRYfB75n62qwYoLOHod8WviKFjks5tyv72gaJpZM4S4SXC .

wbolster commented 6 years ago

sorry, but that really sounds like dodging the question instead of answering it. "not blackonic" is like saying "not pythonic", but without reasoning that is a meaningless statement. i tried to argue for a sensible, explicit, and pragmatic (all "pythonic" qualities) treatment for fluent apis, which, by their very nature have some "special" shape. this is not a new thing, sqlalchemy is widely used for instance, and patterns and best practices have been established by now.

if black blindly refuses to acknowledge that on "ideological" grounds without any pragmatism, i think that is a big missed opportunity for the black formatter tool. i would love to see black format fluent apis in a predictable and noncontroversial way, with respect for established practices.

ambv commented 6 years ago

I'm getting pretty close to getting this in. It's implemented as my comment above specifies.

@wbolster, I am sympathetic to your desired formatting but it boils down to the fact that you are expecting Black to understand what the code it is formatting does. In one example you're suggesting:

release = (
    request.db
    .query(Release)
    .filter(Release.project == project)
    .order_by(
        Release.is_prerelease.nullslast(),
        Release._pypi_ordering.desc(),
    )
    .limit(1)
    .one()
)

whereas in another you're suggesting:

rows = (
    db.query(...)
    .filter()
    .all()
)

Black can't know whether it makes sense to a human to hug leading attributes (and maybe a call?) and when it doesn't. If this ends up being a common complaint in the future, we will revisit this again.

dstufft commented 6 years ago

For whatever it's worth, while I have preferences, I think either option is good enough that I'm happy enough with either.

wbolster commented 6 years ago

what @dstufft said. i suggested both because both are valid choices, and i do not have a strong preference myself.

anything except this monstrosity, basically: :wink:

x = (
    db
    .session
    .query(...)
    .all()
)
jarshwah commented 6 years ago

hah, I would be ok with that version too @wbolster :) but any of the 3 proposed formats I'd be fine with.

@ambv which one are you going with out of interest? I wasn't able to determine which from comments above.

dstufft commented 6 years ago

Yea to be clear, I'm fine with that too, I'd probably end up writing my code slightly different, but that's about all.

ambv commented 6 years ago

FYI when I actually went and implemented it, it turned out that singling out attributes always looks weird. So the new rule is: fluent interfaces only break calls and indexing. Example from the tests:

def example(session):
    result = (
        session.query(models.Customer.id)
        .filter(
            models.Customer.account_id == account_id,
            models.Customer.email == email_address,
        )
        .order_by(models.Customer.id.asc())
        .all()
    )

I think this is what everybody wanted. Note that the first call is hugged to the attribute but this is necessary for all those cases where you just have one call but multiple attributes along the way. Consider:

    traceback.TracebackException.from_exception(
        exc,
        limit=limit,
        lookup_lines=lookup_lines,
        capture_locals=capture_locals,
        # copy the set of _seen exceptions so that duplicates
        # shared between sub-exceptions are not omitted
        _seen=set(_seen),
    )

It would look very weird to break up the call from the attributes.

dstufft commented 6 years ago

That looks excellent :)

wbolster commented 6 years ago

thanks @ambv!

whardier commented 4 years ago

Be cool if black could respect parens when only one dot is present (not minimum of two). I might be missing something in the docs. I respect the non-compromising aspect of the project - I just feel like I'm missing something.

randolf-scholz commented 1 year ago

Imo, one needs to rethink the hugging of the first attribute. For example out of this

observations = (
    measurements_experiments  # original dataframe
    .join(measurements_bioreactor, on=["measurement_time"])  # add bioreactor data
    .join(setpoints["InducerConcentration"].dropna(), how="outer")  # add setpoints
)

black makes this ugly thing:

observations = measurements_experiments.join(  # original dataframe
    measurements_bioreactor, on=["measurement_time"]
).join(  # add bioreactor data
    setpoints["InducerConcentration"].dropna(), how="outer"
)  # add setpoints

I think it would make sense to special case:

  1. If there is a single .method(..) or .attr[...] call: hug it
  2. If there are more than one .method(..) or .attr[...] call: multi-line it

This way, the example from https://github.com/psf/black/issues/67#issuecomment-389681164 would become

def example(session, models):
    result = (
        session
        .query(models.Customer.id)
        .filter(
            models.Customer.account_id == account_id,
            models.Customer.email == email_address,
        )
        .order_by(models.Customer.id.asc())
        .all()
    )

Traceback would stay the same:

traceback.TracebackException.from_exception(
    exc,
    limit=limit,
    lookup_lines=lookup_lines,
    capture_locals=capture_locals,
    # copy the set of _seen exceptions so that duplicates
    # shared between sub-exceptions are not omitted
    _seen=set(_seen),
)
YodaEmbedding commented 1 year ago
traceback.TracebackException.from_exception(...)

The TracebackException does not have any () attached to it, so it's reasonable to treat it as a part of traceback..

This can be done by having different rules for attributes without a call [such as .TracebackException], and attributes with a call [such as .from_exception()].

eguiraud commented 1 year ago

Hello, I was surprised to find that this code:

obj = obj.filter("oooooooooooooooooooooooooooooooooooo").filter("oooooooooooooooooooooooooooooo")

is formatted to the awkward:

obj = obj.filter("oooooooooooooooooooooooooooooooooooo").filter(
    "oooooooooooooooooooooooooooooo"
)

while if I add a third chained call:

obj = obj.filter("oooooooooooooooooooooooooooooooooooo").filter("oooooooooooooooooooooooooooooo").filter("oooooooooooooooooooooooooooooooooo")

black makes it pretty:

obj = (
    obj.filter("oooooooooooooooooooooooooooooooooooo")
    .filter("oooooooooooooooooooooooooooooo")
    .filter("oooooooooooooooooooooooooooooooooo")
)

I could not find a way to convince black to reformat something like the first snippet to something like the last. What am I missing?

sfermigier commented 1 year ago

@eguiraud I guess it's the normal current behaviour.

I don't see any occurrence where the result you call "awkward" (because it mixes two diffent styles in the same statement) could be considered the correct answer, but I may missing something.

I would treat it as a feature request or a bug.

eguiraud commented 1 year ago

A previous comment seems to indicate that one needs at least two dots for black to treat the call chain as a fluent API, although the docs don't mention this.

With the example above I was kind of hoping to show that the current behavior when there is only one dot is not ideal. But ok, I should probably open a new issue at this point :)