Open cdce8p opened 3 years ago
What is HTML ?
@friendyou1
From a quick Google Search the summary from Wikipedia says the following:
The HyperText Markup Language, or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets and scripting languages such as JavaScript.
i.e. it's the language that defines webpages (to be precise their content and structure), this page you're reading is made up of HTML. HTML is often used with JS (for interactivity) and CSS (to style and design the page so it looks nice)
Finally, please avoid posting irrelevant comments, it's noisy on a busy issue tracker. Thank you!
If someone is interested, I've opened #2163 which would address it.
The linked PR had positive feedback from maintainers but was rejected because of the disruption it would cause. So should this issue be closed by extension, or is there still a discussion to be had about this suggestion?
Marc did ask if keeping user-inserted parentheses would be an option. In my opinion the style would indeed be an improvement. But is it inappropriate with respect to Black's philosophy or reasonable pragmatism?
I've created a followup PR #2237 that would keep user inserted optional parentheses
Based on my recent ride around the issue tracker, it is evident that this is a highly-requested feature. I'll summarise Łukasz' remarks when closing the initial PR:
Łukasz ended with saying he's also disappointed at the fact that Black is not internally consistent in this regard.
If I may, I'll try to push this closer to a resolution once more.
I agree that this is a suboptimal solution, since it is akin to the magic trailing comma and comes with all the similar problems.
I think this is not that important given Black's function. We ruthlessly splits lines, change quotations and add or remove other parentheses. Users "cede control over formatting" to Black, they shouldn't expect Black to improve their current desired (lack of) style.
This is the biggie. If we can agree on a good set of rules to actually do this, it could be tested on the primer projects to see exactly how much disruption it would cause. But getting too mathematical is not the point either.
There seems to be no question that this is the better style in the examples that were given:
if not is_node_in_type_annotation_context(node) and isinstance(
node.parent, astroid.Subscript
):
pass
# VS
if (
not is_node_in_type_annotation_context(node)
and isinstance(node.parent, astroid.Subscript)
):
pass
But it is not that obvious in more complex cases like in our own tests (example taken from the original PR test code modifications):
if prevp.parent and prevp.parent.type in {
syms.typedargslist,
syms.varargslist,
syms.parameters,
syms.arglist,
syms.argument,
}:
# VS (what cdce8p wrote)
if (
prevp.parent
and prevp.parent.type
in {
syms.typedargslist,
syms.varargslist,
syms.parameters,
syms.arglist,
syms.argument,
}
):
# VS (another possibility, which I kinda like)
if (
prevp.parent
and prevp.parent.type in {
syms.typedargslist,
syms.varargslist,
syms.parameters,
syms.arglist,
syms.argument,
}
):
But the problem is more general than if statements. I closed lots of similar issues above, but I feel like even #236 and #348 are closely related to this. So I think we should really come up with a careful set of rules, or better yet: examples of what parentheses should look like if we're going to continue pursuing this any further.
To that end, I'd like to ask from the maintainers and especially @ambv: has the ship sailed? Are you absolutely opposed to any changes of this sort? If yes, I don't think we should waste any more resources discussing it. But if not, I'd like to thoroughly investigate the possibilities and come up with something that could be worth doing.
Matching the way people write code
This is less about how people write code and more about how people read it. The need for a tool like Black tells use that the way people write code is actually inconsistent and messy. Users choose to cede control over formatting to Black because the readability of the resulting code is, in 99% of cases, as good or better than it was before. It seems evident that this problem case makes up a disproportionate part of that 1% of outputs which have subpar readability.
Is the proposed style better?
In my opinion, the resulting code in the given counter examples is not worse than it was before. If it turned out that this was representative of the worst case scenario, I don't think asking whether the proposed rules make improvements in all cases is really what we should focus on. Rather, we should evaluate whether the improvements which we already know about outweigh the costs of the disruption these changes would cause.
I think it's also worth noting that much of the discussion around this change happened before the new stability policy. The --future
flag could be a great opportunity to introduce this change and collect feedback without any real disruption. If it turns out from the feedback that this does more harm than good, we can just revert it before the next annual release.
I just add my voice in favor of this issue with the following example :
mnopqrstuvwxyza = (
set(stuvwxyz_abcdefghi[pqrstuvwxy].keys())
- set(stuvwxyz_abcdefghij[pqrstuvwxy].keys())
)
black currently suggests this modification that is sub-optimal for readability :
- mnopqrstuvwxyza = (
- set(stuvwxyz_abcdefghi[pqrstuvwxy].keys())
- - set(stuvwxyz_abcdefghij[pqrstuvwxy].keys())
+ mnopqrstuvwxyza = set(stuvwxyz_abcdefghi[pqrstuvwxy].keys()) - set(
+ stuvwxyz_abcdefghij[pqrstuvwxy].keys()
)
I did not check if the PR handles this case also, but I hope it is the case :)
Though I post a small update. Unfortunately, I don't have time to work on it anytime soon. If someone else is interested in picking this up, please feel free to do so!
Another case from #587: I'd expect binary operators to be formatted similarly to boolean operators.
A specific example, quoting from https://github.com/psf/black/issues/587#issuecomment-475581498:
I have a code snippet as follows:
looks_like_pep = ( file_path.startswith("pep-") and file_path.endswith((".txt", "rst") )
With line length 79,
black
re-formats this code to:looks_like_pep = file_path.startswith("pep-") and file_path.endswith( (".txt", "rst") )
This is intentional, there is no automated rule we can do that will behave better. If we always put organizing parentheses whenever there is only a single operator between two expressions, this would lead to things like
a = (
bbbbbb(1234567890)
+ cccccc(123456790)
)
instead of
a = bbbbbb(123456790) + cccccc(
123456790
)
This formatting also takes up fewer lines.
The rule now uses organizing parentheses when there's more than one operator involved. It's a very simple rule that is easy to explain. If we made deviations here, not only would we invite other edge cases, but we would also have trouble explaining what Black did in those suboptimal cases.
Therefore, I'm somewhat skeptical we can improve much here without making everything much more complicated.
Fair points by Łukasz! (For the time being I'm just organising the issues. Still discussions to be had.)
A somewhat tricky case from #1094 though: when having both binary and boolean operators, the order of operations should determine which is used to split first:
if str(
self.user.id
) != self.target_user_id and not CheckUserManagementViewPermission.user_has_permission(
self.user
):
pass
# becomes
if (
str(self.user.id) != self.target_user_id
and not CheckUserManagementViewPermission.user_has_permission(
self.user
)
):
pass
Personally, I think your newer example format is cleaner.
a = (
bbbbbb(1234567890)
+ cccccc(123456790)
)
means you only have complete pieces of logic on each line rather than breaking inconsistently part way through functions.
It's easier to read complete sentences when they're together, and it creates more legible diffs:
--- a/example.py
+++ b/example.py
@@ -1,4 +1,4 @@
a = (
bbbbbb(1234567890)
- + cccccc(123456790)
+ - dddddd(123456790)
)
rather than
--- a/example.py
+++ b/example.py
@@ -1,3 +1,3 @@
-a = bbbbbb(123456790) + cccccc(
+a = bbbbbb(123456790) - dddddd(
123456790
)
@felix-hilden it's confusing to cram both issues here. Let's focus here on the single-operator parentheses.
@xM8WVqaG Black will prefer fewer vertical lines over more, if it can be helped. Unfortunately, changing this will affect a lot of code which now formats okay and will format worse. It's a relatively easy change for you to make to your Black installation:
pip install --no-binary=:all: black
.python3 -c 'import black.lines; print(black.lines.__file__)'
. if bt.delimiter_count_with_priority(max_priority) > 1:
to say > 0
instead. Test with as much code as you can lay your hands on. You'll see some regressions in behavior.
One more thing to note here that you won't be aware of unless you actually contributed to Black. Black formats everything (with one exception that is irrelevant here) in incoming token order, which usually is "left to right". This produces regular and predictable formatting because it doesn't "look forward" to try to reformat things and then decide whether they are nice or not.
In turn, it won't know the difference between, say:
a = (
bbbbbb(123456790)
+ cccccc(
1234567890
+ 1234567890
)
)
instead of the much more reasonable
a = bbbbbb(123456790) + cccccc(
1234567890 + 1234567890
)
It's difficult since it would effect almost every project that uses black today. However, just from seeing how many related issues were created, it looks like many users would like an improvement (me included).
Just from the initial testing I've done with #2237, it might be worth limiting any changes to LOGIC_PRIORITY
or higher for now. There are still some corner cases with that, but much fewer than when including all operators.
While many people do report this, it's hard to know where the majority opinion lies just by looking at the number of issues. The people who are satisfied with the current behavior aren't posting issues, and thus, are not participating in this conversation. I'd personally appreciate some change here, but it's important to remember that there are a lot of black
users whose viewpoints are not being represented.
The "hierarchical first" line-split seems always better to me. If not chosen as the sensible default, it would be nice to have it as a configuration option.
I think whatever is decided here, we won't have a configuration option. We don't want to give any unnecessary control over formatting.
@ambv I'm sorry, but for me the 'counter-examples' that you show look better/are easier to read than what black is suggesting. Maybe there are different ways how different people read code. But especially in your two examples:
a = bbbbbb(123456790) + cccccc(
123456790
)
a = bbbbbb(123456790) + cccccc(
1234567890 + 1234567890
)
the closing parentheses are visually completely dis-aligned to the code which they enclose, because they are vertically and horizontally separated from their opening parenthesis.
It is not only that, if you need line breaks, one would prefer to have both objects of an operation horizontally aligned and having complete pieces of logic on one line, but also about the quick identification of matching parentheses and their enclosed content.
@felix-hilden here is a possible take on the example from tests
if prevp.parent and prevp.parent.type in {
syms.typedargslist,
syms.varargslist,
syms.parameters,
syms.arglist,
syms.argument,
}:
# start from scratch
if prevp.parent and prevp.parent.type in {syms.typedargslist, syms.varargslist, syms.parameters, syms.arglist, syms.argument,}:
# add parentheses
if (
prevp.parent
and prevp.parent.type in {syms.typedargslist, syms.varargslist, syms.parameters, syms.arglist, syms.argument,}
):
# still long, add more
if (
prevp.parent
and (
prevp.parent.type
in {syms.typedargslist, syms.varargslist, syms.parameters, syms.arglist, syms.argument,}
)
):
# still long
if (
prevp.parent
and (
prevp.parent.type
in {
syms.typedargslist,
syms.varargslist,
syms.parameters,
syms.arglist,
syms.argument,
}
)
):
# try to inline in reverse
if (
prevp.parent
and (
prevp.parent.type in {
syms.typedargslist,
syms.varargslist,
syms.parameters,
syms.arglist,
syms.argument,
}
)
):
@felix-hilden this is kinda similar to your last option. Frankly, it looks like it could be inlined once more and it will be exactly it. Anyway, putting and
and in
on the same level here isn't right IMO.
@lig I think you could continue to inline twice more back to the original desired style from the test. I do not think it's unreadable.
A recursive algo which inlines leaves as it returns from the call stack as long as the max line length is not violated is a good way to go about it. The point is to break the outer-most element if the line is too long. It's what my intuition is, and what I would write naturally if I cared. In reality I just let everything go to the right and let black clean it up 😁. I have even written a pretty printer that used that exact algo because pprint looks awful.
I would suggest (re)considering the use of a configuration option to balance the interests here. The magic trailing comma is a similar case - it's a syntactically benign way to signal formatting intent.
Some of the formatting examples above are so pathological (as are the cases in our code base - I omit them as the topic is well covered above) that I think it should be a priority to address them in one way or another.
In the case of boolean expressions, adding a trailing and True
is similar to the trailing comma - and cleans up the formatting but at the obvious expense.
I don't follow the argument that fewer lines of code trump readability.
Disruption to existing code is obviously a big deal. But if the implementation is to use grouping parenthesis as a cue for the new formatting rules, these would have to be explicitly added and existing code should not be impacted. Again, similar to the trailing comma. And if there's a config option - also similar to trailing comma - there's no impact at all or users could opt out (depending on the default behavior).
I would suggest (re)considering the use of a configuration option to balance the interests here.
We avoid that here.
I don't follow the argument that fewer lines of code trump readability.
They usually do because they allow more content to fit one screen or page. Black additionally prefers to format things in a way where lines read from left to right instead of pre-emptively exploding into one-element-per-line.
The unclear examples in this open issue are obviously controversial, but a ton of others aren't, and nobody lists them here because they just look good. The current rule requiring "more than one delimiter" to use optional parentheses was added because without it, the output was clearly overusing parentheses leading to explosion of newlines in unexpected places.
The problem with a tool that applies the same rules every time is that in some cases a given rule leads to output that is less obvious. It seems here most pushback stems from using square brackets or parentheses from a call to split a line.
In the case of organizing optional parentheses, there is no easy-to-express rule to make it work for everybody. Most of all, it's to a large extent subjective. What to some is "so pathological that it should be a priority", is not a big deal (or is indeed preferred) by others.
Using existing parentheses as a marker that they should be kept makes the tool less automatic. It will then happily add parentheses in some cases but when you modify the code, it won't take them out. That's already a source of confusion for users when it comes to trailing commas, and adding parentheses to the mix would make this problem worse.
Originally Black didn't automatically put or remove parentheses, it would just respect what it found already in the file. This was widely criticized, I received many requests to change it. This issue argues, in essence, to roll this change back, which would be an unpopular decision.
I would suggest (re)considering the use of a configuration option to balance the interests here.
We avoid that here.
Maybe these couple of cases are those good candidates for having options for them? AFAIR, skip-string-normalization
was kinda similarly controversial and it took Guido's opinion to make this into an option. These two are obviously controversial, have different logic already, and confuse both sides.
Just an extra little bump here. I stumbled on this issue because the logic trees in my application are getting worse and worse due to this formatting choice. If I had the option to cause black
to format these more nicely, I'd do it in a heartbeat.
if self._rule_has_same_enviroment_tag(
rule=rule, live_rule=live_rule
) and self._rule_has_matching_source(rule=rule, live_rule=live_rule):
return live_rule
# vs
if (
self._rule_has_same_enviroment_tag(rule=rule, live_rule=live_rule)
and self._rule_has_matching_source(rule=rule, live_rule=live_rule)
):
return live_rule
The first one is almost unreadable to me.
Very much needed.
Current behavior is sometimes so unreadable that I have to wrap code in #fmt: on/off
.
@JelleZijlstra invited a PR to address #3800: comments that get egregiously moved. The fix could potentially allow commenting to preserve optional parentheses. Not saying that's good or bad, just relevant to this discussion.
Putative "magic comment" in action:
a = (
bbbbbb(123456790) #
+ cccccc(123456790)
)
I'd like a real solution but at minimum I think this is a problem with the docs.
Similar to https://github.com/psf/black/issues/2156#issuecomment-1505408040 adding an extra or None
at least makes it readable.
_cfg_dir = _os.environ.get("RYO_USER_CONFIG_DIR") or _appdirs.user_config_dir(
"ryo-iso", "HxR"
)
_cfg_dir = (
_os.environ.get("RYO_USER_CONFIG_DIR")
or _appdirs.user_config_dir("ryo-iso", "HxR")
or None
)
Aside from --line-length 99
, I'm still not sure how to improve this sort of thing.
(self.base_image["distro"], self.base_image["version"]) = self.iso_cfg[
"image"
].split("/")
That's interesting that adding the third operand actually make thye formatting nice, but with two it's unreadable
return (
self.context.control_flow.is_conditional_jump(temp)
or self.context.is_used_temp(temp)
or None
)
return self.context.control_flow.is_conditional_jump(
temp
) or self.context.is_used_temp(temp)
Can the behavior be changed to respect two operands as well? It doesn't seem like it will actually negatively affect anything.
edit: eh, I guess it is fairly similar to some of the other examples in the end. i'll just use fmt off. :/
This is related to https://github.com/psf/black/issues/4123. However, I'm not sure if such a fundamental change is an option at this point.
Describe the style change
For short if statements (just one
and
/or
) black tries to get rid of optional parentheses whenever possible. One common result of that behavior is that the line is instead spilt on a function call which often reduces the readability compared to optional parentheses. Even if the optional parentheses are inserted manually, black will reformat the code.Examples in the current Black style with desired style
Some examples taken from the pylint code base.
Additional context For any statement with more at least two priority delimiters the above suggested syntax is already the default.
One possible implementation could be the modify these lines here and use
> 0
instead of> 1
. https://github.com/psf/black/blob/76e1602d60391cca1bb3b21a9ef6ed07a1bca17d/src/black/__init__.py#L6751-L6754