scipy / scipy

SciPy library main repository
https://scipy.org
BSD 3-Clause "New" or "Revised" License
13.16k stars 5.21k forks source link

STY: Maths formatting #14354

Closed tupui closed 3 years ago

tupui commented 3 years ago

This issue is linked to #14330

To be able to use tools (like, but not limited to Black), we need to define how we, as the scientific community and not just SciPy, want mathematical equations to be rendered.

The goal of this issue is to document and establish a strict set of rules to write maths. The rules must be coherent, extensive and opinionated (one way to do something, unambiguous wording) so they can be integrated in a tool (that tool may be Black).

I think such a document is missing from the scientific community and my hope is that we can all agree on something :smiley:

To quickstart things here are some ideas:

Formatting Mathematical Expressions

To format mathematical expressions, the following rules must be followed. These rules respect and complement the PEP8 (relevant sections includes id20and id28)

# Correct:
i = i + 1
submitted += 1
x = x*2 - 1
hypot2 = x*x + y*y
c = (a+b) * (a-b)
dfdx = sign*(-2*x + 2*y + 2)
result = 2 * x**2 + 3 * x**(2/3)
y = 4*x**2 + 2*x + 1
c_i1j = (1./n**2.
         * np.prod(0.5*(2.+abs(z_ij[i1, :])
                        + abs(z_ij) - abs(z_ij[i1, :]-z_ij)), axis=1))
# Wrong:
i=i+1
submitted +=1
x = x * 2 - 1
hypot2 = x * x + y * y
c = (a + b) * (a - b)
dfdx = sign * (-2 * x + 2 * y + 2)
result = 2 * x ** 2 + 3 * x ** (2 / 3)
y = 4 * x ** 2 + 2 * x + 1
c_i1j = (1.
         / n ** 2.
         * np.prod(0.5 * (2. + abs(z_ij[i1, :])
                          + abs(z_ij) - abs(z_ij[i1, :] - z_ij)), axis=1))
rkern commented 3 years ago

I am -1 on any such attempt to enforce such strict, extensive, and opinionated rules. PEP8's recommendations are the right level for developer guidelines, IMO. I'm not sure that such algorithmically-complete rules exist that are simultaneously both terse enough to be implementable and also don't create unreadable horrors in specific circumstances.

Now, if you want to develop an auto-formatting algorithm that uses whitespace in mathematical expressions more readably than black, that's great! Develop it somewhere and see if people like it. I might even use it if it's opt-in, especially if I can use it through my editor over the current selection of lines, not the whole file.

adeak commented 3 years ago

@tupui I think the +/- part would benefit from "unary"/"binary" terminology.

And can you explain what you mean by

(ii) the operator is used in a group to mark higher priority.

(in the same place)?

And why in

         * np.prod(0.5*(2.+abs(z_ij[i1, :])

there is no whitespace around the binary plus?

And is using 1., 2. a conscious choice or just force of habit? If this style guide takes off, taking a stance on the likes of 1. and .1 might be necessary.

rkern commented 3 years ago

That's an example of a (possibly-useful) ambiguity that can be used to make things more readable insofar as they communicate some subtle high-level semantics. That kind of ambiguity would be unavailable to algorithmic auto-formatters.

If the goal is to define rules that can be used to build an algorithmic auto-formatter, I recommend just going and implementing the algorithms and using the implementation as the object of discussion. Human brains just aren't good at predicting what the algorithm is going to do in all of the important cases just from the human-readable rules. Making a quick implementation gives us something concrete to discuss, we can throw real examples at it in bulk and evaluate the results quickly, and the process of implementation will make manifest all of the ambiguities.

newville commented 3 years ago

As an outsider in scipy-dev, I resisted commenting on #14330. Here, however, you seem to be aiming to codify how the wider scientific community (probably restricted to "scientific python"?) writes math.

The goal of this issue is to document and establish a strict set of rules to write maths.

Why is a strict set of rules beyond "valid Python" necessary? The gaol appears to be aimed at reformatting working Python code that someone wrote, and quite likely someone else reviewed or has read, and likely someone else else modified. Scipy has a lot of contributors - lots of people have read the code. The clarity of the math cannot have been too bad or objectionable. If there isolated cases where it needs fixing, I'm pretty sure you do not need an "established strict set of rules" to clean up the code.

The rules must be coherent, extensive and opinionated (one way to do something, unambiguous wording)

The Zen of Python uses "should" and "obvious" when talking about "one way to do something". It does not mandate that there can be only one way to do something.

so they can be integrated in a tool (that tool may be Black).

Why would you want to do that? A key feature of Python is that the code is readable, and hard to make impenetrable to a reasonably knowledgable person. Never mind which "strict set of rules" is needed, why is any strict set of rules needed? Why is any code re-formatting tool needed?

When writing Python code with even a modest amount of care, you can be pretty sure that someone else (maybe yourself in 2 years) will be able to read and (at least sort of) understand what it tries to do. This notion that whitespace between operators or mixing of single and double quotes in a codebase will somehow cause cognitive dissonance or start formatting arguments is somewhat hard for me to even comprehend. Do such things actually happen, ever with Python? There were tabs/spaces arguments, are there "1 or 0 whitespaces around '+'" arguments? Are people confused by single quotes?

Is there evidence that code formatting is a problem? What fraction of scipy, numpy, scikit-xxx PRs have had significant discussions (let alone "controversies") about Python formatting? How many of those are not resolved by "let's be sensible and mostly follow PEP8 when we can"?

I must say that when I first heard of Black I thought it might have been an elaborate hoax. It appears to misread the intent of the namesake quote about automobiles: At the time, there was one choice of color, and the question was whether to expand that choice. The quote was expressing: "don't focus on styling, focus on features and performance". Instead, Black asserts that variation in the formatting of working, valid Python code is a problem that needs fixing, focusing attention on the styling of already highly readable and working code at the expense of features and performance. It creates a problem where none existed.

The intention of Black is that PRs to fix bugs or add features will be held and more work demanded of the contributor in order to meet styling rules. The feature might be accepted, but only if it fits the style. Discrepancies end not in argument but in acquiescence (or perhaps in disgust -- pay attention to the ones who walk away). The intention of Black is that acquiescence ends any debate (was there any?). It enforces uniformity without exception or nuance, expelling non-compliant contributions when necessary. Many of us in the sciences are trying to fully internalize notions of belonging, access, equity, diversity, and inclusion. Would formatting of code submitted by the visually impaired be disadvantaged by these rules? Would it make screen readers more accurate? How does Black improve the community? The approach taken by Black is deliberately and proudly polarizing, basically for the sake of being proud about being polarizing. Let's have a little less of that, please.

If one wanted to follow the engineering wisdom from the Ford quote, they would be careful about formatting new code, try to be consistent and readable, but certainly not fix what ain't broke, and focus on features and performance over styling. They would be sensible. They probably would not even engage in this conversation. My apologies for not being strong enough to hold my tongue.

To format mathematical expressions, the following rules must be followed. These rules respect and complement the PEP8 (relevant sections includes id20and id28)

PEP8 is a guide, not a mandate. It says

"If operators with different priorities are used, consider adding whitespace around the operators with the lowest priority(ies). Use your own judgment; however, never use more than one space, and always have the same amount of whitespace on both sides of a binary operator:"

Somehow this got turned into 4 mandatory rules (with one exception!) about how much whitespace there will be around all binary operators. I did no read that as "Use your own judgment as long as you agree with me".

I think my main objection to this comes down to that line that reads "#Wrong" there, just above all the working code. That is code that is "not-PEP8 compliant", it is not "Wrong". The calculated values will not change. Is anyone confused by this code? If you're in there working on or reviewing the code and want to make it a bit more PEP8-ish, sure go ahead. If it looks readable, it is readable. If you decide that whitespace around a '+' sign isn't needed in something like np.sin(array[2*i+1, :]), well, maybe that's OK sometimes.

Sorry for the length.

ilayn commented 3 years ago

Thanks @newville that's an opinionated but balanced take. I am probably the last person to defend Black, but I think you have taken its use and the problem it promises to solve a bit differently than intended. What black offers is a non-negotiable set of code standards. This becomes particularly effective when many coders have to touch the same codebase frequently. Some come from Java-like background who don't mind going off to the second screen horizontally and others coming from different backgrounds working on the same Python code.

The amount of time wasted on code reviews on that regard in terms of business hours is immense where one says I don't like pep8 line length the others say other things etc. Here the black use is pretty much justified since instead of bringing your developers to a common understanding, the team delegates the code structuring to Black and agrees to not discuss it. Then everything is Black'ened and whatever comes out of it hopefully makes sense. And quite often it does the sensible thing. Code reviews get saner (as much as it can, I mean we are talking about devs here). Now what we have accomplished is that we have adopted the standard of the core devs of Black and we are done.

However, as many people quickly found out in the past (including us, at my company, after using it about 4 months) is that this standard is not written by scientific or number crunching people. And its strict standard often does not go together with say numpy standard or pandas .function(args).function(args) chains. That's a typical complaint and I think it is justified. So I'm not a fan of Black in that regard since it makes arrays wonky and uglier (IMO).

The discussion here is whether we should delegate code formatting to Black and be done with it. However, its choices as mentioned above especially about ** and operator precedence is almost always wrong for human eyes. For example, in the correct formatting above, instead I would have written it as

c_i1j = (1./(n ** 2.) * np.prod(0.5 * (2. + abs(z_ij[i1, :])
         + abs(z_ij) - abs(z_ij[i1, :] - z_ij)), axis=1))

because it is obvious that it would be a long line hence at least try to make it obvious by breaking it at a sensible point and not bring in strange staircase formatting. And Black starts to fail more often than not in the unary ops in terms of readability. But in any case you can see what was and is now in scikit-learn https://github.com/scikit-learn/scikit-learn/pull/18948 conversion.

Some lines are clearly disgusting in the "after" state to see but I tend to like the the relaxed 88 character line length since 79 is a bit too much in terms horizontal space constraint in my opinion but we won't need black to have that kind of relaxation.

tupui commented 3 years ago

Thank you @newville for expressing your sentiment on this.

As @ilayn pointed out, the goal is to save everyone endless discussions about styling.

Sure the PEP8 was written as a guide and even starts with A Foolish Consistency is the Hobgoblin of Little Minds. Still, over the decades we have seen that this guide was used as an authoritative way to write Python. And we could argue that it served the Python community well in general.

Having a common way of writing things across projects has an under rated value. Here I am not advocating to change the face of maths in all Python scripts used in science. I am more asking to reflect on how to write maths in large libraries such as SciPy and NumPy. The difference is paramount. For the developers having to navigate across these different projects, I think there is a great value having common practices. It enables faster onboarding of new contributors, removes lots of churn. Of course, long time contributors might not agree as they have years of experience navigating around these and other projects.

Newcomers, students, and as you rightfully noted, people with disabilities, would greatly benefit from a common ground. Having a unified language help lower the barriers and tools can be written to help them. Imagine if Black (or anything else) was used by every single project, you could more easily design tools that could read and write code for visually impaired people. Plus, things like that can/should be linked to pre-commit hooks. So no matter what you do, when you code will appear in the PR it will have the expected style without you having to do anything.

Yes, it removes the developer own style, sensitivity. But I will argue that we should not be able to see its mark in such large open source project. As developers, we read code all day long, and having to do this contextual change is not free, it can also lead to misreads, bugs. We certainly do not want to have a different developer style for every single files. You can make the parallel with standards in industry or rules in our society. It's not because the big think that everyone is depending upon is very strict on some aspects, that you have to do the same for your own project and are not free anymore.

Lastly, I would also note that we currently have tons of hard rules which involve so much more thinking and manual actions. Things like input validation, proper way to test, documentation, CI, etc. Here we are mostly talking about spacing that a machine would do for us so we don't have to talk about it.

rkern commented 3 years ago

the goal is to save everyone endless discussions about styling.

AFAICT, we largely do not have endless discussions about styling in the actual code reviews. We only have endless discussions about styling when someone proposes to use black.

Newcomers, students, and as you rightfully noted, people with disabilities, would greatly benefit from a common ground.

Citation needed. I have seen no evidence that the level of formatting that black and company provide any measurable benefits in this regard.

I've laid down this marker before, and I think it satisfies all of the evidenced benefits that you want from black: my ideal auto-formatting tool is one that leaves style-conforming code alone and only fixes up code that deviates. Somehow the benefits of some kind of auto-formatter got conflated with requiring a canonicalizing auto-formatter. black is not the only possible solution.

At minimum, a tool like darker can be fruitfully used by contributors to apply auto-formatting just to their contribution. All of the benefits with respect to the easing of writing code apply just as much to darker as to black. I recommend that you implement your preferred math styling rules in a way that can be plugged into that, and we can evaluate the results concretely rather than spinning out more endless discussions about styling in the abstract.

rkern commented 3 years ago

my ideal auto-formatting tool is one that leaves style-conforming code alone and only fixes up code that deviates.

It looks like autopep8 might fit that bill.

rgommers commented 3 years ago

Let me jump in here, since apparently there's two things being mixed:

  1. do we want/need a code formatter like black?
  2. is it possible to code up with consistent guidelines for writing math?

This issue is not about 1, only about 2. No change to any SciPy way of working is proposed

What black does today for math is bad, really bad. Something like hypot2 = 2 * x + 3 * y ** 2 is code no numerical Python person would write by hand. PEP 8 also falls well short here, it for example is completely silent on the power operator. So the question is: is it possible to do better than black and PEP 8. I'm pretty the answer is yes, the question is just how much better. Once we have the answer, at least there's something to point tool authors to. Maybe black et al. can implement it, maybe not. If they did, it would be helpful.

rkern commented 3 years ago

This issue is linked to #14330

Yes, this is about 1. A change to the SciPy way of working has been proposed. This particular issue is a sub-discussion in response to a specific objection to that proposal.

I agree that there are useful attempts at improving auto-formatting for mathematical expressions. I don't think it's all that helpful to try to hash them out here if we're foreclosing the idea of changing the SciPy way of working. Just go implement it somewhere and ask for feedback on the mailing list.

ev-br commented 3 years ago

FWIW, I circulated a project idea among students locally. Will see if anything comes out of this. No definite ETA at the moment, but we'll definitely report back if anything worth comes out if this.

вт, 6 июл. 2021 г., 17:06 Robert Kern @.***>:

This issue is linked to #14330 https://github.com/scipy/scipy/pull/14330

Yes, this is about 1. A change to the SciPy way of working has been proposed. This particular issue is a sub-discussion in response to a specific objection to that proposal.

I agree that there are useful attempts at improving auto-formatting for mathematical expressions. I don't think it's all that helpful to try to hash them out here if we're foreclosing the idea of changing the SciPy way of working. Just go implement it somewhere and ask for feedback on the mailing list.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/scipy/scipy/issues/14354#issuecomment-874791531, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQI6SCBCVV3WWL3UHFFLXDTWMEXTANCNFSM473GMY5A .

tupui commented 3 years ago

This issue is linked to #14330

Yes, this is about 1. A change to the SciPy way of working has been proposed. This particular issue is a sub-discussion in response to a specific objection to that proposal.

Not it is not at 1, Ralf is correct. I am sorry if I mislead you with linking to this issue. But read what I wrote in the description and the further reply. I am only asking to write some guidelines for mathematical equations. It had some conditional, I just added a bit more.

I agree that there are useful attempts at improving auto-formatting for mathematical expressions. I don't think it's all that helpful to try to hash them out here if we're foreclosing the idea of changing the SciPy way of working. Just go implement it somewhere and ask for feedback on the mailing list.

Sorry but I do not have the expertise (and time. I already followed this path a few times after you suggested this and I just lost time here...) to implement all the ideas that I have. And in this case, there might be existing tools doing the job and they might just need some directions. This is the goal of this issue, to agree on how we should write maths. Then whatever we do with this document is extra. It can start as a general PEP8-ish on our contributing guide up to something used by auto formatting tools.

rkern commented 3 years ago

The description is very focused on defining rules for black-like tools, not only guidelines. If that is no longer the intent, you may wish to amend the wording (preserving the old version for reference, of course).

To be able to use tools (like, but not limited to Black), a strict set of rules to write maths. The rules must be coherent, extensive and opinionated (one way to do something, unambiguous wording) so they can be integrated in a tool (that tool may be Black).

Those are worthwhile goals because the state of those tools is pretty pathetic for math expressions. But to formulate rules that work within the constraints of algorithmic auto-formatters, you really need to work from code first. The problem that these auto-formatters face is much more constrained than just a human-readable style guide that we can add to our contributor docs. It seems like these ought to be synergistic goals, that making progress on one will gain you progress on the other whichever order you do them, but I think the similarities are deceptive; there are conflicts due to the different constraints on who/what is performing the style recommendations. So there are two tracks that you can go down: build an auto-formatter that produces output that you like, and writing human-level style guides.

If you want to make progress on the former, I think you have to start with code. There's no benefit to having long discussions here on scipy/scipy about it. Just go build it and ask for feedback from our community on the mailing list. Until you are proposing that scipy use that tool, it's not really on-topic here on the issue tracker.

If you want to make progress on the latter, that's definitely on-topic here, but I think you need to relax the "extensive and opinionated so they can be integrated in a tool" constraints.

newville commented 3 years ago

@tupui @rgommers

Let me jump in here, since apparently there's two things being mixed: 1 do we want/need a code formatter like black? 2 is it possible to code up with consistent guidelines for writing math? This issue is not about 1, only about 2. No change to any SciPy way of working is proposed

@rgommers Um, yes it is. And not only for SciPy but "we, as the scientific community and not just SciPy". The goal is very clearly stated as defining how math is rendered so that tools like Black may be used. It is not isolated to point 2.

This issue is linked to #14330

Yes, this is about 1. A change to the SciPy way of working has been proposed. This particular issue is a sub-discussion in response to a specific objection to that proposal.

Not it is not at 1, Ralf is correct. I am sorry if I mislead you with linking to this issue. But read what I wrote in the description and the further reply. I am only asking to write some guidelines for mathematical equations. It had some conditional, I just added a bit more.

Huh? The message sent to the mailing list on 1 July (https://mail.python.org/pipermail/scipy-dev/2021-July/024924.html) has a subject line of "[SciPy-Dev] Code formatting: black".

Issue #14330 is titled "MAINT/STY: use Black formatting" and opens with "I propose to apply Black on our code base."

This issue begins:

This issue is linked to #14330

To be able to use tools (like, but not limited to Black), we need to define how we, as the scientific community and not just SciPy, want mathematical equations to be rendered.

Now you both say that this is not about using Black to reformat Scipy or other scientific code and apologize if some of the links might mistakenly lead someone to conclude that?

I think that if you are concerned about consistency, there may be somewhere closer to home that may need more attention than code formatting.

rgommers commented 3 years ago

So clearly the issue description wasn't as focused as it should have been. @tupui discussed this with me, that's why I knew the goal was (2). I even pre-read what he wrote (but not thoroughly enough), so I'm partly to blame for it being unclear.

From initial discussion on gh-14330 it's clear that many people are -1 on using black because its math formatting is terrible. So that's on hold / rejected, unless math formatting can be fixed. There's no point continuing that discussion. Clearly using black is blocked. I'm happy to comment on the PR saying exactly that. We can also just close that PR.

By the way, I have no clear preference about any of this. I've only ever used black once, and it wasn't ideal. I'm happy to give it a chance though, if and only if all blocking concerns have been resolved.

rkern commented 3 years ago

Since this issue has obviously derailed, I suggest also closing this one and starting fresh. It's cleaner than trying to amend the initiating issue description and resuming an essentially new discussion mid-thread.

I still think it's questionable that the scipy/scipy issue tracker is the best place to have the amended discussion. Maybe the SPEC Discourse is a more appropriate venue?

rgommers commented 3 years ago

Since this issue has obviously derailed, I suggest also closing this one and starting fresh. It's cleaner than trying to amend the initiating issue description and resuming an essentially new discussion mid-thread.

Agreed, let's close it.

I still think it's questionable that the scipy/scipy issue tracker is the best place to have the amended discussion. Maybe the SPEC Discourse is a more appropriate venue?

That does sound like a good suggestion. We never had a place like that, but we do now - it'd be good to try and start using it. There's still little traffic on that Discourse, but we can point people to it on the mailing list.

tupui commented 3 years ago

Sounds like a good idea, agreed.

Thank you all for the discussion. In future I would hope we could have discussions with a productive outcome and less emotions.

ilayn commented 3 years ago

This would have been a good transfer to discussions had we had enabled it by the way