psf / black

The uncompromising Python code formatter
https://black.readthedocs.io/en/stable/
MIT License
38.91k stars 2.45k forks source link

Single quotes option #118

Closed bofm closed 6 years ago

bofm commented 6 years ago

Hi! Although Black now prefers doubles, can we have an option to keep single quotes? Forcing double quotes would make this great project unusable for many users who picked the rule of using single quotes.

Operating system: MacOS Python version: 3.6 Black version: 18.4a0 Does also happen on master: yes

kadrach commented 6 years ago

That's true, but we do need to count e.g. requests individually. If we only use unique files, choices that requests has made will be discounted (given most of the "requests" files will be non-unique and ignored); and requests is likely to be among the top packages.

ambv commented 6 years ago

Create a dict where the key is the sha1 of the file content and the value is the file content. Download all packages first while building this dictionary. Later you can use black.lib2to3_parse() to get the CST of the content, subclass black.Visitor to pluck out all strings and check the opening few characters to classify the string.

It doesn't matter which package a file comes from. It's super unlikely for unrelated files to match hash-wise so you're not "discounting" requests, you're just counting it once. In fact, you'll count it more than once since some vendored libs will be outdated. But that's a way better approximation than your first attempt.

kadrach commented 6 years ago

I think I still need to retain a project to file association, to determine a "quote style choice" at project level.

ambv commented 6 years ago

No, that would unreasonably bump the value of a "style choice" for projects with small codebases compared to projects with many files. We don't care what a "project" chose, we care what all of the unique source code in the top 1,000 projects is using.

kadrach commented 6 years ago

Weighting by number of files would bump projects with many little files over projects that opted for a smaller number of larger files. That seems unreasonable. Perhaps by number of lines?

My original intent was to figure out what the "unspoken standard" choices of the topmost projects are, not what all of the unique source code is using - I'm not convinced of that measure.

landtuna commented 6 years ago

The biggest issue for me is that, for the people who use single quotes only as identifiers, running Black on a codebase actually loses information. It's not AST information, but it's information in the sense that a code comment is information.

ambv commented 6 years ago

For the people who use single quotes only as identifiers, running Black on a codebase actually loses information.

@landtuna, while I am skeptical if such scheme can be rolled out consistently, I acknowledge this is a problem.

kbd commented 6 years ago

The strongest argument in this thread for providing an option to not change quotes for strings (except for docstrings, which should always be """ per PEP 257) is that many people (including Guido) use quote style as a "extra-terse comment" to convey meaning, with single quotes meaning "data" and double quotes meaning "human-readable string".

I checked through my code and found I do the same thing without ever consciously deciding on that. I suspect this is common. I think this is actually why standardizing on double quotes bugged me (and perhaps other people?) so much and I couldn't explain why.

ambv commented 6 years ago

@kbd, agreed, this is the most convincing argument for allowing Black to optionally skip normalizing quotes as I suggested a few days back. That would cover all people unhappy with double quote enforcement.

jgirardet commented 6 years ago

you mean something like that ?

['my_key'] = "my string"

instead of

["my_key"] = "my string"
lig commented 6 years ago

@ambv I'm strongly voting for single quotes as approved code style. I've seen this at a lot of companies and discussed several times at different meetups. It is a widely spread practice to use single quotes for everything that is a code and double quotes for doc strings.

In fact, this is very useful in IDEs and editors where one could differentiate in-code strings and doc strings between each other via defining different coloring for them.

Honestly, I was a bit shocked after black converted all single quotes to double quotes resulting in much less readable code in an IDE.

kbd commented 6 years ago

@ambv What do you think about an enforced quote style of "strings containing no whitespace are single-quoted while strings with whitespace get double quotes"? That would be a consistent style (i.e. the same file would always have the same output) that may produce the smallest delta from how many people actually prefer to write Python.

I could definitely see adopting that enforced style, whereas I would personally never want to use forced double-quotes.

lig commented 6 years ago

@kbd sounds like something that is the best from different worlds. This should work for the case with ' for code, "" for docs and also ' for data, "" for human-readable values most of the time. A nice trick as for me.

harveyr commented 6 years ago

For another data point: My team is about to reformat all our code with black. We've always used single quotes. We've got some very experienced Python folks. Everyone is fine with it.

gaggle commented 6 years ago

@harveyr yeah us too.

I’m fine w. whatever, including a toggle-option that we as a community can possibly discuss and revisit in due time. Perhaps a toggle option really is the suitable solution to not fracture whatever community this project is accumulating.

Still not sure what makes us so different from StandardJS that does insist on a unified quote style 🤷‍♂️ But I also don’t subscribe to the “public strings as double-quotes” strategy, it’s much too fragile for my tastes (I’d push public strings through a translation function before using semantically meaningless quotes)

ambv commented 6 years ago

What do you think about an enforced quote style of "strings containing no whitespace are single-quoted while strings with whitespace get double quotes"?

@kbd, I find this curious but too magical :)

kbd commented 6 years ago

But I also don’t subscribe to the “public strings as double-quotes” strategy, it’s much too fragile for my tastes (I’d push public strings through a translation function before using semantically meaningless quotes)

The discussion hasn't been about "public" strings, but "human-readable" strings (i.e. natural language vs code). It's not necessarily about user-facing strings. For example, the vast majority of all (non-docstring) quotes in my code are single quotes because most strings in code are data (dictionary keys, names of things, shell values, filenames, etc.), but things like log messages and exception strings get double quotes. As far as "semantically meaningless" goes, many people in this thread have already explained the meaning they assign to different quotes.

I find this curious but too magical :)

Not arguing with you (since that's not an argument ;) but what if that small heuristic corresponds to what most people actually do and expect when reading Python code? What if in practice that results in the least disruption on existing codebases? What if it allows a fully-automated/consistent style that can be used by people who would otherwise reject double quotes everywhere? Would it be worth consideration?

Edit: I'd imagine this formatting would even catch bugs. "Why did this dictionary key / filename get double quotes? Oops there's a space".

ikatson commented 6 years ago

One more argument to make it more configurable than it is now, is that "git blame" becomes of less use for older commits, as the majority of the changes black does on our code is changing quote style from single to double

ambv commented 6 years ago

@ikatson, agreed. In general, I should add this to the README:

That is useful beyond the string quotes.

ambv commented 6 years ago

Resolution: 18.6b0 has --skip-string-normalization, or -S for short. This allows existing projects to adopt Black with less friction and works for all alternative string quotes policies.

Black still defaults to and recommends normalizing string quotes to double quotes everywhere as we believe this is a better default than single quotes, and is enforceable unlike the "single quotes for data, double quotes for human-readable strings" policy.

I hope this resolves to your satisfaction what's been the most controversial issue in Black's history.