Closed bofm closed 6 years ago
That's true, but we do need to count e.g. requests individually. If we only use unique files, choices that requests has made will be discounted (given most of the "requests" files will be non-unique and ignored); and requests is likely to be among the top packages.
Create a dict where the key is the sha1 of the file content and the value is the file content. Download all packages first while building this dictionary. Later you can use black.lib2to3_parse() to get the CST of the content, subclass black.Visitor to pluck out all strings and check the opening few characters to classify the string.
It doesn't matter which package a file comes from. It's super unlikely for unrelated files to match hash-wise so you're not "discounting" requests, you're just counting it once. In fact, you'll count it more than once since some vendored libs will be outdated. But that's a way better approximation than your first attempt.
I think I still need to retain a project to file association, to determine a "quote style choice" at project level.
No, that would unreasonably bump the value of a "style choice" for projects with small codebases compared to projects with many files. We don't care what a "project" chose, we care what all of the unique source code in the top 1,000 projects is using.
Weighting by number of files would bump projects with many little files over projects that opted for a smaller number of larger files. That seems unreasonable. Perhaps by number of lines?
My original intent was to figure out what the "unspoken standard" choices of the topmost projects are, not what all of the unique source code is using - I'm not convinced of that measure.
The biggest issue for me is that, for the people who use single quotes only as identifiers, running Black on a codebase actually loses information. It's not AST information, but it's information in the sense that a code comment is information.
For the people who use single quotes only as identifiers, running Black on a codebase actually loses information.
@landtuna, while I am skeptical if such scheme can be rolled out consistently, I acknowledge this is a problem.
The strongest argument in this thread for providing an option to not change quotes for strings (except for docstrings, which should always be """ per PEP 257) is that many people (including Guido) use quote style as a "extra-terse comment" to convey meaning, with single quotes meaning "data" and double quotes meaning "human-readable string".
I checked through my code and found I do the same thing without ever consciously deciding on that. I suspect this is common. I think this is actually why standardizing on double quotes bugged me (and perhaps other people?) so much and I couldn't explain why.
@kbd, agreed, this is the most convincing argument for allowing Black to optionally skip normalizing quotes as I suggested a few days back. That would cover all people unhappy with double quote enforcement.
you mean something like that ?
['my_key'] = "my string"
instead of
["my_key"] = "my string"
@ambv I'm strongly voting for single quotes as approved code style. I've seen this at a lot of companies and discussed several times at different meetups. It is a widely spread practice to use single quotes for everything that is a code and double quotes for doc strings.
In fact, this is very useful in IDEs and editors where one could differentiate in-code strings and doc strings between each other via defining different coloring for them.
Honestly, I was a bit shocked after black converted all single quotes to double quotes resulting in much less readable code in an IDE.
@ambv What do you think about an enforced quote style of "strings containing no whitespace are single-quoted while strings with whitespace get double quotes"? That would be a consistent style (i.e. the same file would always have the same output) that may produce the smallest delta from how many people actually prefer to write Python.
I could definitely see adopting that enforced style, whereas I would personally never want to use forced double-quotes.
@kbd sounds like something that is the best from different worlds. This should work for the case with ' for code, "" for docs
and also ' for data, "" for human-readable values
most of the time. A nice trick as for me.
For another data point: My team is about to reformat all our code with black. We've always used single quotes. We've got some very experienced Python folks. Everyone is fine with it.
@harveyr yeah us too.
I’m fine w. whatever, including a toggle-option that we as a community can possibly discuss and revisit in due time. Perhaps a toggle option really is the suitable solution to not fracture whatever community this project is accumulating.
Still not sure what makes us so different from StandardJS that does insist on a unified quote style 🤷♂️ But I also don’t subscribe to the “public strings as double-quotes” strategy, it’s much too fragile for my tastes (I’d push public strings through a translation function before using semantically meaningless quotes)
What do you think about an enforced quote style of "strings containing no whitespace are single-quoted while strings with whitespace get double quotes"?
@kbd, I find this curious but too magical :)
But I also don’t subscribe to the “public strings as double-quotes” strategy, it’s much too fragile for my tastes (I’d push public strings through a translation function before using semantically meaningless quotes)
The discussion hasn't been about "public" strings, but "human-readable" strings (i.e. natural language vs code). It's not necessarily about user-facing strings. For example, the vast majority of all (non-docstring) quotes in my code are single quotes because most strings in code are data (dictionary keys, names of things, shell values, filenames, etc.), but things like log messages and exception strings get double quotes. As far as "semantically meaningless" goes, many people in this thread have already explained the meaning they assign to different quotes.
I find this curious but too magical :)
Not arguing with you (since that's not an argument ;) but what if that small heuristic corresponds to what most people actually do and expect when reading Python code? What if in practice that results in the least disruption on existing codebases? What if it allows a fully-automated/consistent style that can be used by people who would otherwise reject double quotes everywhere? Would it be worth consideration?
Edit: I'd imagine this formatting would even catch bugs. "Why did this dictionary key / filename get double quotes? Oops there's a space".
One more argument to make it more configurable than it is now, is that "git blame" becomes of less use for older commits, as the majority of the changes black does on our code is changing quote style from single to double
@ikatson, agreed. In general, I should add this to the README:
git hyper-blame
and/or git blame $BLACK_REV^ -- $FILE
to skip over the formatting commitThat is useful beyond the string quotes.
Resolution: 18.6b0 has --skip-string-normalization, or -S for short. This allows existing projects to adopt Black with less friction and works for all alternative string quotes policies.
Black still defaults to and recommends normalizing string quotes to double quotes everywhere as we believe this is a better default than single quotes, and is enforceable unlike the "single quotes for data, double quotes for human-readable strings" policy.
I hope this resolves to your satisfaction what's been the most controversial issue in Black's history.
Hi! Although Black now prefers doubles, can we have an option to keep single quotes? Forcing double quotes would make this great project unusable for many users who picked the rule of using single quotes.
Operating system: MacOS Python version: 3.6 Black version: 18.4a0 Does also happen on master: yes