psf / black

The uncompromising Python code formatter
https://black.readthedocs.io/en/stable/
MIT License

Formatting Tabular Data with black - Special Formatting Rules for Code Blocks #3341

Open randolf-scholz opened 1 year ago

randolf-scholz commented 1 year ago

Is your feature request related to a problem? Please describe.

Sometimes I find myself needing small tables of data for configuration, for example a table containing units and upper and lower bounds for some pandas.DataFrame.

An example:

metadata = {  # formatted with black
    "Glucose": ["g/L", 0, 20],
    "DOT": ["%", 0, 100],
    "Volume": ["mL", 0, None],
}

I am deeply convinced that for this kind of manual data entry in source code, a fixed-width format is far more readable in the vast majority of cases, especially once the tables get a bit larger. (There are exceptions, of course, such as when the size of entries differs a lot from row to row.)

Describe the solution you'd like

I think it would be nice if black allowed turning specific formatting rules on and off for code blocks, so that vertical alignment is ensured inside (potentially nested) literal lists/tuples/dictionaries. This form of formatting is used by literally all libraries that work with tables (pandas, numpy, etc.).

# fmt: on[table]
metadata = {
    "Glucose": ["g/L", 0, 20  ],
    "DOT":     ["%",   0, 100 ],
    "Volume":  ["mL",  0, None],
}
# fmt: off[table]

Describe alternatives you've considered

Currently, I am using # fmt: off and # fmt: on and manually format the tables, which is tedious.
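For reference, the workaround looks roughly like this. `# fmt: off` and `# fmt: on` are existing black directives; everything between them is left untouched, so the alignment has to be maintained by hand:

```python
# fmt: off
metadata = {
    "Glucose": ["g/L", 0, 20  ],
    "DOT":     ["%",   0, 100 ],
    "Volume":  ["mL",  0, None],
}
# fmt: on
```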

felix-hilden commented 1 year ago

Hi! I get your concern, but having multiple formatting styles goes heavily against our ethos of non-configurability and consistency. So this is unlikely to be accepted.

randolf-scholz commented 1 year ago

@felix-hilden I would argue that this is the consistent choice. To me, it seems it's rather black that is the odd one out with how it formats tabular data. If non-configurability is the main concern, one could of course think about some auto-detection of tabular data (for example: literal list of lists of equal length) and only apply the special formatting rules in this case.

Non-configurability is a merit of black, but the question is how useful it really is when the default formatting lacks readability, as is the case for tabular data, and forces people to resort to fmt: off.

A quick GitHub search yields tens of thousands of results for fmt: off, and one of the very common reasons for using it is indeed tabular data. Some examples:

randolf-scholz commented 1 year ago

To some degree, this issue is caused by the lack of a built-in literal for tabular data in Python. Since Python is nowadays used so much in the data science space, having some built-in formatting for tabular data seems like a good idea.

Looking at possible solutions, we have:

  1. Do nothing / resort to fmt: off
  2. Some autodetection for tabular data
    • would have to be very good / have very stringent rules for when it is applied, and might even require an option to disable it in some cases
    • Example rule: a literal list[list] whose inner lists all have the same length.
    • Adding an autodetection would disrupt existing code bases, and would have to be phased in over a long time
  3. Global configuration flag (goes against design philosophy)
  4. Local configuration via some sort of fmt: table flag (also kind of goes against design philosophy, but we already have some sort of local configurability via fmt: off)
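To make the example rule from option 2 concrete, here is a minimal sketch of such a check using the standard `ast` module. The function name `is_tabular_literal` and the "at least two rows" threshold are my own assumptions for illustration; nothing like this exists in black, and a real rule would also need to consider dicts, nesting, and line length.

```python
import ast


def is_tabular_literal(source: str) -> bool:
    """Hypothetical rule: a list/tuple literal whose elements are all
    list/tuple literals of one constant, non-zero length."""
    try:
        node = ast.parse(source, mode="eval").body
    except SyntaxError:
        return False
    if not isinstance(node, (ast.List, ast.Tuple)):
        return False
    rows = node.elts
    # Assumption: a single row is not worth treating as a table.
    if len(rows) < 2:
        return False
    if not all(isinstance(row, (ast.List, ast.Tuple)) for row in rows):
        return False
    widths = {len(row.elts) for row in rows}
    return len(widths) == 1 and widths != {0}


print(is_tabular_literal('[["g/L", 0, 20], ["%", 0, 100], ["mL", 0, None]]'))
print(is_tabular_literal('[["g/L", 0], ["%", 0, 100]]'))
```

The first call satisfies the rule and the second does not, because its rows have different lengths.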

felix-hilden commented 1 year ago

Thank you for being so thorough 🙏 I think this will explode in complexity if we want to do it well. Just to format your example we would have to:

So, with regard to your options:

  1. This would be my choice unfortunately
  2. Feels very hard and not worth it over consistent formatting everywhere. And an option to turn it off would feel odd. If we're doing it, then we should make it behave so well that there's no need to turn it off.
  3. You nailed it, it's a "no".
  4. Also a likely "no". It's quite a leap from "let's do nothing here" to "let's change our style here specifically". Feel free to introduce a formatter package that looks for # fmt: off/on, inverts them and restructures the tables inside though 😜
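As a rough idea of what the core of such an external package might do, here is a sketch that renders a flat dict of equal-length lists with aligned columns. The names `literal` and `format_table` are hypothetical, the input is assumed to be non-empty, and the sketch does not parse source files or `# fmt:` comments at all; it only shows the column-padding step.

```python
import json


def literal(value: object) -> str:
    """Source text for one cell; strings are written with double quotes."""
    return json.dumps(value) if isinstance(value, str) else repr(value)


def format_table(name: str, table: dict) -> str:
    """Render a non-empty dict of equal-length lists as aligned source text."""
    keys = [literal(key) + ":" for key in table]
    rows = [[literal(cell) for cell in row] for row in table.values()]
    n_cols = len(rows[0])
    if any(len(row) != n_cols for row in rows):
        raise ValueError("all rows must have the same length")
    key_width = max(len(key) for key in keys)
    col_widths = [max(len(row[i]) for row in rows) for i in range(n_cols)]

    lines = [f"{name} = {{"]
    for key, row in zip(keys, rows):
        cells = [
            # Commas stay attached to their value and padding follows them;
            # the last cell has no comma and is padded up to the bracket.
            (cell + ",").ljust(width + 1) if i < n_cols - 1 else cell.ljust(width)
            for i, (cell, width) in enumerate(zip(row, col_widths))
        ]
        lines.append(f"    {key.ljust(key_width)} [{' '.join(cells)}],")
    lines.append("}")
    return "\n".join(lines)


print(format_table("metadata", {
    "Glucose": ["g/L", 0, 20],
    "DOT": ["%", 0, 100],
    "Volume": ["mL", 0, None],
}))
```

For the example from the original post, this should reproduce the aligned layout requested there. Anything nested, multi-line, or containing comments is out of scope for the sketch.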

The only cases I would be theoretically willing to entertain are matrices of ints or floats. So exactly NxM numerical arrays. I'm not the most experienced here with our parsing logic, but this seems like a significant challenge there as well. But perhaps other maintainers disagree one way or the other.

Also, if you're finding Python awkward for tabular data you could store any nontrivial amount of data in sheets, tabular text files, CSVs etc.
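A minimal sketch of that approach with the standard `csv` module follows. The column names (`name`, `unit`, `lower`, `upper`) and the helper `load_metadata` are assumptions made for this example, and the inline string stands in for a file on disk.

```python
import csv
import io

# Stand-in for a CSV file; the values mirror the table from the original post.
CSV_TEXT = """\
name,unit,lower,upper
Glucose,g/L,0,20
DOT,%,0,100
Volume,mL,0,
"""


def load_metadata(text: str) -> dict:
    """Parse the table into {name: [unit, lower, upper]}; empty cells become None."""

    def number(cell: str):
        return float(cell) if cell else None

    reader = csv.DictReader(io.StringIO(text))
    return {
        row["name"]: [row["unit"], number(row["lower"]), number(row["upper"])]
        for row in reader
    }


metadata = load_metadata(CSV_TEXT)
print(metadata["Volume"])
```

The trade-off is that the bounds come back as floats and the data no longer lives next to the code that uses it.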