ucam-department-of-psychiatry / crate

Create and use de-identified research databases. Preprocess, extract text, anonymise/de-identify, link, apply natural language processing, query for research, manage consent for contact.
GNU General Public License v3.0
19 stars 7 forks source link

Code autoformatting? #76

Closed RudolfCardinal closed 2 years ago

RudolfCardinal commented 2 years ago

Do you think we should be using something like https://black.readthedocs.io/en/stable/index.html for autoformatting Python? I don't necessarily find all its hanging indent styles (e.g. for function definitions) the prettiest but nor do I make my own entirely consistent and I think this take-it-or-leave-it style might match your preferences even more than mine -- either way, it removes some thinking and I note Mike Bayer's strong endorsement! There's a playground to experiment with too. (It likes 88 character width so we might need to turn that down to 79 for the other linters.)

martinburchell commented 2 years ago

I was having the same thoughts last week. I don't want either of us to spend a lot of time formatting or reformatting code and having a third party tool to do the decision-making for seems good.

It seems that Black is becoming the de facto standard, which means that anyone new to this project may well be familiar with it. I don't like some of its choices but I could live with them (my opinion on curly bracket placement in other languages has changed over time). The only alternative I've found is yapf which is more configurable and would allow us to define our own code style if we wanted to. I have seen some reports that yapf will not format code consistently (ie it will reformat its own output).

So my leaning is towards trying Black. With the column limit, 88 characters was chosen to give the best results. We could change the other tools to use 88 characters or see if there was a middle ground that would keep all of the formatters happy.

Perhaps the next steps are to try reformatting the whole code base with Black (at some column limit), make sure it still works with flake8 and there aren't too many horrors. I'm assuming PyCharm has an option to autoformat with Black on save (I can make Emacs do that). We'd need to add black to the pre-commit hook and add it to the python-checks GitHub workflow in case the pre-commit hook got skipped.

We could also consider isort but maybe one thing at a time.

RudolfCardinal commented 2 years ago

I've come across https://www.youtube.com/watch?v=ZsHMHukIlJY&t=1028s too -- he makes reasonable points about the layout style. I might give Black a small-scale whirl with a couple of files I'm working on. I'm yet to be convinced of the merits of breaking PEP8 to go to 88 characters... Ubuntu supports sudo apt install black and then it works as a PyCharm plugin via /usr/bin/black --line-length 79 "$FilePath$".

RudolfCardinal commented 2 years ago

I like it... and 79 seems fine.

martinburchell commented 2 years ago

I like it... and 79 seems fine.

Great. Probably best to reformat the whole lot in one go, maybe after the API work and your NLP changes are merged.