Open romintomasetti opened 1 year ago
This will replace any line that is only spaces with an empty line:
shopt -s globstar
sed -i -e 's/^[[:space:]]*$//g' packages/PACKAGE/**/*.cpp packages/PACKAGE/**/*.hpp
This will remove any trailing spaces on a line (including a line that is only spaces, note missing ^
in regex)
shopt -s globstar
sed -i -e 's/[[:space:]]*$//g' packages/PACKAGE/**/*.cpp packages/PACKAGE/**/*.hpp
e.g. for Tpetra:
@romintomasetti @cwpearson I have done some work related to this previously in MueLu with #11258 using a sed
command inside a find
command. I also opened #12058, and the MueLu team decided that it would be an uphill battle to try to maintain standards occasionally without also having some sort of Github action to enforce standards on a PR level.
Developing such a Github action is a work in progress still, but it's only one package. Enforcing it across all of Trilinos is a different story entirely and would require mass adoption from developers and leadership.
Hi @cwpearson and @GrahamBenHarper !
Thanks for sharing your thoughts.
I think there are 2 points w.r.t. trailing spaces:
Based on these 2 points, I would propose that a Github action is setup for banning new trailing spaces in any Trilinos package. See below for a simple script that does that.
This means:
@GrahamBenHarper Enforcing clang-tidy
at the Trilinos level would be easy, because you can opt-in for a few basic rules (for instance Trilinos could decide that only llvm-namespace-comment
must be enforced globally, see also https://clang.llvm.org/extra/clang-tidy/checks/list.html for a complete list of supported checks). I agree that enforcing clang-format
is a different story, because the checks are not opt-in and could lead to significant changes that sometimes aren't pretty. And, as I mentioned, significant stylistic changes of existing code without any new development should not be allowed in my opinion.
Here is a quickly drafted script. I used Python instead of any bash-based solution because of the additional flexibility to filter files you want to ignore for instance. I could run in python:latest
Docker image in which you install GitPython
. Use the script without any argument to look at all tracked files, or with --tracked-changed-only --commit-start upstream/develop --commit-end HEAD
to compare your local branch to whatever upstream branch you want.
import argparse
import logging
import os
import pathlib
import typing
import git
import typeguard
@typeguard.typechecked
def parse_args() -> argparse.Namespace:
"""Parse CLI arguments."""
parser = argparse.ArgumentParser()
parser.add_argument("--directory", help = "Directory wherein checks are performed (recursively)", required = False, default = os.getcwd())
group = parser.add_argument_group(title = 'Tracked changed only', description = "Only look at Git tracked files that changed between 2 commits.")
group.add_argument("--tracked-changed-only", action = "store_true", help = "Enable looking at tracked changed files only.")
group.add_argument("--commit-start", type = str, required = False, help = "If only tracked changed files must be considered, this is the start commit.")
group.add_argument("--commit-end" , type = str, required = False, help = "If only tracked changed files must be considered, this is the end commit.")
args = parser.parse_args()
if args.tracked_changed_only and (args.commit_start is None or args.commit_end is None):
parser.error("Start and end commits are required.")
return args
@typeguard.typechecked
def files_to_look_at(args : argparse.Namespace) -> typing.Iterator[pathlib.Path]:
"""
Generator that either:
- only keeps files that are tracked and modified between 2 Git commits are kept.
- keeps all tracked files
"""
logging.info(f"Listing tracked files to look at based on {args}.")
repo = git.Repo(path = args.directory)
if args.tracked_changed_only:
commit_start = repo.commit(args.commit_start)
commit_end = repo.commit(args.commit_end)
changed_files = repo.git.diff('--name-only', commit_start, commit_end).splitlines()
logging.debug(
"Filtering any file matching generator with files changed between "
+ f"'{args.commit_start}' ({commit_start}) and '{args.commit_end}' ({commit_end}):\n"
+ str(changed_files)
)
for changed in changed_files:
yield pathlib.Path(changed).absolute()
else:
for tracked in repo.commit().tree.traverse():
path = args.directory / pathlib.Path(tracked.path)
if path.is_file():
yield path
@typeguard.typechecked
def filter_what_you_want_to_ignore(files : typing.Iterable[pathlib.Path]) -> typing.Iterator[pathlib.Path]:
"""
Add any filtering rule.
"""
logging.info("Filtering to keep only files we want.")
# As an example, let's filter any PNG or YAML file.
for file in files:
if not any(file.suffix == suffix for suffix in ['.png', '.yaml']):
yield file
@typeguard.typechecked
def check_trailing_space(files : typing.Iterable[pathlib.Path]) -> None:
"""
Check for any trailing space in provided files.
"""
logging.info(f"Checking for trailing spaces.")
has_error = False
for file in files:
with open(file, 'r') as f:
for counter, line in enumerate(f):
if line.endswith(' \n'):
has_error = True
logging.error(f"Trailing space at {file}:{counter+1}")
if has_error:
raise RuntimeError("At least one trailing space detected.")
if __name__ == "__main__":
logging.basicConfig(level = logging.INFO)
args = parse_args()
check_trailing_space(files = filter_what_you_want_to_ignore(files = files_to_look_at(args = args)))
I would like to add this topic to the TUG's Developers Day Open Discussion, and get additional input and see if we can get consensus.
@ccober6 It seems the topic was not addressed during TUG's Open Discussion session... What's the way forward? :smile:
@romintomasetti We will put together a clang-format approach at the MueLu level (github action, pre-commit hook, and some type of bot). Once we feel like we have a reasonable approach other packages can opt in. We will have one massive reformatting commit that can be ignored in git blame.
This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity.
If you would like to keep this issue open please add a comment and/or remove the MARKED_FOR_CLOSURE
label.
If this issue should be kept open even with no activity beyond the time limits you can add the label DO_NOT_AUTOCLOSE
.
If it is ok for this issue to be closed, feel free to go ahead and close it. Please do not add any comments or change any labels or otherwise touch this issue unless your intention is to reset the inactivity counter for an additional year.
Enhancement
Trilinos
is full of:On today's develop, running the Python script attached (see below) leads to:
That is, each time someone makes a shallow clone of
Trilinos
(thinking of the CI mostly), it gets almost 2 megabytes of...spaces (I don't know what happens for the 480 files for which I got aUnicodeDecodeError
and I guess it deserves a new issue on its own).This:
Actions:
commonTools/test/utilities/check-mpi-comm-world-usage.py
(@JacobDomagala) that ensures no new trailing space is added.Trilinos
packages, removes lines whose content is only spaces.201760
characters (I guess a bit less than that, see next bullet point).Google Test
,Kokkos
and others)Trilinos
packages so it's worth a try.To quote someone that phrased it much better then me
Last pro: useless tokens clearly have an (slight) impact on the compiler's required time to parse
Trilinos
source code. Removing the trailing spaces might induce very tiny gains with that respect.trailing_white_space.py.txt