bamorim commented 4 years ago

Hi Folks, I want to share some thoughts that I've been having while playing around with this gem and implementing the cleanup command.

The reason for opening an issue here is to open space for discussion on these sensitive topics and then we should follow up by creating issues per thing we want to actually do.

DISCLAIMER: This is an attempt to do a mental dump of what is in my head, so I probably missed a lot of points. Also, I've been looking at this thing for less than one week, so I probably made a lot of mistakes, please feel free to correct me. This is the reason why I'm creating the issue. To discuss.

Overview

A wild guess is that the direction we want to take with this tool changed drastically over time. It probably started as way to manage CODEOWNERS in a "easier way" but still very manual, therefore the "interactive mode" and overtime started drifting more towards just reporting errors and automating some fixes (Like rubocop -a I'd say).

My guess comes from the fact that interactive mode is the default and should be disabled with --no-interactive.

Simplifying what we do

As all projects as they evolve, a lot of complexity arises and here I'll make some bold suggestions on removing features to make the code simpler and then being able to move forward in a smooth way.

Take these suggestions with a grain of salt as they come from my very narrow experience with this and talks with @jonatas and @dennissivia.

With that in mind, the two things I'd remove are:

Interactive mode: this was probably once the "heart" of the system, but a lot of complexity arises from that piece of code and make changes hard to be done. I'd strip that out entirely and focus first on just error reporting and later on error correction automatically. I think properly suggesting errors and fixing them on a text editor is not that big of a downgrade. In some cases, it may be even better (if you have a lot of problems currently, the interactive mode will make you follow a specific path, while if you go on your own, you can choose your path and maybe fix more important issues first)
The "group" concept on CODEOWNERS: I don't see value on extending the "AST" of a CODEOWNERS file to make it look like it is structured with "groups". We are "inventing" special comments to group things. I'd just treat the CODEOWNERS file as it is: a list of rules, comments and blank lines (and occasionally, some malformed lines that should be reported). I think this is not that complex.

Where we want to go

Again, this is based on my conversations with @jonatas and @dennissivia, so take again with a grain of salt.

We want to:

Report errors better:
- Being able to report only a subset of errors is good (like only changed as described on #82)
- Being able to maybe report only a type of error
Make it easy to fix errors:
- Automatically fix some errors (like removing a rule if no file match)
- Commands to make fixing errors easier (like adding a group to files missing rules as described on #81) or renaming a team
Not messing up (one problem we realized when implementing #80 is that sorting lines can be dangerous because of precedence)

With that in mind, I I'd like to make some suggestions on features/architectural changes we should do before:

Remove the git-related features such as --from and --to
Split every "check" into it's own class, like Rubocop's Cops. Rules would take as input: list of files in repo, the list of rules, the list of owners, whitelist. Preload or all those inputs to avoid checks doing unnecessary work and to make it easy to test things (we can pass a list of files without actually
Implement something like "Unit of Work" to allow multiple fixes to be applied consistently (e.g., deleting a line should not automatically delete from the file, but instead mark for deletion and later on removed from the file)
Add an optional "fix" method for the "Problem" class and implement them for some of the checks
Re-implement git related features on top of this simpler system by simply running checks with the inputs before and after (--from and --to), and somehow diffing the generated problems

The checks

We have currently 4 checks that we implement:

File missing rule
Rule missing file
Rule with nonexistent owner
Invalid line

As far as #80 goes, I think the best way to address some of it's points are by creating new checks:

Check to verify owner syntax (@name, @org/team, email@domain.com)
Check to verify duplicated lines (fix: remove all but one)
Check to verify duplicated owners in one rule (e.g.: file @owner @owner, fix: remove duplicate occurrences)
Check to verify valid pattern (for example, !a is a valid pathspec, but it is not valid to have a in a CODEOWNERS file, and according to the comment of Github staff on this question would make the file unparseable. (fix: just remove the line)

CLI interface

If we organize our code with the "Check" and "Problem" model, then I think a nice way of automating fixes in a predictable and discoverable CLI interface is to use the name of the check and maybe some strategy on input. Exemples:

codeowners-check \
  --on-rule-without-file remove-rule \
  --on-rule-with-nonexistent-owner set-to:@owner \
  --on-invalid-rule remove-rule \
  --on-file-without-rule set-to:@owner \
  --on-duplicate-rule keep-first \
  --on-duplicate-owner remove-duplicates

The default behavior would be just warn or nothing, which means no fix would be made and the problem would just be warned.

With that, the check could also be disabled altogether. Some suggestions on how to do it

codeowners-check --disable-rule-without-file --disable-invalid-rule
codeowners-check --rule-without-file=false --invalid-rule=false
codeowners-check --disable-checks=rule-without-file,invalid-rule
codeowners-check --checks={all but rule-without-file and invalid-rule}

jonatas commented 4 years ago

Show @bamorim ! I loved the suggestions and I agree in the direction of the tool. I think it would be great if we maintain the interactive thing or maybe just leave it on a stable version and document in case people want to use this legacy.

I'm sorry for the late reply here, I think a good part of it was already done and the interactive mode can simply be isolated into a separate CLI if we still use it.

For the AST part, I totally agree. I think it causes more mess than help us as the real intention is pretty small, only organize the sections and subsections.

poloka commented 3 years ago

@bamorim I have a suggestion for your project going forward. I would recommend breaking this project up into two projects. Focus first on pulling out the logic needed for reading the CODEOWNERS file and constructing the data objects. Have a simple entrypoint that accepts a folder location of a repository to start working from. Don't worry about figuring out how to check out the project, let your consumer determine how to check out the repository so you're not cluttering up your project. This should be a fairly simple project and it could allow for greater consumption by those who only look to write their own business logic on top of the evaluations of content from within the data objects provided. You may retain some methods like patterns_by_owner, defined_owner?, etc. to keep it clean. Another thought, allow your consumer to swap out what Logger they would like to use. This allows for consumers to use logging to STDOUT or maybe use the Rails logger if this is in constructs of a Rails application. By allowing the flexibility the consumer can use any custom logger they choose. As for the second project, this gives you the greatest flexibility to change your project how you want to go forward. I like the idea of scanning the CODEOWNERS file and providing a way for users to clean it up to meet standards; however, rather than using command line arguments I would suggest a config file. Thinking along your rubocop idea, allow consumers to create a .codeowners-checker file at the root of the project that contains the various rules you'd like to apply to the current project. That way you could have your simple codeowners-checker -a and autocorrect based on the project's configuration file.

I'm just throwing out some quick ideas but think a quick win is to first break out the data objects. Establish yourself for others to consume you to be able to do about anything they want. Then focus on how to refine and improve your checker's UI. Just my thoughts.

toptal / codeowners-checker

Discussion: where we want to go with this tool #85

Overview

Simplifying what we do

Where we want to go

The checks

CLI interface