graydavid / graydavid-parent

Apache License 2.0
0 stars 0 forks source link

Come up with source code guidelines #2

Open graydavid opened 1 year ago

graydavid commented 1 year ago

The following guidelines target best/agreed practices for working with source code.

[[TOC]]

General

The following guidelines apply to all code.

These Guildelines Take Precedence over Counterexamples in the Code

Counterexamples to these guidelines that exist in the codebase are not justification for ignoring the guidelines. The guidelines were created after most of the code in the application was written. In addition, it's also not practical to expect every code review to catch every violation.

Follow Effective Java (Unless Otherwise Indicated)

Follow the tenets of the 3rd edition of the book Effective Java... except in the following cases:

Avoid Double Negatives in Conditional Expressions

Example

// DON'T DO THIS. THIS IS AN ANTI-PATTERN.
if(object != null) { // This is a (single) negative
  ...
} else { // The else makes it a double-negative
  ...
}

// INSTEAD, DO THIS
if(object == null) { // This is a positive
  ...
} else { // The else makes it a (single) negative
  ...
}

Explanation

The key in the above example is the presence of the else branch, which automatically negates the if condition. If the if condition is stated in a negative way, then the else branch results in a double negative.

Double negatives are confusing to understand. E.g. "I don't not live in the United States" or "I haven't not gone to work early", both of which can be made much simpler by expressing them with the equal positive statements "I live in the United States" and "I have gone to work early (or on time)".

Use CamelCase for Acronyms and Initialisms

Example

// DON'T DO THIS. THIS IS AN ANTI-PATTERN.
HTTPRequest hTTPRequest = ...
Map<Id, HTTPRequest> idToHTTPRequest = ...
HTTPXMLEntity hTTPXMLEntity = ...
HTTPXMLEntity httpXMLEntity = ...

// INSTEAD, DO THIS
HttpRequest httpRequest = ...
Map<Id, HttpRequest> idToHttpRequest = ...
HttpXmlEntity httpXmlEntity = ...

Explanation

Using camel case consistently helps readers figure out where one concept ends and another begins. Using all capital letter may feel okay for single and well-known acronyms/initialisms, but problems arise with multiple, less-well-known strung back-to-back.

Notes:

Use the xToYs Style When Naming Map Variables

Given a Map of type X to Y, name any such Map variable as xToYs.

Example

// DON'T DO THIS. THIS IS AN ANTI-PATTERN.
Map<Bicycle, Part> parts = ...
Map<Bicycle, Part> bicycles = ...
Map<Bicycle, Part> bicycleParts = ...

// INSTEAD, DO THIS
Map<Bicycle, Part> bicycleToParts = ...

Explanation

The naming pattern for other collection class is well-understood: pluralize the item name (e.g. List<Bicycle> bicycles). Different people tend to use different strategies for Maps, though, which causes unnecessary confusion. We believe the xToYs pattern has the most advantages, which is why we want to standardize on it.

Notes:

Don't Write Long Methods

Longer methods are more difficult for readers to reason about. Readers have to maintain the context of the entire function in their head to understand it fully. The longer the method, the more difficult this is to do, and as the user modifies one section of the code, another section may be affected in an unknown way.

To avoid this issue, break larger methods down into smaller ones. Shoot for a maximum length of 20-30 lines but never more than 50. Also, don't be afraid to create a function of even only a few lines, or sometimes even one line, if you think it makes the method more readable.

Here are some good heuristics for when to move code to a new method:

Minimize Indentation in Methods (to a Max of 2 Levels)

Example

  try{               // Level 0
    if(...) {        // Level 1
      for(...) {     // Level 2
        while(...) { // Level 3 -- DON'T DO THIS. THIS IS AN ANTI-PATTERN.
        }
      }
    }
  }

Explanation

Each level of indentation adds complexity. All of the details of every level are exposed to the reader. The reader mentally has to maintain the context of every previous level while creating a new context when encountering the next level. Do the contexts for the outer levels mix with the current one? If so, in what way? Etc.

To reduce complexity, move code blocks to new helper functions. These helper functions both create useful abstractions, allowing the reader to neglect details, and help the reader understand how different levels interact (limited through the function signature). As an example, one thing I almost always do is implement the body of any sufficiently-complex try block in a separate method.

Notes:

Try to Order Method Definitions Like a Newspaper

(From the book Clean Code) order method definitions by level of abstraction, with the caller preceding the callee.

Example

public void method1() {
  method2();
  method4();
}

private void method2() {
  method3();
}

public void method3() {
}

private void method4(){
}

In this example, methods are not organized by access level (i.e. public vs. private vs. ...). Instead, they're organized by abstraction level... almost like a depth-first search. method1 first calls method2 and then method4, so method2's definition comes next in the file. Method2 calls method3, so method 3's definition comes next. Finally, we go back to the second method that method1 called and define method4.

Explanation

The idea is to minimize the vertical separation between dependent methods. E.g. while reading method1 in the example above, if I want to know the details of method2, then I don't have very far to go.

This should be the general rule, anyway. There may be times where there are so many methods, that a strict following of the rule makes the code less readable. There also may be times where the cohesiveness of several functions is more important. These situations should be the outliers, though.

Use "Final" When Necessary (and Only When Necessary)

The guidance aligns with this stackoverflow entry but is more extreme:

TODO Comments Must be Linked Somewhere Trackable

Example

// DON'T DO THIS. THIS IS AN ANTI-PATTERN.
// TODO: initialize this in a constructor
int numWheels = 0;

// INSTEAD, DO THIS
// TODO (http://link.to/somewhere/trackable): initialize this in a constructor
int numWheels = 0;

Explanation

TODO comments that have no tracking link are often useless. The author may have had good intentions (e.g. wanting to improve the code). Unless someone is accountable, though, the comment often just sits in the code. Other developers often have no context and are left wondering why it exists, who plans on making this change, when it can be made, etc.

If you want to add a TODO comment, it must be accompanied by a trackable link. E.g. use a GitLab issue. The link must contain more backstory on why the change was not made in the first place and what needs to be done before it can be made. Management then needs to be made aware of this link if not already, so that they can track it and make sure it gets done eventually.

A good example of using a TODO comment would be trying to implement a project in phases. Assume you're assigned a GitLab issue for the project. Say you make changes for the first phase. As a part of this change, you add a TODO with a link to the GitLab issue for changes you want to make in the second phase. Once you get to the second phase, you make the changes and remove the TODO comment. In this case the TODO comment serves as a sort of "I know this code isn't complete, but I promise to come back and change it... and if I don't, here's how to hold me accountable."

Write Useful Comments

Add comments to code when what a piece of code does is not obvious or you want to provide more context on why it exists.

Example

class Domain {
  // DON'T DO THIS. THIS IS AN ANTI-PATTERN.
  /** Determines if this domain is OPS. */
  public boolean isOps(){
  }

  // INSTEAD, DO THIS
  public boolean isOps(){
  }

  // DON'T DO THIS. THIS IS AN ANTI-PATTERN.
  /** Determines whether this domain is internal. */
  public boolean isInternal() {
  }

  // INSTEAD, DO THIS
  /** Answers whether this domain is visible only to internal clients, not to external customers. */
  public boolean isInternal() {
  }

  // DON'T DO THIS. THIS IS AN ANTI-PATTERN.  
  /** 
   * Answers whether this domain is only visible to internal clients, not to external customers.
   *
   * @return whether this domain is only visible to internal clients, not to external customers.
   */
  public boolean isInternal() {
  }
}

Going through the examples above:

Explanation

Every comment should add value. Call-outs:

Add a Justifying Comment Whenever You Suppress a Warning

Example

// DON'T DO THIS. THIS IS AN ANTI-PATTERN.
@SafeVarargs
public static <T> List<T> doSomething(T... items) {
}

// INSTEAD, DO THIS.
// Suppress justification: items is not being manipulated or returned from the method.
@SafeVarargs
public static <T> List<T> doSomething(T... items) {
}

Explanation

Suppressing a warning is a big step. You're saying that you know better than the compiler. You could be wrong. By explaining why you think you know better, this will both help you get your thoughts straight and justify the decision to readers. If, in the future, a reader comes along and doesn't think the explanation is sound, they can feel safer modifying the code accordingly.

(Almost) Never use Parallel Streams

Don't use Java's in-built parallel stream. If you want to divide-up work, find another way to do it.

Example

// DON'T DO THIS. THIS IS AN ANTI-PATTERN.
bicycles.parallelStream()...

// INSTEAD, DO THIS
bicycles.stream()...

Explanation

parallelStream uses the application's single, common fork join pool for associating a thread with each task to perform. Blocking tasks make threads in the pool useless. With enough blocking tasks, overall application performance suffers, to the point of being unable to progress.

Notes:

Organize Classes in Java Packages Logically

Organizing code into classes can be difficult enough. Organizing those classes into Java packages can seem more difficult, but with similar principles, it doesn't have to be. Here are some heuristics you can use:

Read this Article

The article describes four different ways to think about organizing classes into Java packages: by component (aka feature), layer, toolkit, and kind. You can even use those ways in combinations to get the best of both worlds. For example, some web apps divide code first by feature and then by layer. (Note: as stated in the article, be extremely wary about organizing code by kind.)

Strive for Logical Cohesion

The classes inside of a Java packages should be logically cohesive.

What is the purpose of the Java package? Try to describe it in a single sentence. Does a class contribute to that purpose? If so, consider including it.

Would the majority of Java package consumers want to use that class? If so, consider including it.

Don't make Java Package Dependencies Circular

Classes in one Java package (call it Consumer) will often depend on those in another Java package (call it Dependency). All of those dependencies should point one direction: don't have the Dependency package also depend on the Consumer package.

Why? Circular dependencies blur the line between the Java packages: Consumer can't do what it does without Dependency... but if the opposite were also true, there would be no distinct identity between the two. The single Java package would really just be masquerading as two. (As a thought exercise imagine that each Java package went in its own Maven package, which forbids circular dependencies.)

Strive for reuse

Imagine the Java package as providing additional functionality that any other package can utilize simply by depending on it.

Isolate Transitive Dependencies

The easiest way to think about this is to consider Maven package dependencies. The more transitive Maven packages that a Java package includes:

Use Maven dependencies as a heuristic for deciding when to create another package. E.g. imagine that you're developing a Java package without any Maven dependencies so far. Suddenly, you have to add a dependency on an XML parsing library. Not all clients may want this XML functionality. So, ideally, you don't want to force this dependency on them: give consumers a choice. Create another Java package and consider that it's "adding an XML feature to the core Java package you've been creating".

Note: sometimes/usually this is just a thought exercise. To work in reality, this scheme would require a separate Maven package for the separate Java package. That's the only way clients could decide to exclude the transitive Maven dependency. For simplicity, our code doesn't actually do this.

Avoid Generic Constants Classes

Sometimes, it can feel easy to create a single Constants class for the entire application and put all constants inside of it. There are several problems with this organization:

To avoid this, think about the following heuristics for deciding where a Constant should go.

Identify a Logical Owner

Is there a single, logical class you can identify for containing this Constant, such that all consumers can depend on it?

For example, consider the maximum value that an int can take on in Java. This value is defined inside of the Integer.MAX_VALUE class itself. Integer makes sense as the logical owner of the class. The JDK, itself, doesn't contains a generic Constants class that defines this value.

Now, as a counter example, you may point to the Collections class, which seems to contain a lot of different types of Constants (e.g. empty lists, sets, maps, etc). You could make the case that those constants rightfully belong elsewhere (e.g. you could argue that the empty list belongs on the List class). You wouldn't be wrong there. However, that doesn't mean that Collections is wrong either. It's serving as a facade, exposing a group of constants that have some logical cohesion: they're all collections of the types defined by the JDK itself. There is some meaning to them, and users would know to go there to find them.

Look to Class Organization

In large methods, people will often group code into blocks and add a comment to describe what the block does. This (as mentioned elsewhere in this document) is a heuristic for deciding when to break-up a large method into smaller ones. The same is true of Constants classes.

If you find yourself vertically separating a group of constants and adding comments to explain what that group is, that's a good sign that this group belong in another class. Each Constant may have its own owner, or all of the Constants may actually belong together... but just in their own class.

Use enums

Using enums over primitive or library classes has the following benefits:

Logging

The following guidelines apply to logging.

Don't Log Sensitive Information

When you log, make sure that you're not logging sensitive data, even if it would be helpful to debug. Examples include Personally Identifiable Information (PII), credit card information, etc. Besides obfuscating sensitive information directly in the log statement, for complex objects, make sure all toString implementations obfuscate as well.

Log When Useful, no Less and no More

If you log too little, you won't have enough information to debug issues in production. If you log too much, you'll add noise to the logs, making debugging more difficult. Hitting the sweet spot can be difficult, so here are some rules that can help:

Who, What, When, Where, Why, and How

(From this article) use these six questions as guidance. Can you answer them all with the current logging in place?

Run Thought Experiments

Imagine potential failure scenarios in production. What questions would you ask yourself (e.g. what was the value of variables X, Y, etc.)? Are there enough logs, and do the logs contain enough data to help you answer those questions?

Remember and Capture Context for Exceptions

Exceptions provide useful debugging information, but as you let them propagate, they can lose context. E.g. a NullPointerException from a constructor is useful, it only captures the variable that was null. What about the method that called the constructor? What were its arguments? As necessary, you can supplement exception information by catching it and wrapping it in a new exception, using the new message to capture that context. That way, when the exception is logged, it will provide what you need. (Note: Spring does a great job of this.)

Exception Handling

The following guidelines apply to exception handling.

Don't Catch the Exception Class Without Either Propagating It or Accounting for InterruptedException

Whenever you catch Exception (or InterruptedException directly), you have extra work to do.

Example

// DON'T DO THIS. THIS IS AN ANTI-PATTERN.
try {
  ...
} catch(Exception e) {
  // code that logs xor rethrows e as an unchecked exception
}

// INSTEAD, DO THIS
try {
  ...
} catch(IOException | RuntimeException e) {
  // code that logs xor rethrows e as an unchecked exception
}

// OR (RELUCTANTLY) DO THIS
try {
  ...
} catch(Exception e) {
  Thread.currentThread().interrupt();
  // code that logs xor rethrows e as an unchecked exception
}

// OR (EVEN MORE RELUCTANTLY) DO THIS
try {
  ...
} catch(Exception e) {
  // code that does something with e
  throw e;
}

Explanation

This guideline is from this stackoverflow answer. InterruptedException is a signal that someone wants the current code execution to stop. If you suppress that information, either by swallowing the exception or even re-wrapping the exception in another exception and throwing that, then you have the responsibility to call Thread.currentThread().interrupt(). This call communicates the same information as the InterruptedException: someone wants the current code execution should stop.

The above examples cover the options that you have in this situation. Here they are, listed in order of simplicity/preference:

  1. TODO: We should create library code that handles this for us and have users call that.
  2. Only catch those checked exceptions and RuntimeExceptions that are possible.
  3. Call Thread.currentThread().interrupt() after suppressing the Exception.
  4. Propagate the checked exception (which according to other guidelines in this document, you should avoid doing)

Only Swallow Exceptions with Justifiable Reasons

Catching an exception and ignoring it (note: logging an exception is still ignoring it) is only permitted in the following cases:

  1. The exception is expected or otherwise does not affect the application flow.
  2. Given legitimate errors, there is a reasonable fallback and a mechanism for alarming based on number of occurrences.

Example

The following examples are adapted from actual code in the codebase.

// DON'T DO THIS. THIS IS AN ANTI-PATTERN.
public void resetPassword() {
  int numResetAttempts;
  try {
    numResetAttempts = session.getAttribute(ATTEMPT_KEY);
  catch (Exception e) {
    numResetAttempts = 0;
  }

  if(numResetAttempts > MAX_ATTEMPTS_ALLOWED) {
    // return error view to user that they've tried to reset their password
    // too many times and need to contact support for further assistance.
  }
}

In this example, the code tries to read, from the session object, the number of times the user has tried to reset their password. If it's greater than the maximum number of times allowed, then the user is shown an error page.

As written, though, if an exception is thrown, this check will always be true. This scenario allows essentially an infinite number of password reset attempts, which is insecure.

// DON'T DO THIS. THIS IS AN ANTI-PATTERN.
private static Policy policy = null;
static {
  try {
    policy = ...;
  catch(PolicyException e) {
    logger.warn(e.getMessage());
  }
}

public static boolean isDangerous(String htmlIn) {
  try {
    return policy.scan(...);
  catch(Exception e) {
    logger.error(e.getMessage());
    return true;
  }
}

In this example, if the static policty-initializer code throws a PolicyException, policy will be set to null. Then, whenever isDangerous is called, the internals will throw a NullPointerException, log it, and then return true. As a result, all html will be considered dangerous.

This code is used for user registration and password resets. Both of those usecases will be broken. (Assuming no log scans are in place) no one will know why until a user reports the issue, potentially days after the application started. Admins will go to the logs, see potentially hundreds of NullPointerExceptions, and have to search backwards for the original cause in the static initializer.

An alternate strategy would have been to let the static intializer exception propagate exception, resulting in failures to initialize the class, which is a hard failure that people would have noticed and could resolve right away.

Explanation

As we saw in the above examples, swallowing an exception may seem like the right thing to do. After all, you're keeping the application running and not showing the user an error view. However, it can also mean that you're silently ignoring a legitimate error and, in the worst case, putting the application into an invalid state such that it still gives users a poor experience.

Your default approach should be to propagate exceptions: hard failure is a faster way to detect and deal with errors. (Ignoring situations where the exception is expected or otherwise does not affect the application, in which case swallowing is okay to do), for situations indicative of real errors, you may only swallow the exception if both of the following are true: there is a legitimate fallback behavior and there is a mechanism in place to detect the error that just happened. For the latter, this may involve logging with a specific pattern of text so that log scanners can pick it up and alarm.

As a code reader, when you see someone swallow an exception, your immediate reaction should be one of suspicion... until you prove to yourself otherwise.

Add a Justifying Comment Whenever you Swallow an Exception

Inline with the previous guideline, whenever you swallow an exception, even if you log a message, you should always add a comment explaining why swallowing is the right thing to do.

Example

// DON'T DO THIS. THIS IS AN ANTI-PATTERN.
try {
  ...
} catch(SomeException e) {
}

// INSTEAD, DO THIS
try {
  ...
} catch(SomeException e) {
  log.error("Message", e);
  // Swallow justification: we don't want the entire user experience to fail and have reasonable fallback behavior X for this error
}

Explanation

Swallowing exceptions is a code smell. Readers will naturally wonder whether swallowing indicates a real error that's being unjustifiably ignored and whether they should change the code to throw it.

Log with the Exception as the Second Argument

When you do log an exception, always pass the exception as the second argument.

Example

// DON'T DO THIS. THIS IS AN ANTI-PATTERN.
logger.log(exception);
logger.log(exception.getMessage());
logger.log("This is it: " + exception.getMessage());

// INSTEAD, DO THIS
logger.log("Some message, even if it's just a placeholder", exception);

Explanation

Only by passing the logger as the second argument will you get the (very useful) stacktrace. With the other two forms above, you only get the exception message itself, which is less helpful.

Log xor Rethrow an Exception but (Almost) Never Both

When you catch an exception you should (almost always) do exactly one of the following and not both: log it or rethrow.

Example

// DON'T DO THIS. THIS IS AN ANTI-PATTERN.
try {
  ...
} catch(SomeException e) {
  log.error("Message", e);
  throw e;
}

Explanation

This guideline assumes that there is some generic exception handling code at the entrance point to the application (e.g. in a Controller, in a Filter, or in the Web Server itself). This code both logs and rethrows the exception. It's the only piece of code allowed to do both.

Meanwhile, no other piece of code should do both, because if you do, you'll end up with duplicate log statements for the same exception, adding noise.

(Almost) Never Throw or Propagate Checked Exceptions

Java has it's own official policy on when to use checked exceptions, but ours is more extreme:

Example

// DON'T DO THIS. THIS IS AN ANTI-PATTERN.
public void myMethod() throws IOException {
  throw new IOException("");
}

// DON'T DO THIS. THIS IS AN ANTI-PATTERN.
public void myMethod() throws IOException {
  Files.newInputStream(...) // this method throws an IOException
}

Explanation

Checked exceptions are annoying:

As a result, we shouldn't throw checked exceptions ourselves, and we should limit their blast radius by dealing with them immediately from code that we call, rather than allowing them to propagate.

TODO: we should write a utility class that helps transform a checked exception-throwing function into an unchecked version. Add new guidance here once that's done.

Don't Create and Throw RuntimeExceptions

Example

// DON'T DO THIS. THIS IS AN ANTI-PATTERN.
throw new RuntimeException("A problem happened here");

// DON'T DO THIS. THIS IS AN ANTI-PATTERN.
try {
  ...
} catch(IOException e) {
  throw new RuntimeException(e);
}

Explanation

RuntimeExceptions are generic. They provide no useful extra information about the root cause of what went wrong. Instead, you should create new exception classes or reuse existing ones as appropriate.

TODO: we should write a utility class that handles transforming a checked exception into an unchecked exception. This includes an UncheckedException class, which can be used for wrapping.

Don't use Exceptions for Flow Control

Example

In this example, processor is allowed to be null: that's an acceptable state for the program to be in.

// DON'T DO THIS. THIS IS AN ANTI-PATTERN.
int number;
try {
  number = processor.getNumber();
} catch(NullPointeException e){
  number = 0;
}

// INSTEAD, DO THIS
int number = processor == null ? 0 : processor.getNumber();

Explanation

Exception-handling code is ugly and unnecessarily expensive compared to alternatives. If you expect the exception and have the chance to avoid it instead, you should do that.

Lombok

The following guidelines are specific to usage of the Lombok library.

Use only Accepted Lombok Features

All of the following Lombok features fall under restricted use --

For New Packages, Exclude Lombok-generated Code from Checks

Follow the advice here and configuration here to exclude Lombok-generated code from test coverage and findbugs/spotbugs checks. This generated code is trusted, so it's okay that we don't write tests against it and that spotbugs thinks it has issues. In short, create a file named lombok.config in the project's root directory, and include the following contents:

lombok.addLombokGeneratedAnnotation = true
lombok.extern.findbugs.addSuppressFBWarnings = true

Code Submission/Review

This following guidelines apply to code submission/review.

As a Code Submitter, use a Standard Template for the Commit Message

Use the following template for your commit message:

<title>

* Motivation: <motivation>
* Design: <design>

----

Testing

<testing>

Here is what each section means:

As a Code Submitter, Review the Code Diff Before Submitting for Review

Before you submit your change for review, look at the code diff yourself, either in git or through GitLab. Yes, you've already checked the code functionally, but you'll be surprised at how often you find something that could be improved. E.g. it could be a small change like removing excess whitespace. It could also be a larger change like forgetting to remove test values, accidentally reformatting a file, etc. Think of this step as a final quality check.

As a Code Reviewer, Feel Free to Push Back on Large Formatting Changes

If a code review has large formatting changes, unrelated to the main change, that interfere with your ability to review it, feel free to ask the submitter to undo the formatting. This is true even if the file was originally in a bizarre format and the submitter merely applied a standard format to it.

Formatting the file in a standard way is definitely a good thing, but lots of formatting changes increases the chance that some unintentional change will be introduced and decreases the likelihood that it'll be caught before being deployed. The right sequence to perform this change is to format the file in one commit and then make the desired changes in another commit (or vice-versa). The reviewer should be able to review both commits.

In the end, this is a judgement call. If the formatting applies to just a few lines, maybe you, as the reviewer, can tell the difference and don't mind. That's up to you, though. We're all working together in good faith to try and release the best code to customers as possible with the minimal effort.