keyboardio / Kaleidoscope

Firmware for Keyboardio keyboards and other keyboards with AVR or ARM MCUs.
http://keyboard.io
GNU General Public License v3.0
756 stars 259 forks source link

Coding Style Guidelines #134

Closed algernon closed 4 years ago

algernon commented 7 years ago

@obra mentioned on IRC the other day that Kaleidoscope could really use a coding style guide. Though some of it is enforced now via make astyle, that's mostly whitespace, and formatting. Naming, code organization, and similar things are not covered, and there are a number of different styles used throughout the codebase.

It would be incredibly useful not only for future authors, but for bringing the current set of plugins to a common ground, too.

(I'm going through a number of existing guides, and will share my opinion in followup comments)

algernon commented 7 years ago

Google C++ Style Guide

One of the selling points of this guide is that we use the formatting anyway, and it is a thorough guide, covering pretty much all aspects of the code. However, it has a number of issues too, which prevents us from using it as-is.

First of all, it is written with traditional x86-64 libraries and applications in mind, where you control the build system, and so on. This can be easily seen when it talks about names and order of includes. We can't name our things like that, because of limitation of the Arduino build system: we can't use #include <Kaleidoscope/LED/Theme/Something.h>, or #include <Kaleidoscope/OneShot.h>, because Arduino won't find the libraries then. We may be able to build a tool on top of it that would, but then we'd lose the ability to use the Arduino IDE.

The guide discourages static members too (though allows them), along with globals - while we build heavily on those to conserve space, to be friendlier to the end-user, among other things.

The aim of Google's style guide is to make the code better organized, and more understandable for other developers. Our aim is to make the code easier to use for the novice user, for whom their firmware may be the first program they ever create.

Existing differences

The list below is a collection of issues where our code differs from the recommendation, and where adapting to the guide is not immediately an obvious win. There are other cases where our code differs which I don't list, when adapting to the guide is a no-brainer.

Things not covered

The guide does not cover naming, in a sense that it only controls how names should look, and does not impose a naming convention otherwise. As in, it does not tell whether to use addHook or hookAdd (or rather, AddHook or HookAdd).

Summary

There are differences between the usual Arduino way, and between Google's guide, but not too much. It feels like we could opt for following Google's guide, with a few exceptions added to cover our use-cases.

algernon commented 7 years ago

Arduino API Style Guide

The major selling point here is that this is what the rest of the Arduino world uses, and a lot of our code was written with this in mind. The downside is that it's... short, and does not talk about a lot of things that the Google guide does (namespaces, for example).

They do not cover naming either, not in detail, and are inconsistent in style too: analogRead vs readX, for example.

Summary

There goals of the style guide are great, but the details are... lacking, and there are many inconsistencies. I do not think we could use it as-is, without adding a substantial amount of our own guidelines. It seems better to chose another guide, and apply some of the good parts of the Arduino method on top of that.

algernon commented 7 years ago

ISO C++ Core Guidelines

This guide mostly covers how to write good C++, and less about naming (which would be an important aspect for us). I started looking at it because it looked fairly official from the distance. It is incomplete, a heavy work in progress, is organized terribly, and doesn't answer a lot of our questions, and goes with "follow the existing style" for the most part.

algernon commented 7 years ago

I looked at a few other guides, but a lot of them are old, or far less comprehensive than Google's one, so my suggestion would be to go with that, with the following exceptions, to be applied on top of it, overriding when in conflict:

File layout conventions

Headers

Non-standard features

Naming & organizing things

Indentation and visual style

algernon commented 7 years ago

For context, some of the reasoning behind my proposal:

Naming things

I think lower-case namespace names make sense, as a way to differentiate classes, global objects, and namespaces. At the moment, Kaleidoscope is a global object, KaleidoscopePlugins is a namespace, and this is confusing. kaledioscope::Keyboard, or kaleidoscope::Kaleidoscope as the class is clearer than Kaleidoscope_, and we can still have a global Kaleidoscope object, because the namespace would be kaleidoscope.

Thus:

namespace kaleidoscope {
  class Kaleidoscope {
  public:
    uint16_t some_variable;

    Kaleidoscope ();

    void addSomething (...);
    void removeSomething (...);

    SomeComplexType foo_bar(); // getter
    void foo_bar(SomeComplexType v); // setter

  private:
    SomeComplexType foo_bar_;
  };
};

extern kaleidoscope::Kaleidoscope Kaleidoscope;

This would allow us to get rid of the ugly KaleidoscopePlugin namespace. Using namespaces in general would make a lot of code look much nicer. They are not a big thing in the Arduino world, as far as I see, but they are great tools for clarity.

The above code demonstrates all the various ways to name things, and how they make it clearer what is what:

Due to size constraints, and with the goal of being easier for the novice user to get started, we use a lot of global objects, and prefer to avoid inheritance in user-facing APIs. The goal is that the end-user will not have to subclass anything, and that building on top of existing plugins is possible by composing them, as opposed to deriving from them.

algernon commented 7 years ago

One thing I'm not sure about is how to approach cases where we could do some advanced stuff like operator overloading, to have a nicer API, friendlier towards developers (end-users may not see anything of this), at the cost of the plugin code being more complex, and less easy to follow for newcomers.

I'm thinking of stuff like using LEDPaletteTheme.palette[index] = CRGB (r, g, b) instead of LEDPaletteTheme.palette(index, CRGB (r, g, b)). Or, in a more complex case, LEDPaletteTheme.themes[0][index] = colorIndex, or even supporting both .themes[0][index] and .themes[0][row][column]...

I have a gut feeling that while these may make the code using them look simpler, they hide way too many things under the hood, and wouldn't aid in understanding how they work.

algernon commented 7 years ago

TODO (mostly for myself):

wez commented 7 years ago

Only just got around to reviewing this. I think what you've proposed makes a lot of sense. A couple of comments for your consideration:

// This could be either a free function or a static method of the associated
// class.   It may be cleaner to make it a static method; whichever one we
// choose should also make it to the style guide.
kaleidoscope::Kaleidoscope& getKaleidoscope() {
   static kaleidoscope::Kaleidoscope kaleidoscope;
   return kaleidoscope;
}

This does turn a simple load operation into a function call (to satisfy the Construct On First Use Idiom), so it is not without cost in the arduino world. I've been using this pattern in my brief travels in arduino and have found that the global fetching is typically constrained to just a couple of locations (usually just at the top of the function that drives the main loop), so the overall code size isn't dramatically increased by adopting this pattern.

This also has the (IMO) nice benefit of not implicitly initializing classes that may have been accidentally included by an overly broad include file. This is something that has been frustrating me while trying to avoid pulling in too much USB/HID code from eg: teensy and the keyboardio HID library.

My suggestion for the style guide on this would be to avoid static class data members and instead prefer to make the class a singleton accessible via the Meyer's singleton accessor per the above. The data members could then just be made simple data members of the singleton instance, or be moved out to be their own singleton data if the code is complex enough to warrant mixing singleton and non-singleton behavior in the same class (honestly, that feels like an anti-pattern, but worse coding crimes have been committed when trying to solve real engineering problems!)

I don't have a terribly strong preference on singleton accessor. Some obvious options are:

  1. Use a static get method. eg:
class Foo {
public:
   static Foo& get() {
       Foo foo;
       return foo;
   }
};

void codeInOtherModule() {
   auto& foo = Foo::get();
}
  1. Use a free function. eg:
class Foo;
// implementation is the same as Foo::get() in the example above
extern Foo& getFoo();

void codeInOtherModule() {
   auto& foo = getFoo();
}

There are some options for naming this free function. getFoo, globalFoo, theFoo and so on.

I'm leaning towards the static method version of these things, because it just feels simpler overall. That also allows making the constructor private and preventing non-singleton use of the class:

class Foo {
   Foo();
 public:
   static Foo& get();
};
algernon commented 7 years ago

Naming

A part of me agrees that mixing camelCase and snake_case is not the smartest thing, it does have one big advantage: for someone new to programming, it makes it easier to understand that there are things that can have values, and there are others that can be run, and that there is a difference between the two. If we had the same naming scheme for both, that may be confusing for the novice user.

Singleton pattern

You make some damn solid points. I started to write a response, to explain how we achieve similar things, in a slightly user-friendlier way (constructors do nothing, in most cases, and initialization is delegated to the USE_PLUGINS() macro, where the order is explicit). The singleton pattern certainly has a lot of advantages, and the hacker in me would go for it, but it does have a few ill side-effects:

In summary, while the pattern is very tempting to use (and I learned something new about C++ while reading your comment, and writing this response, yay!), as Kaleidoscope's goals include being friendly to novice users (even if that means more work for more seasoned developers), and fitting in to the greater Arduino world, the cost/benefit ratio is, I believe, against the change, and on the side of the status quo.

On the other hand, it would make a number of things easier... and it appeals to me when having my hacker hat on. If I try to think as someone who tinkered with Arduino, but is a novice, someone who just hits the puzzle with a hammer until the pieces line up (oh how many times I did that in languages & environments I was unfamiliar with!), having libraries that look like the rest they used before is a big plus.

wez commented 7 years ago

Nod. How about this as a general rule then?

Kaleidoscope globals that have dependencies should be implemented using the plugin system, which has defined initialization order semantics. Otherwise, globals should avoid dependencies, or adopt the Meyer's singleton pattern to resolve them.

In addition, we should add a clarifying point about what goes into the constructor vs. what goes into the initialization method; there are considerations for code size here.

Another thing that bothers me about arduino conventions: the begin() method. Here's why: C++11 added range-based for loops. These allow for iterating over containers using less boiler plate. For example, you can iterate over an arbitrary array without worrying about the size or incrementing counters:

int foo_array[] = {1, 2, 3};
for (auto item : foo_array) {
   // item here has the values 1, 2, 3 on the iterations of the loop
}

C++11 allows this to work for any type that declares begin() and end() methods that return an iterator type. This doesn't require any part of the STL to use; you can have your begin() method return a simple pointer to some internal storage and things will behave correctly.

Granted that container types are rare in the arduino world, but I have used this in a couple of places where I wanted something like std::array or std::vector for stringy/streamy stuff.

I would propose that we prefer to use some other verb than begin for initializing objects (especially plugins), so that we don't block the ability to use range-based for loop. I propose using initialize or setup as alternative names for this.

algernon commented 7 years ago

The clarification sounds good to me, and so does the begin->initialize rename. May have a few issues withLEDMode there, which has an initialize method too, but we can rename that too, once we get there.

obra commented 7 years ago

Hrm.

Kaleidoscope plugins are a special case of Arduino libraries and Arduino is pretty firm about using begin() to initialize a library instance and end() garbage collect it at the end of its useful life.

I'm fine with us picking another naming convention for Kaleidoscope plugins, though we should be clear that any library that's an Arduino library which can work outside of the Kaleidoscope runloop should continue to use begin() and end() as the Arduino API style guide specifies.

For those playing the home game, the API style guide I'm referencing is this: https://www.arduino.cc/en/Reference/APIStyleGuide

It's somewhat less rigorous than Google's style guide, but does a good job of capturing the spirit of what makes Arduino so approachable.

I'd like to bring most of its content into our style guide. In cases where our style diverges from Arduino core style, we should highlight that.

algernon commented 7 years ago

Our plugins are a little different: they have no end method, and begin is only called via USE_PLUGINS (or very rarely by other plugins), never directly by the end-user.

obra commented 7 years ago

As an aside, you guys have no idea how thrilling it is that we're having a calm, reasoned discussion about coding style ;)

obra commented 7 years ago

@algernon yup.

algernon commented 7 years ago

FWIW, I have a work in progress branch here, wherein the style guide is being converted to Markdown, and augmented with TODO items.

In a second pass, I'll go over the TODOs. That includes removing parts of it which are not relevant, updating other parts that make no sense in their current form within our context, add improvements, and so on.

This will take a while. :)

gedankenexperimenter commented 6 years ago

Idea:

Plugins each get their own namespace (or, at least ones that are more than just one self-contained class). For example, Kaleidoscope-Qukeys could be in the kaleidoscope::qukeys namespace, which would have the classes kaleidoscope::qukeys::Qukey & kaleidoscope::qukeys::Plugin.

gedankenexperimenter commented 6 years ago

See https://community.keyboard.io/t/kaleidoscopes-implementation-possible-improvements/851/15

algernon commented 6 years ago

We should finish up the style guide, and convert Kaleidoscope to it. With #316, we'll break API anyway, might as well fix things up to conform to the style guide too.

Now's a good chance to do this, without having to wait for a version three of the API...

kingparra commented 6 years ago

This seems like mostly a C++ discussion, but I have a few suggestions for the style guide with respect to any bash scripts. More of a wish list, really.

obra commented 6 years ago

Chris: Is there a tool like 'astyle' or 'perltidy' for shell scripts?

On Mon, May 7, 2018 at 12:49 PM Chris King-Parra notifications@github.com wrote:

This seems like mostly a C++ discussion, but I have a few suggestions for the style guide in respect to any bash scripts. More of a wish list, really.

  • Using tabs instead of spaces for indentation would make it easier to use indented here docs, and a number of Unix utilities that dislike spaces as part of command input would be easier to work with. (Usually, I prefer a 4 space wide soft tab, but shell is finicky.)
  • Also, personally, I prefer to use the quoted and curly brace "${a_parameter_expansion}" syntax for all parameter/variable expansions. It's consider it good practice to quote everything to prevent word splitting, and using curly braces makes it easier to make future modifications or alter the behavior of parameter expansion.
  • It would be great to mention that there is a static analysis tool for common errors, ShellCheck https://www.shellcheck.net/.
  • For unit testing, there is bats https://github.com/sstephenson/bats .
  • Argument parsing is actually pretty inconvenient in bash -- there are several tools that can be used to do it, but the best of them (enhanced getopts from util-linux) isn't portable. Conventional wisdom has been to write your own argument parsing logic. Recently, someone made a tool called argbash https://argbash.io/ that will write the argument parsing logic in bash for you, given a description of the arguments as a block of comments. This seems like a cleaner solution, and something that would make it easier to modify the argument parsing in existing scripts, or make their own scripts. I think it should be the recommended method of argument parsing in the style guide.
  • It would be great to write an EditorConfig http://editorconfig.org/ preconfiguration to standardize editor settings for bash scripts on this project, such as always showing invisible characters (tabs vs spaces), using hard tabs for indentation, and doing linting with ShellCheck where the editor supports it.
  • All scripts should have a short summary of what the script does at the beginning of the file, and a list of the inputs and outputs of the script.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/keyboardio/Kaleidoscope/issues/134#issuecomment-387181721, or mute the thread https://github.com/notifications/unsubscribe-auth/AACxaKo_zEVJdwc8_HeLGe5-5Phwak7Hks5twKUugaJpZM4Nt4iK .

kingparra commented 6 years ago

I've heard good things about shfmt, but I haven't yet tried it.

http://github.com/mvdan/sh

Sincerely,

Chris King-Parra

-------- Original Message -------- On May 7, 2018, 15:57, Jesse Vincent wrote:

Chris: Is there a tool like 'astyle' or 'perltidy' for shell scripts?

On Mon, May 7, 2018 at 12:49 PM Chris King-Parra notifications@github.com wrote:

This seems like mostly a C++ discussion, but I have a few suggestions for the style guide in respect to any bash scripts. More of a wish list, really.

  • Using tabs instead of spaces for indentation would make it easier to use indented here docs, and a number of Unix utilities that dislike spaces as part of command input would be easier to work with. (Usually, I prefer a 4 space wide soft tab, but shell is finicky.)
  • Also, personally, I prefer to use the quoted and curly brace "${a_parameter_expansion}" syntax for all parameter/variable expansions. It's consider it good practice to quote everything to prevent word splitting, and using curly braces makes it easier to make future modifications or alter the behavior of parameter expansion.
  • It would be great to mention that there is a static analysis tool for common errors, ShellCheck https://www.shellcheck.net/.
  • For unit testing, there is bats https://github.com/sstephenson/bats .
  • Argument parsing is actually pretty inconvenient in bash -- there are several tools that can be used to do it, but the best of them (enhanced getopts from util-linux) isn't portable. Conventional wisdom has been to write your own argument parsing logic. Recently, someone made a tool called argbash https://argbash.io/ that will write the argument parsing logic in bash for you, given a description of the arguments as a block of comments. This seems like a cleaner solution, and something that would make it easier to modify the argument parsing in existing scripts, or make their own scripts. I think it should be the recommended method of argument parsing in the style guide.
  • It would be great to write an EditorConfig http://editorconfig.org/ preconfiguration to standardize editor settings for bash scripts on this project, such as always showing invisible characters (tabs vs spaces), using hard tabs for indentation, and doing linting with ShellCheck where the editor supports it.
  • All scripts should have a short summary of what the script does at the beginning of the file, and a list of the inputs and outputs of the script.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/keyboardio/Kaleidoscope/issues/134#issuecomment-387181721, or mute the thread https://github.com/notifications/unsubscribe-auth/AACxaKo_zEVJdwc8_HeLGe5-5Phwak7Hks5twKUugaJpZM4Nt4iK .

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

noseglasses commented 6 years ago

Constants vs. Preprocessor Macros vs. All Uppercase Names

TL;DR

I propose to use all uppercase names only for preprocessor macros and to restrict the use of preprocessor macros in general to represent true magic only. It would be nice it this would be mentioned in the Kaleidoscope coding style guide and in other related parts of documentation that deal with Kaleidoscope coding practice.

Preprocessor Macros

In C/C++ preprocessor macros in limited occasions can be very useful to wrap up complex pieces of code. Howerver, this applies only to such code that cannot be wrapped by any other suitable C++ feature, e.g. inline functions. In most cases there are better and safer C++ constructs. Some good reasons for using preprocessor macros are

While this is not a styling issue, I'd like to propose to restrict using preprocessor macros only to cases where such good reasons apply.

Preprocessor macros are dangerous because they tend to cause side effects.

// Function call with side effects
#define aBadPseudoConstant aFunction()

// Iterator use with side effects
#define AN_EQUALLY_BAD_PSEUDO_CONSTANT *it++

Moreover, they are not type safe.

Constants

In C++ constants are by far better represented by constexpr, const, enum class and in some (again limited) cases by legacy enum, than by preprocessor macros. In contrast to preprocessor macros, all of the latter come without side effects and most of them (except for legacy enum) are even type safe.

Sidenote: Only use const if you need to take the address of a constant in memory. Use constexpr in all other cases to define compile time constants (no memory footprint).

Both non-typesafety and side-effects are very dangerous issues. Especially when debugging, macros are to be checked much more carefully than variables and constants as they tend to be much more error-prone. That's why it is very important for a programmer to be warned.

Nature tends to mark poisonous animals by color, shape, sound, etc.. This enables us to tell poisonous from non-poisonous frogs. Probably influenced by this idea, people started using all uppercase names for macros already back in the old C days, to clearly mark (poisonous) preprocessor macros. Since then to the developer - and only to the developer - all uppercase means handle with care or likely to be the culprit and in general probably not as constant as its name suggests.

Because of this it is very important to use all uppercase names only for preprocessor macros. If used for other (non-macro) constant types (see above), this means unnecessarily marking a non poisonous animal as poisonous and thus to generate confusion.

Now this is the point where the macro vs. constant thing becomes a style issue.

In Kaleidoscope's legacy code in various places all uppercase names have been used for constants. For most of those, the overhead and the risk of breaking user code is likely far too high to replace with non-uppercase names.

But the fact that there are already some all uppercase constant names in Kaleidoscope is no reason to allow or even encourage all uppercase constant names in newly written code. In my opinion this should explicitly be discouraged in Kaleidoscope's C++ style guide.

Edit 1: Fixed the snakes vs. frogs issue.

gedankenexperimenter commented 6 years ago

I really like your analogy to poisonous animals (and plants). I'd use frogs as the example creature, though — snakes are known for being venomous, which isn't the same thing as poisonous (technically). Both are toxic, but you get the toxin from a venomous creature if it bites you, whereas the poisonous one harms you if you bite it. The venomous ones are usually predators, and tend to be camouflaged; the poisonous ones tend to be BRIGHTLY COLOURED, so other animals will known not to eat them.

noseglasses commented 6 years ago

really like your analogy to poisonous animals (and plants). I'd use frogs as the example creature, though — snakes are known for being venomous, which isn't the same thing as poisonous (technically). Both are toxic, but you get the toxin from a venomous creature if it bites you, whereas the poisonous one harms you if you bite it. The venomous ones are usually predators, and tend to be camouflaged; the poisonous ones tend to be BRIGHTLY COLOURED, so other animals will known not to eat them.

I sadly displayed both, bad english and a bad knowledge of biology :-) Thanks for the clarification, you helped me to improve both. In German its more complex to have this distinction as most people use the same adjective "giftig" when they refer to both, venomous and poisonous creatures. There's probably a scientific term too but that might be Latin then.

I actually meant poisonous and you are right, frogs are definitely more adequate. I will change my above post accordingly.

gedankenexperimenter commented 6 years ago

Since I sidetracked the conversation a bit, I want to reiterate my enthusiastic support for @noseglasses's position on the use of ALL CAPS for all preprocessor macros, and only preprocessor macros.

obra commented 6 years ago

This does not seem like a bad idea to me.

Can we get our c++ linter to tell us where we're currently violating this rule? ᐧ

On Mon, Jul 2, 2018 at 1:22 PM Michael Richters notifications@github.com wrote:

Since I sidetracked the conversation a bit, I want to reiterate my enthusiastic support for @noseglasses https://github.com/noseglasses's position on the use of ALL CAPS for all preprocessor macros, and only preprocessor macros.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/keyboardio/Kaleidoscope/issues/134#issuecomment-401922598, or mute the thread https://github.com/notifications/unsubscribe-auth/AACxaG-j9jVmWbhYVJw8MuevKNjBaLIsks5uCoEBgaJpZM4Nt4iK .

noseglasses commented 6 years ago

Can we get our c++ linter to tell us where we're currently violating this rule?

That would be awesome.

Unfortunately, having a glance at how things are done around

https://github.com/keyboardio/Kaleidoscope-Build-Tools/blob/86119740cf4ef2a6e399da98ac6083db2a82210a/quality/cpplint.py#L4968

it seems to me that the logic behind cpplint is quite simple. These lines try to detect errors with compile time constants at their point of use (fixed size array declaration).

There's no AST parser in cpplint, which is understandable because it's just ~6000 lines of Python code (in contrast a yacc-able parser grammar of C++ is already around 1000 lines).

To really check conformity of constant names, it would require a tool that does appropriate C++ parsing to reliably determine what is a const, constexpr, enum or enum class definition. There are some linters which would probably be suitable. But most of those depend on clang and could therefore not be bundled with Arduino-Boards due to size and portability issues.

That's one of the major downside of C++. It's such a hard language to parse. Just have a look at

A *B(C);

Is this a function definition, an instanciation or a multiplication? Impossible to tell without the context.

obra commented 6 years ago

Does clang’s linter do any better?

On Jul 3, 2018, at 5:57 AM, noseglasses notifications@github.com wrote:

Can we get our c++ linter to tell us where we're currently violating this rule?

That would be awesome.

Unfortunately, having a glance at how things are done around

https://github.com/keyboardio/Kaleidoscope-Build-Tools/blob/86119740cf4ef2a6e399da98ac6083db2a82210a/quality/cpplint.py#L4968

it seems to me that the logic behind cpplint is quite simple. These lines try to detect errors with compile time constants at their point of use (fixed size array declaration).

There's no AST parser in cpplint, which is understandable because it's just ~6000 lines of Python code (in contrast a yacc-able parser grammar of C++ is already around 1000 lines).

To really check conformity of constant names, it would require a tool that does appropriate C++ parsing to reliably determine what is a const, constexpr, enum or enum class definition. There are some linters which would probably be suitable. But most of those depend on clang and could therefore not be bundled with Arduino-Boards due to size and portability issues.

That's one of the major downside of C++. It's such a hard language to parse. Just have a look at

A *B(C); Is this a function definition, a class definition or a multiplication? Impossible to tell without the context.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

noseglasses commented 6 years ago

Does clang’s linter do any better?

TL;DR

Yes, definitely. While a clang based linter could not easily be bundled with Arduino-Boards, it could nonetheless be used for CI with Travis. However, the overhead to get it ready and for maintenance might be an issue. Messing with cpplint does not seem very attractive, though.

Which clang based linter?

clang’s linter

You probably refer to clang-tidy? I am asking, because in recent years there emerged several free and commercial linter projects that use clang as back-end (OCLint, PC-lint, ...).

If clang-tidy can do any better? Honestly, I can't tell due to a lack of personal experience with it. But due to AST parsing, clang based tools conceptually have the ability to be 100% reliable as long as their front ends do the right thing.

Why clang based linters?

Since clang came along, many tools switched to using it as their back-end instead of relying on hand written C++ parsers. Many IDEs use clang for syntax checking and highlighting. Doxygen also enabled optionally selecting clang based parsing several years ago. In the pre-clang age all those projects had severe difficulties supporting corner cases of the language and spend way too much time on maintaining parsers.

Clang based linters vs. cpplint

One problem that I see with tweaking cpplint is that it is a crazy spaghetti-code mixture of parser and rule evaluation system. There is no API to define new rules. Thus, even if we would fork the original cpplint, it would become difficult to merge future upstream changes once we messed with cpplint's internals.

Clang-tidy on the contrary supports new rules via a dedicated C++ API. By this means, it is less likely that future changes to clang or clang-tidy will break custom rule implementations.

What needs to be done to use clang-tidy?

Apart from writing the C++ rule code...

Sadly, there is no plugin system for defining rules. Rules have to be build on the testing platform and are linked into clang-tidy's binary. As clang-tidy's build process is part of clang's main build system, no extra build system needs to be maintained. While at first glance the requirement to build an entire clang seems like a show-stopper, because of to the long build times, it might be less an issue if Travis' build cache could be used to cache clang/clang-tidy's build artifacts.

My conclusion

While the overhead of generating and maintaining a clang based linter for Kaleidoscope's coding style is comparably high, it still might be better than messing with cpplint. Either way, it is not a cheap task.

algernon commented 6 years ago

While at first glance the requirement to build an entire clang seems like a show-stopper, because of to the long build times, it might be less an issue if Travis' build cache could be used to cache clang/clang-tidy's build artifacts.

We could have that in a separate repo, and publish the resulting binaries. Kaleidoscope could then pull the latest binaries, and we wouldn't need to build it every time, wouldn't need to rely on the Travis CI cache either. This way, us, developers could grab the Travis-built binaries too, if so we wished.

I'd go with a clang-based linter by the way, but as said, it's a huge task. For the time being, I'd stick to cpplint.py, and disable checks where it can't easily be configured to follow our chosen style.

noseglasses commented 6 years ago

We could have that in a separate repo, and publish the resulting binaries. Kaleidoscope could then pull the latest binaries, and we wouldn't need to build it every time, wouldn't need to rely on the Travis CI cache either. This way, us, developers could grab the Travis-built binaries too, if so we wished.

Nice idea! But we would need to be careful with compatibility of binary dependencies as Travis changes the version of its testing platform once in a while.

obra commented 6 years ago

clang-format -i --style="{BasedOnStyle: google, SpaceBeforeParens: ControlStatements, IndentWidth: 2, AllowShortFunctionsOnASingleLine: Empty, AlignConsecutiveAssignments: true, AlignEscapedNewlinesLeft: false}"

gets -close- to what we want, but isn't quite right yet. (And completely butchers macros, especially our large macro blocks)

obra commented 4 years ago

Closing this issue as we do -have- style guidelines. I'm happy to have issues for specific updates we need.