pcdshub / lcls-twincat-general

A PLC code toolkit for LCLS TwinCAT PLC projects
https://pcdshub.github.io/lcls-twincat-general/
Other
16 stars 20 forks source link

Static code analysis - naming conventions #86

Open sagatowski opened 10 months ago

sagatowski commented 10 months ago

This pull request serves dual purpose:

  1. Implement SLAC naming conventions into the lcls-twincat-general library
  2. Initiate a discussion of static code analysis for TwinCAT software at SLAC

The TL;DR of this PR is that static code analysis is easy to use once it’s in place, but hard to implement if it’s on an existing project.

The (slightly longer version: What follows are some findings I did while trying to implement the static code analysis into this library. First of all, when I’m talking about static code analysis in this context I’m referring to the static code analysis tool TE1200 that Beckhoff provides. If you don’t have a license for TE1200, it’s still possible to use the static code analysis light, which provides a small subset of the static code analysis rules. The light version does not provide any help with naming conventions.

In the current state of the library, static code analyser rules + naming conventions have been enabled. These are however not consistent with the naming conventions that are documented. For example the library uses: x as naming convention for starting Booleans, while the documentation says that b should be used P_ as naming convention for start declaration of programs, while the documentation says nothing should be used

I’m assuming that whatever is in the library is incorrect and was simply a test to try out the TE1200 tool.

The TE1200 can be executed in two different ways. One is simply execute the static code analysis, which will work the very same way as building a project. This means that the tasks are the entry point, and then the static code analysis will execute on all objects referenced by tasks/programs. The other alternative is to execute the TE1200 on all objects, which works the very same way as to try to build all objects, independently of whether they are referenced by a task or not. I used the second option for everything, which I assume will always be the case for libraries.

First thing I did was to update the naming convention rules in the project so it’s in accordance with the documentation. Next, I simply tried to execute the static code analyser. I immediately reached the limit of the tests (which is set to ~500). After this number, the TE1200 simply doesn’t continue doing checks. Some of the errors were legitimate, while others were simply wrong and what I would simply call bugs with TE1200. I wrote about these type of bugs over 5 years ago on my blog. As always, Beckhoff tries to fix the bugs, but as there are no release notes (ugh), only the deep dungeons of Beckhoff-Verl know if/when things are fixed. As this library is developed using an slightly older version of the XAE (3.1.4024.35), this means that not all bug-fixes are there either. The TE1200 is integrated into the XAE, and thus it can’t be updated separately. If I went through all these errors, I would never be able to finish with something in a sensible amount of time. Given the above, I’m not entirely sure it’s even worth it. In either way, in the current state the static code analyser is too buggy. At least to use it for all checks. Maybe it will be better in 4026, maybe not.

So I thought that the reasonable thing to do first, is to use the naming conventions part of the static code analyser. What I did next is to export all the existing static code analyser rules + naming conventions into the static-analysis-rules.csa-file (part of this PR). Next, I disabled ALL static code analyser checks, and only left the naming convention checks. This information is stored in the project. Now, we only do the naming convention checks of the project. If we ever want to do the static code analyser checks, we simply import the exported file into the project again. From that point, I ran the static code analyser and got the following result:

image

165 errors. That’s something we can work with. Now I went on a journey of exploration and discovery trying to understand what you are doing. As I have limited knowledge about the context of where this library is used, I can’t foresee all the implications on all these changes. The only thing I can see is that there will be implications as there are multiple variables that are exposed through API’s of the function blocks (as var_input, var_output or through method-calls) that have been changed. As everything in TwinCAT is exposed through ADS (TcCOM), the implications can obviously be bigger and that’s one of the topics I think a discussion needs to be made.

Generally speaking there were a lot of places that were not consistent with the naming conventions. I updated those to the naming conventions defined.

Also, many naming conventions can’t automatically be detected. See this example:

image

The naming convention says that a method doesn’t need a M_ or anything like that. It should just for example be DoAThing(). But the above examples would be OK, as the naming convention in the TE1200 tool would simply have an empty string, and that means that the methods can start with anything you want.

In the documentation, you state the following about references: image

There is no way in the tool to separate between these two. There are just references. So I simply configured the tool that there is nothing special with references, and that they should be treated like their respective primitives they are referring to (in other words, case 1 above). This means this would be valid syntax: image no matter whether it’s used as a method input or in other cases (1 & 2 above).

I enabled this setting (enabled by default): image

What this means is that naming conventions will be “stacked” on top of each other. If we look at this example in the PR:

image

It’s ct because it’s a constant and it’s a TIME-type. If you don’t use the c before you will get an error: image

This is a more extreme case: image

ast as it’s an array (a) of structures (st).

Another example: image

As it’s a pointer (p) to an integer (n).

And the most extreme case: image

Array (a) of pointers (p) to function blocks (fb).

Next, I didn’t only have to change VAR_INPUT and VAR_OUTPUT (which will definitely brake functionality in dependencies), but also %I* and %Q*-variables, like this one:

image This will obviously break all instances of this variable that is pointing to EtherCAT in any place that is using an instance of this variable.

I also noticed you had a few of these: image I assume that you are using this to further process this variable through the TMC for some purpose? These (like all other variables) also had to be changed.

I noticed that you had a few pragmas {attribute ‘naming’ := ‘omit’} in your code. I didn’t change those as the static code analyser actually ignored the naming roles on those places. Like here for example: image

Running the static code analyser now with only the naming conventions activated will make the static code analyser pass with 0 errors.

I want to finish by asking a question, and that is if we even need to use Hungarian notation at all, and whether it could make sense to reconsider them? I wrote a little about it on my blog a few years ago, so I felt I need to raise the question.

ZLLentz commented 10 months ago

I want to finish by asking a question, and that is if we even need to use Hungarian notation at all, and whether it could make sense to reconsider them?

I thought about this on and off since you made this PR and also asked for some opinions from the rest of the team (not a lot of responses, admittedly). I think the truth is that I (and the team) largely use Hungarian notation out of habit. It can be useful to contextualize typing information when reviewing a git diff, but otherwise most of the specifications are redundant. Googling around, it seems like this is only truly advocated for in the PLC sphere, which tends to lag behind standard practices, which implies that it has largely fallen out of favor.

The wikipedia page has some fun opinions on it: https://en.wikipedia.org/wiki/Hungarian_notation#Notable_opinions

I'll think about this some more, but my impression after a couple days of consideration is that most of the prefixes here shouldn't be rigerously checked. I think some of them have more value than others, for example it can be nice to sort and differentiate between enums, structs, etc. in the solution explorer, which otherwise have the same symbol and only carry this information in parentheses at the end of the line that tends to blend together.

This opinion is made prior to reviewing the specifics of this PR, I'll look at the diff now.