tushartushar / DesigniteJava

Detects smells and computes metrics of Java code
https://www.designite-tools.com/products-dj
Apache License 2.0
172 stars 64 forks source link

Normalization of LCOM produces counter intuitive results and makes results less comparable #76

Open maxmeffert opened 3 years ago

maxmeffert commented 3 years ago

Hi, as far as I understand the Code and the somewhat related blog posts [1][2] and papers [3][4] the method for computing LCOM implemented in DesigniteJava is YALCOM. YALCOM seems to be an extended version of LCOM proposed by Hitz & Montazeri [3] adding support for measuring aspects of cohesion related to inheritance and applicability. I strongly agree with the idea of capturing the cohesive properties of inheritance. On a side node: I would naively like to propose to also consider calls to inherited methods since attributes of super types could also be transitively accessed by them from a sub type. Moreover, consideration of applicability seems nothing but an improvement to me. However, I have some problems comprehending the method normalization, i.e. division by number of methods.

My Understanding of LCOM

From my graph theoretic understanding of LCOM and YALCOM count the number of disconnected components of an graph representing direct and indirect usages of attributes by methods. Perfect cohesion is indicated if this graph consists of only one such component. The graph is considered less cohesive for each additional component present. Eventually the graph is considered least cohesive if it consists of the maximum number of possible disconnected components which is the number of vertices, i.e. the total number of methods.

So we have:

(perfect cohesion) 1 <= number of disconnected components <= number of methods (least cohesion)
(perfect cohesion) 0 <= normalized LCOM <= 1 (least cohesion)

Problem 1: Counter-intuitive Results

Considering a class with 3 disconnected components and 3 methods YALCOM produces a normalized result of 1 indicating least cohesion as expected. For a class with 3 disconnected components but now with 30 methods YALCOM produces a normalized result of 0.1 indicating acceptable cohesion. Obviously both cases have the same structural cohesive properties. However, normalization with division by number methods causes YALCOM to produce diametrically opposed results. Sadly, in-cohesive classes with 30 or more methods can be encountered in the wild which makes this anomaly counter intuitive to me.

Problem 2: Results are less comparable

Dividing the number of disconnected components by the number of methods of the same class tightly couples the measure to that class since different classes can have different numbers of methods. The only practical interpretation of the normalized LCOM I could come up with is "the cohesion is not as bad as it could be". This makes it impossible to use the normalized LCOM to compare different classes which seems to somewhat defeat the purpose of normalization to me.

Proposed Solution

Instead of dividing the number disconnected components by the number of methods it seems more practical to me to implement normalization with the following ratio:

Let NDC be the number of disconnected components.
LCOM := (NDC-1)/NDC

Since cases where the number of disconnected components could be zero is already handled by the applicability portion of YALCOM division by zero should not be problem. Also this ratio emphasizes the perfect cohesion with only one disconnected component and should facilitate an intuitive understanding of LCOM by encoding the Lack of cohesion in a percentage value, i.e. given the examples from Problem 1 two-thirds or 66% of a class are not cohesive with the rest.

Happy to hear your thoughts on the matter.

Regards Max

References

  1. http://www.tusharma.in/technical/revisiting-lcom/
  2. http://www.tusharma.in/research/yalcom-yet-another-lcom-metric/
  3. http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=F450176CD6FB83AA1F137BFBBEFC79C7?doi=10.1.1.409.4862&rep=rep1&type=pdf
  4. https://arxiv.org/pdf/2012.12324.pdf
tushartushar commented 3 years ago

Thanks Max for the detailed explanation and issue. It indeed needs a deeper look. I am putting it in my TODO list.