Rework the evidence algorithm to also include the number of sources

xJREB / service-based-antipatterns

A structured collection of service-based antipatterns and bad smells, served by a web application for convenient browsing

https://xjreb.github.io/service-based-antipatterns

MIT License


Issue #107, closed by xJREB 5 years ago

xJREB commented 5 years ago

I just discovered that Nanoservices (https://xjreb.github.io/service-based-antipatterns/?antipattern=Nanoservices) has a low evidence score, even though it seems to have one of the largest numbers of sources (8). Maybe it would be a good idea to factor in not only the citations of the sources but also their absolute number.

boceckts commented 5 years ago

I proposed a similar strategy in #103, using the number of sources as an indicator of an antipattern's popularity: I added icons to the antipatterns that represent the number of sources, and added a "sort by number of sources" option to the sort field.

xJREB commented 5 years ago

OK, this is good because it respects the number of sources as an indicator of an antipattern's importance or reliability. It is "bad" because it adds another concept, and therefore more complexity, to the UI. Is your notion of popularity very different from your notion of evidence, and does it make sense to keep both concepts? If having both brings no real value to the end user, I see no reason not to merge them.

So the question is: if I want to know whether the antipattern I'm looking at is widely accepted and therefore probably reliable, do I need both of these measures separately? If they have different values, which one do I trust? My suggestion would therefore be to merge them into one measure that represents the antipattern's acceptance and reliability. Maybe you could even call it popularity instead of evidence, even though that sounds somewhat odd for an antipattern... 🤔 Not sure... What do you guys think? Does that make sense? Am I missing something?

boceckts commented 5 years ago

This will be added to PR #101. How should the evidence score be calculated? Shall we just use the sum of the "cited by" counts of all sources of an antipattern? Currently, the evidence is the median of these counts.
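To illustrate how a median can stay low even when an antipattern has many sources, here is a minimal sketch comparing the current median with the proposed sum. The citation counts are made up for illustration (not taken from the catalogue), and `median_score` and `sum_score` are hypothetical helper names, not functions from the project.

```python
from statistics import median

def median_score(citations: list[int]) -> float:
    """Current approach as described above: median of the sources' "cited by" counts."""
    return median(citations)

def sum_score(citations: list[int]) -> int:
    """Proposed alternative: total "cited by" count across all sources."""
    return sum(citations)

# Hypothetical citation counts, not real data from the catalogue.
one_popular_source = [200]
many_modest_sources = [2, 3, 5, 8, 10, 12, 15, 300]

print(median_score(one_popular_source), median_score(many_modest_sources))  # 200 9.0
print(sum_score(one_popular_source), sum_score(many_modest_sources))        # 200 355
```

With eight sources the median drops to 9.0, while the sum rises above the single-source case. The sum, however, is largely driven by the one source with 300 citations.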

xJREB commented 5 years ago

Hmm, my intuitive feeling would be that several sources should count nearly as much as one source with a lot of citations. Maybe something like this:
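Assuming $c_i$ denotes the "cited by" count of the $i$-th of an antipattern's $n$ sources and the logarithm is base 10 (the base that reproduces the numbers in the examples that follow), the idea can be sketched as:

```latex
E = \sum_{i=1}^{n} \left( 1 + \log_{10} c_i \right)
```

As written, the term is undefined for a source with zero citations; how such sources are handled is not specified here.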

So each source counts at least 1 plus the log of its number of citations. For example, an antipattern with one source with 1,000 citations has an evidence of 4 (1 + log10(1,000) = 1 + 3), while an antipattern with 4 sources with 1 citation each also has an evidence of 4 (4 × (1 + log10(1)) = 4 × 1). How does this sound?
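A minimal Python sketch of the proposed score, checked against the two examples above. The function name `evidence_score` is hypothetical, and treating a source with 0 citations as if it had 1 (so it still contributes exactly 1) is an assumption not stated in the thread.

```python
import math

def evidence_score(citations: list[int]) -> float:
    """Sum over all sources of 1 + log10(citations).

    Assumption (not from the thread): a source with 0 citations is treated as
    having 1, so it contributes exactly 1 and log10(0) is never evaluated.
    """
    return sum(1 + math.log10(max(c, 1)) for c in citations)

# The two examples from the comment above; both should come out at 4.
single_highly_cited = evidence_score([1000])       # one source, 1,000 citations
four_rarely_cited = evidence_score([1, 1, 1, 1])   # four sources, 1 citation each
print(single_highly_cited, four_rarely_cited)
```

The logarithm makes each additional order of magnitude of citations worth only one extra point, which is what lets several sources weigh roughly as much as one heavily cited source.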

Also, what is the current mapping algorithm of the evidence rank to the labels low, medium, and high?

boceckts commented 5 years ago

OK, that sounds like a good algorithm. For the labels, we checked the range in which most of the antipatterns' evidence scores fell and used it as medium. With your formula, the evidence ranges from 0.x to 25.x, with most scores between 2 and 6.

boceckts commented 5 years ago

With your formula and the categories

low: 1
medium: 3
high: 10
very high: 20

3 antipatterns fall into the low category, 28 into the medium category, 4 into the high category, and only 2 into the very high category.
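A sketch of how scores could be mapped to these categories and counted. It assumes each number above is the inclusive lower bound of its category and that scores below the "low" bound (e.g. 0.x) are also labelled low; both are assumptions, as are the helper names `evidence_label` and `label_distribution` and the sample scores.

```python
from bisect import bisect_right
from collections import Counter

# Category lower bounds from the comment above (interpreted as inclusive lower bounds).
THRESHOLDS = [1, 3, 10, 20]
LABELS = ["low", "medium", "high", "very high"]

def evidence_label(score: float) -> str:
    # bisect_right gives the number of thresholds <= score; subtract 1 to get the index.
    index = bisect_right(THRESHOLDS, score) - 1
    # Scores below the first threshold fall back to "low" (assumption).
    return LABELS[max(index, 0)]

def label_distribution(scores: list[float]) -> Counter:
    """Count how many antipatterns end up in each category."""
    return Counter(evidence_label(s) for s in scores)

# Hypothetical scores, not the real catalogue data.
print(label_distribution([0.5, 2.0, 4.2, 5.1, 12.0, 25.3]))
```

Keeping the bounds in one sorted list makes it easy to try alternative cut-offs and compare the resulting distributions.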