Or another example: according to Shannon, the entropy of
a
is 0 and the entropy of aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
is 0 as well
Or:
abcdefabcdef
-> 2.58496
abcdefcbafed
-> 2.58496
Or:
01
-> 1
0100001011110101000000010000100100001100110101001100011101110100110101011110110111110001110111110100
-> 1
01
-> 1
00001
-> 0.72193
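For reference, the values above are the standard per-character Shannon entropy. A minimal sketch that reproduces them (plain JavaScript, independent of the plugin code):

```js
// Per-character Shannon entropy in bits: H = -sum over symbols of p * log2(p).
function shannonEntropy(str) {
  const counts = new Map();
  for (const ch of str) counts.set(ch, (counts.get(ch) || 0) + 1);
  let entropy = 0;
  for (const count of counts.values()) {
    const p = count / str.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

console.log(shannonEntropy('a'));             // 0
console.log(shannonEntropy('a'.repeat(100))); // 0
console.log(shannonEntropy('abcdefcbafed'));  // 2.58496...
console.log(shannonEntropy('01'));            // 1
console.log(shannonEntropy('00001'));         // 0.72193...
```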
Hey @oprogramador,
Thank you for the compelling issue. I'm currently researching this. I have added this plugin to a few of the larger projects I work on. I think the current problem is that the false positives tend to be actual words. This isn't an issue until you have large inline strings with things like paragraphs (e.g. auto-generated docs). I'm currently trying to think of a good solution to this. Let me know what your thoughts are. I'm going to keep brainstorming. Maybe some NLP?
Cheers, Nick
@nickdeis
That's my solution: https://github.com/oprogramador/eslint-plugin-no-credentials/blob/master/src/calculateStrongEntropy.js
It multiplies the Shannon entropy plus 1 by the zipped data length minus 20 (the zipped length is always at least 20).
You can see the results here: https://github.com/oprogramador/eslint-plugin-no-credentials/blob/master/src/tests-mocha/calculateStrongEntropy.js
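A minimal sketch of the measure as described above, assuming gzip (Node's built-in zlib) for the compression step; the linked calculateStrongEntropy.js is the authoritative implementation and may differ in details:

```js
// Sketch only, NOT the actual calculateStrongEntropy.js:
// strongEntropy = (Shannon entropy + 1) * (zipped length - 20)
// Assumes gzip via Node's zlib; a gzip stream is always at least ~20 bytes
// (header + trailer), which is why 20 is subtracted.
const zlib = require('zlib');

function shannonEntropy(str) {
  const counts = new Map();
  for (const ch of str) counts.set(ch, (counts.get(ch) || 0) + 1);
  return [...counts.values()].reduce((h, count) => {
    const p = count / str.length;
    return h - p * Math.log2(p);
  }, 0);
}

function strongEntropy(str) {
  const zippedLength = zlib.gzipSync(Buffer.from(str)).length;
  return (shannonEntropy(str) + 1) * (zippedLength - 20);
}

// A long repeated string compresses well, so it scores low even though its
// Shannon entropy equals that of the short string; a random-looking,
// password-like string (hypothetical example) scores much higher.
console.log(strongEntropy('abcdef'.repeat(20)));
console.log(strongEntropy('q9Xv!t7ZpL2#rWm8Kd$5'));
```

The repetition shows up in the compressed length rather than in the entropy term, which is the gap that Shannon entropy alone misses.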
Super interesting. Wouldn't entropy and compression rates be collinear? I suppose this ends up being a weighted measure of entropy and string length. Any reference material used to come up with this?
Closing as over a year old
@nickdeis
I invented my own approach in my library to get a relatively good measure of information quantity.
IMO Shannon entropy isn't a good measure because a given string repeated 100 times has the same entropy as when it appears only once. Of course, repeating the same sequence doesn't add much information, but it does add some.
IMO:
abcd
-> log_2(4), which gives 2
abcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcdabcd
(abcd repeated 100 times)
-> log_2(4 + log_2(100)) ≈ 3.41
https://www.shannonentropy.netmark.pl/calculate
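A tiny sketch of that arithmetic; the function name and the explicit repeats parameter are hypothetical, purely for illustration, and are not the library's API:

```js
// Proposed measure from the example above: log2(distinct symbols + log2(repetitions)).
// "repeatedStringMeasure" and the explicit "repeats" parameter are hypothetical;
// the library itself derives repetition implicitly (via compression), not from a count.
function repeatedStringMeasure(base, repeats) {
  const distinct = new Set(base).size;
  return Math.log2(distinct + Math.log2(repeats));
}

console.log(repeatedStringMeasure('abcd', 1));   // log2(4 + 0) = 2
console.log(repeatedStringMeasure('abcd', 100)); // log2(4 + log2(100)) ≈ 3.41
```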