scrutinizer-ci / scrutinizer

Legacy repository - archives past feature requests/bug reports
https://scrutinizer-ci.com/docs
140 stars 36 forks source link

Weird duplication reports for php_sim #165

Open stof opened 10 years ago

stof commented 10 years ago

The new php_sim tools seems to report duplications in weird cases, wher eI fail to understand how it could be a duplicate. See https://scrutinizer-ci.com/g/phpspec/prophecy/indices/106818/duplications/18661 for instance, and explain me how this is duplicate code

stof commented 10 years ago

This one seems a bit suspicious to me as well: https://scrutinizer-ci.com/g/phpspec/prophecy/indices/106818/duplications/18646

stof commented 10 years ago

A few other ones which are not a real duplicate either, to help improve the algorithm: https://scrutinizer-ci.com/g/phpspec/prophecy/indices/106818/duplications/18658, https://scrutinizer-ci.com/g/phpspec/prophecy/indices/106818/duplications/18650 and https://scrutinizer-ci.com/g/phpspec/prophecy/indices/106818/duplications/18667 (the structure is indeed similar: throwing an exception with a sprintf-based message including the class name, but there is no real point refactoring this)

And https://scrutinizer-ci.com/g/phpspec/prophecy/indices/106818/duplications/18641 reports duplication while the classes are implementing different interfaces. It is true that the code is similar (both are callback-based implementations of their interfaces) but they cannot become a single class because of the requirement on the interface. It would be great if the similarity analyzer could take this into account.

Anahkiasen commented 10 years ago

I definitely see duplicate code in those examples but maybe not in the sense you're understanding it. To me those look like "patterns" that could be abstracted, they're not carbon copy but there's definitely similar code paths.

schmittjoh commented 10 years ago

PHP Similarity Analyzer is based on the structural similarity, it does not check for pure copy/paste - PHP Copy/Paste Detector is probably better at that. I believe most of these are actually valid though. Most of the code just has subtle differences like replacing class_exists with ìnterface_exists or switching out a class name for example.

That does not mean that you necessarily have to refactor it, but just to give a few examples, it would easily be possible to write a single method for the structure and just pass in the parts which are different in each method as a parameter. If the algorithm is more complex, it could also be moved to a dedicated class.

Also there is a lot of assertion logic spread in all the places which could be considered to be moved to a central Assertion class f.e. or maybe use beberlei's assertion library.

aik099 commented 10 years ago

I think that a lot of duplications reported here https://scrutinizer-ci.com/g/Behat/MinkZombieDriver/indices/126557/duplications/922112 can't be fixed or at least it doesn't make sense to fix them.

Let's look at getTagName and getText methods for example. They both start with pre-condition that if something bad is given as input then just return null. How you can remove that duplication at all? Moving the condition to the separate method and using that method would report same duplication.

schmittjoh commented 10 years ago

@aik099, thanks for sharing. The code looks kind of ok, just a few thoughts what you could do.

a) Use functional programming and do something like mapNativeXpath($xpath, callable $block), i.e.:

return $this->mapNativeXpath($xpath, function($ref) { return strtolower($ref...); });

This would reduce the complexity of the code and get rid of the repeated if statements.

b) Increase the minimum mass if you find the smaller snippets not useful:

tools:
    php_sim:
        min_mass: 25 # defaults to 16

We might also make this a bit stricter in the future for smaller snippets.

aik099 commented 10 years ago

@stof any ideas about https://github.com/scrutinizer-ci/scrutinizer/issues/165#issuecomment-41900593?

stof commented 10 years ago

For the case of the error handling on the native refs, it should be fixed by https://github.com/Behat/MinkZombieDriver/pull/77 as our code was not handling them properly compared to the Mink expectations.

stof commented 10 years ago

@schmittjoh the duplication errors reported now are because the similarity analyzer considers all strings as equivalent: https://scrutinizer-ci.com/g/Behat/MinkZombieDriver/indices/126591/duplications/923938 and https://scrutinizer-ci.com/g/Behat/MinkZombieDriver/indices/126591/duplications/923936 Given that the mass takes the number of lines into account, it should take the string content into account when comparing them in case most of the snippet is the string, or it should consider the number of logical lines of code instead (i.e. not counting new lines inside the strings in such case)

stof commented 10 years ago

@schmittjoh what do you think about this suggestion about counting lines ?

stof commented 10 years ago

@schmittjoh another weird duplication: https://scrutinizer-ci.com/g/Behat/MinkZombieDriver/indices/126648/duplications/926413 I don't understand how this can be similar: 1 method has 2 branches in the code and the other one has only 1