platisd / duplicate-code-detection-tool

A simple Python3 tool to detect similarities between files within a repository
MIT License
162 stars, 30 forks

Is it suitable for code similarity checks? #31

Closed: schunlee closed this issue 4 months ago

schunlee commented 5 months ago

From the source code, I see that gensim is used here. Using gensim for text similarity checks is cool, but is it suitable for code similarity checks? For example, I want to compare two Unity projects. All scripts must follow C# syntax, which means many C# keywords, and because they use the Unity Engine, the same framework statements appear everywhere, such as 'using UnityEngine'. Can gensim ignore these? Will they cause false matches?

platisd commented 5 months ago

> the same framework statements appear everywhere, such as 'using UnityEngine'. Can gensim ignore these?

As you mention, gensim isn't specialized for code and therefore won't be able to recognize common/utility syntax, e.g. 'using UnityEngine'. That being said, if your files are large enough, such common parts should not make a big difference.
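To make the effect of shared boilerplate concrete, here is a minimal sketch using only the Python standard library. It is not code from this tool and does not use gensim: it assumes a plain bag-of-words cosine similarity as a stand-in, and the helper names (`strip_using_directives`, `tokenize`, `cosine_similarity`) and the two C# snippets are hypothetical. It compares the score of two tiny files before and after dropping simple `using` directives.

```python
import math
import re
from collections import Counter

# Matches simple C# using directives such as `using UnityEngine;`.
# Assumption: aliases (`using X = Y;`) and `using static` are not handled.
USING_DIRECTIVE = re.compile(r"^\s*using\s+[\w.]+\s*;\s*$")


def strip_using_directives(source):
    """Return the source without simple `using Namespace;` lines."""
    kept = [line for line in source.splitlines()
            if not USING_DIRECTIVE.match(line)]
    return "\n".join(kept)


def tokenize(source):
    """Split source text into lowercase identifier-like tokens."""
    return re.findall(r"[A-Za-z_]\w*", source.lower())


def cosine_similarity(a, b):
    """Cosine similarity between the token-count vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


# Two hypothetical, unrelated scripts that share only framework boilerplate.
player = """using UnityEngine;
using System.Collections;
public class Player { void Jump() { velocity = jumpForce; } }
"""
enemy = """using UnityEngine;
using System.Collections;
public class Enemy { int Health() { return hitPoints; } }
"""

raw = cosine_similarity(player, enemy)
filtered = cosine_similarity(strip_using_directives(player),
                             strip_using_directives(enemy))
print(f"with using directives:    {raw:.3f}")
print(f"without using directives: {filtered:.3f}")
```

In files this small the shared directives dominate the score, so filtering them lowers it noticeably. In larger files the same few lines make up a much smaller share of the tokens, which is why they should matter less there.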

schunlee commented 5 months ago

Also, sorry for going off topic, but does your GitHub Action workflow work without cloning the repository locally?

platisd commented 5 months ago

Not sure what you just asked. :grin: If you'd like to try it out locally, then you'll need to clone the repository, install the dependencies, and follow the example instructions.

If you'd like to integrate it into your GitHub CI pipelines, then you'll need to set up a workflow like the one described here.

schunlee commented 5 months ago

I mean, is there no need for a checkout step? Actually, I want to check code across multiple repositories.

platisd commented 5 months ago

The action will check out the current branch for you, if that's what you are asking. The exact steps can be seen here.

schunlee commented 5 months ago

Thanks for your patience and professionalism.

platisd commented 5 months ago

Happy to help. :+1: