platisd / duplicate-code-detection-tool

A simple Python3 tool to detect similarities between files within a repository
MIT License
162 stars, 30 forks

feature request: update comments for different detections #32

Closed: shinyano closed this 3 months ago

shinyano commented 4 months ago

I use the tool separately for different parts of my project. My workflow looks like the one below and checks 3 parts of my code:

name: "Duplicate code"

on:
  issue_comment:
    types:
      - created
permissions:
  contents: read
  pull-requests: write

jobs:
  duplicate-code-check:
    name: Check for duplicate code
    runs-on: ubuntu-latest
    if: github.event.issue.pull_request && contains(github.event.comment.body, 'run_duplicate_code_detection')
    steps:
      - name: Check for duplicate code(core)
        uses: platisd/duplicate-code-detection-tool@master
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          directories: "core"
          file_extensions: "java, py"
          one_comment: true

      - name: Check for duplicate code(datasource)
        if: always()
        uses: platisd/duplicate-code-detection-tool@master
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          directories: "dataSources"
          file_extensions: "java, py"
          one_comment: true

      - name: Check for duplicate code(session, shared)
        if: always()
        uses: platisd/duplicate-code-detection-tool@master
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          directories: "session, shared"
          file_extensions: "java, py"
          one_comment: true

I would like to have one comment in the PR for each part (3 comments in total). When the workflow runs again, it should update those 3 comments.

Currently, the action can only update the latest comment, so the three steps create one comment and then overwrite it twice. But if I choose to create a new comment every time, there will be too many comments.

platisd commented 4 months ago

Interesting way of using the tool, I did not anticipate that. :sweat_smile:

To achieve what you describe, you (as a user) would have to provide a unique identifier yourself when invoking the action. This identifier would need to be part of the message so that the right message can be picked up. I'm not sure yet what the "UX" should look like...

Maybe something like:

      - name: Check for duplicate code(datasource)
        if: always()
        uses: platisd/duplicate-code-detection-tool@master
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          directories: "dataSources"
          file_extensions: "java, py"
          one_comment: true
          unique_header_message_start: "## 📌 Duplicate code detection tool report (datasource)"

      - name: Check for duplicate code(session, shared)
        if: always()
        uses: platisd/duplicate-code-detection-tool@master
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          directories: "session, shared"
          file_extensions: "java, py"
          one_comment: true
          unique_header_message_start: "## 📌 Duplicate code detection tool report (session, shared)"

If the user doesn't provide a truly unique unique_header_message_start, things will not work correctly. Would that work for you? Or do you have any other suggestions for how this should be used?
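Here is a rough illustration of how a unique header could be used to pick the right comment to update. This is a minimal Python sketch, not the action's actual code. It assumes comments are dicts with "id" and "body" keys, which is the shape GitHub's REST API returns when listing issue comments. It also uses an in-memory list instead of real API calls. The helpers find_report_comment and upsert_report are hypothetical names.

```python
def first_line(text):
    """Return the first line of a comment body."""
    return text.split("\n", 1)[0]


def find_report_comment(comments, header):
    """Return the comment whose first line equals `header`, or None.

    Comparing the whole first line (instead of a prefix) keeps a header
    from matching another header that merely extends it.
    """
    for comment in comments:
        if first_line(comment["body"]) == header:
            return comment
    return None


def upsert_report(comments, header, report):
    """Update the comment identified by `header`, or create a new one.

    `comments` is an in-memory stand-in for the PR's comment list.
    Returns "updated" or "created".
    """
    body = f"{header}\n\n{report}"
    existing = find_report_comment(comments, header)
    if existing is not None:
        existing["body"] = body  # edit in place instead of posting again
        return "updated"
    comments.append({"id": len(comments) + 1, "body": body})  # fake id for the demo
    return "created"


# Simulate the three workflow steps from the issue, run twice.
comments = []
headers = [
    "## 📌 Duplicate code detection tool report (core)",
    "## 📌 Duplicate code detection tool report (datasource)",
    "## 📌 Duplicate code detection tool report (session, shared)",
]
first_run = [upsert_report(comments, h, "results of run 1") for h in headers]
second_run = [upsert_report(comments, h, "results of run 2") for h in headers]
```

On the first run each step creates its own comment. On the second run each step finds and edits its own comment, so the PR keeps exactly three reports.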

platisd commented 4 months ago

Alternatively, there could be a unique_id that is placed after the default message within parentheses, similar to the example above. The user would then only have to provide what goes within the parentheses rather than the entire header.
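The unique_id variant could look roughly like the sketch below. Both the default header text and build_header are assumptions for illustration: the text is taken from the examples above, and build_header is a hypothetical name, not an existing input or function of the action.

```python
# Assumed default header, taken from the examples in this thread.
DEFAULT_HEADER = "## 📌 Duplicate code detection tool report"


def build_header(unique_id=None):
    """Return the default header, optionally followed by "(unique_id)".

    A missing or blank id falls back to the plain default header.
    """
    if unique_id is None or not unique_id.strip():
        return DEFAULT_HEADER
    return f"{DEFAULT_HEADER} ({unique_id.strip()})"


print(build_header())
print(build_header("datasource"))
```

Because the user only supplies the short id, the rest of the header stays consistent across all steps.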

shinyano commented 4 months ago

Yeah, I think providing a custom message header is a good option. I'm also considering generating a unique message header automatically from the target directories. For example, when the user sets:

directories: "target"

Maybe the message header can be generated as ## 📌 Duplicate code detection tool report (target)
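Deriving the header from the comma-separated directories input might look like the sketch below. BASE_HEADER and header_from_directories are hypothetical names, and the base text is assumed from the examples in this thread. The last lines measure how the header grows as folders are added.

```python
# Assumed base text, taken from the header used in this thread.
BASE_HEADER = "## 📌 Duplicate code detection tool report"


def header_from_directories(directories, base=BASE_HEADER):
    """Build a header from a comma-separated directories string.

    "session, shared" -> "<base> (session, shared)". Whitespace around
    each name is trimmed and empty entries are dropped.
    """
    names = [name.strip() for name in directories.split(",") if name.strip()]
    if not names:
        return base
    return f"{base} ({', '.join(names)})"


print(header_from_directories("target"))
print(header_from_directories("session, shared"))

# The header length grows with every directory listed.
many = ", ".join(f"module{i}" for i in range(10))
print(len(header_from_directories(many)))
```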

platisd commented 4 months ago

Do you mean that the action should generate these identifiers without the user's intervention?

shinyano commented 4 months ago

Yes, I think that's more convenient and easier.

platisd commented 4 months ago

It is indeed easier, but only for this particular use case. Having the specific directory names in the title works well if there are only a few, but if there were, let's say, 10 folders, the title would get huge and look ugly. :thinking:

shinyano commented 4 months ago

Yeah, that's a problem... Do you think letting the user provide the message header, like you showed earlier, is a better idea?

shinyano commented 4 months ago

Also, do you think it would be a good and practical idea to ignore short files? I have a lot of short Java interfaces that are reported as highly similar to each other simply because they are so short. For example:

public interface Function {

  FunctionType getFunctionType();

  MappingType getMappingType();

  String getIdentifier();
}

and

public interface RowMappingFunction extends Function {

  Row transform(Row row, FunctionParams params) throws Exception;
}

They are reported as 86.97% similar (counting the package and import statements, which are no more than 2 lines).

platisd commented 4 months ago

In some cases I guess it would make sense. However, the user would have to opt in to enable such a feature (i.e., ignoring short files). In other words, it shouldn't be on by default.
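An opt-in short-file filter could be sketched as follows, applied before similarity is computed. This is an assumption, not an existing option of the tool. The min_lines parameter and both helpers are hypothetical. "Code lines" here means non-blank lines that don't start with package or import. With the default min_lines=None, every file is kept, so the filter stays off unless the user enables it.

```python
def count_code_lines(source):
    """Count non-blank lines, skipping lines starting with "package " or "import "."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith(("package ", "import ")):
            continue
        count += 1
    return count


def filter_short_files(files, min_lines=None):
    """Drop files with fewer than `min_lines` code lines.

    `files` maps a file name to its source text. Opt-in: when min_lines
    is None (the default), all files are returned unchanged.
    """
    if min_lines is None:
        return dict(files)
    return {name: src for name, src in files.items() if count_code_lines(src) >= min_lines}


files = {
    "Function.java": """public interface Function {

  FunctionType getFunctionType();

  MappingType getMappingType();

  String getIdentifier();
}
""",
    "RowMappingFunction.java": """public interface RowMappingFunction extends Function {

  Row transform(Row row, FunctionParams params) throws Exception;
}
""",
}

kept_by_default = filter_short_files(files)
kept_with_threshold = filter_short_files(files, min_lines=5)
```

With the two interfaces from this thread, the default keeps both files. A threshold of 5 keeps only Function.java, which has 5 code lines. RowMappingFunction.java has 3.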

shinyano commented 4 months ago

That's an option yeah.

platisd commented 3 months ago

Can you try out the branch in #33?

      - name: Check for duplicate code(datasource)
        if: always()
        uses: platisd/duplicate-code-detection-tool@user_configurable_message_header
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          directories: "dataSources"
          file_extensions: "java, py"
          one_comment: true
          unique_header_message_start: "## 📌 Duplicate code detection tool report (datasource)"

      - name: Check for duplicate code(session, shared)
        if: always()
        uses: platisd/duplicate-code-detection-tool@user_configurable_message_header
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          directories: "session, shared"
          file_extensions: "java, py"
          one_comment: true
          unique_header_message_start: "## 📌 Duplicate code detection tool report (session, shared)"

shinyano commented 2 months ago

It's working exactly as expected! Thank you so much for your work, and sorry for the very late reply <3