ponder-lab / GitHub-Issue-Classifier

Python script to mine for GitHub issues + comments and classify them.
MIT License

Filter out Dependabot issues #48

Closed tatianacv closed 3 years ago

tatianacv commented 3 years ago

The script is picking up Dependabot pull requests (the GitHub API returns open pull requests as issues) because tf.function appears in their release notes. Please filter out these pull requests.
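A minimal sketch of how such items can be told apart, assuming the script works with the JSON objects returned by the GitHub REST API issues endpoints, where a pull request carries a `pull_request` key. The function names and the sample data are hypothetical and are not taken from the script.

```python
def is_pull_request(item):
    """Return True when an Issues API item is actually a pull request."""
    # Assumption: items are dicts parsed from the GitHub REST API, which
    # adds a "pull_request" key to issues that are really pull requests.
    return "pull_request" in item


def drop_pull_requests(items):
    """Keep only the items that are plain issues."""
    return [item for item in items if not is_pull_request(item)]


# Made-up sample data for illustration only.
sample = [
    {"number": 4, "title": "Bump a dependency", "pull_request": {"url": "..."}},
    {"number": 5, "title": "Question about tf.function"},
]

issues_only = drop_pull_requests(sample)
print([item["number"] for item in issues_only])
```

Whether dropping every pull request is acceptable depends on the study; if pull requests from human authors should stay, a filter on the author is the narrower option.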

khatchad commented 3 years ago

Can you provide an example?

tatianacv commented 3 years ago

Yes, of course. Your results include issues that are really Dependabot pull requests: they bump a dependency version but contain no information relevant to our study. For example, https://github.com/RuiLiFeng/flow-gan/pull/4, which is issue 761587226, was classified as solution discussion, social conversation, usage, bug reproduction, potential new issues and requests, and action on issue. Given those classifications we would consider this issue relevant, but in reality it is not. The classifications show up because of the release notes in the issue description.

y3pio commented 3 years ago

Hmm interesting, I feel like this line should have caught it: https://github.com/ponder-lab/GitHub-Issue-Classifier/blob/main/utils/githubAPI.py#L73

Will look into this.

tatianacv commented 3 years ago

I think the user is 'dependabot', not 'Bot'. That might be what is happening.

y3pio commented 3 years ago

I'm checking for the user type, and from what the API is returning it looks like it's properly labeled as Bot: https://api.github.com/repos/RuiLiFeng/flow-gan/issues/4

Not sure why this would show up in the results. Maybe I'll look into filtering out user.login === "dependabot[bot]" as well. Going to look deeper into this and will keep you posted.
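A rough sketch of the combined check described here, assuming the `user` object has the `type` and `login` fields that the GitHub API returns. The helper name `is_bot_author` and the sample users are hypothetical, and this is not the code at the linked line of githubAPI.py.

```python
DEPENDABOT_LOGIN = "dependabot[bot]"


def is_bot_author(user):
    """Return True when the author looks like a bot account."""
    # Tolerate a missing user object (for example, a deleted account).
    user = user or {}
    # Primary check: the account type reported by the API.
    if user.get("type") == "Bot":
        return True
    # Fallback check: Dependabot's login, in case the type is missing.
    return user.get("login") == DEPENDABOT_LOGIN


# Made-up sample users for illustration only.
print(is_bot_author({"login": "dependabot[bot]", "type": "Bot"}))
print(is_bot_author({"login": "some-human", "type": "User"}))
```

Checking both fields is redundant when the API labels the account correctly, but it costs little and guards against items where only one of the two is present.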

tatianacv commented 3 years ago

Perfect, thank you!

y3pio commented 3 years ago

So it turns out that https://github.com/RuiLiFeng/flow-gan/pull/4 was included in the results because we were adding the issue description (the first comment) to the list of comments to be classified, and that was done separately, outside our regular check that filters out bot comments.

The PR https://github.com/ponder-lab/GitHub-Issue-Classifier/pull/50, which removes the description from the comments, should have fixed this issue. I am also checking for dependabot's account in the login field to make sure these types of comments don't get classified in the future.
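To illustrate the root cause, here is a sketch of one way to avoid a separate code path: the issue description and the comments go through the same bot filter before classification. This is an illustration under assumptions, not what PR #50 does (that PR removes the description instead). The names `is_bot` and `texts_to_classify` and the sample data are hypothetical.

```python
BOT_LOGINS = {"dependabot[bot]"}


def is_bot(user):
    """Return True for bot accounts, by API type or by known login."""
    user = user or {}
    return user.get("type") == "Bot" or user.get("login") in BOT_LOGINS


def texts_to_classify(issue, comments):
    """Collect description and comment bodies through one shared filter."""
    # Treat the issue description as just another candidate so it cannot
    # bypass the bot check the way a separately handled description can.
    candidates = [issue] + list(comments)
    return [
        c["body"]
        for c in candidates
        if c.get("body") and not is_bot(c.get("user"))
    ]


# Made-up sample data for illustration only.
bot_issue = {
    "body": "Release notes mentioning tf.function",
    "user": {"login": "dependabot[bot]", "type": "Bot"},
}
human_comments = [
    {"body": "Thanks, merging.", "user": {"login": "some-human", "type": "User"}},
]

print(texts_to_classify(bot_issue, human_comments))
```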