ponder-lab / GitHub-Issue-Classifier

Python script to mine for GitHub issues + comments and classify them.
MIT License

Remove html tags from comments #49

Open tatianacv opened 3 years ago

tatianacv commented 3 years ago

Some comment lines for the issues contain HTML tags (e.g. li). Please remove these.

khatchad commented 3 years ago

Can you provide an example in the dataset?

tatianacv commented 3 years ago

I will use issue 761588798 as an example. In the comment lines, we have p><em sourced URL tensorflow tensorflow releases">tensorflow releases</a>.</em></p, and p><code tf.distribute</code introduces experimental support asynchronous training kera model via URL api_docs python tf distribute experimental parameterserverstrategy?version nightly"><code tf.distribute.experimental parameterserverstrategy</code></a api please see additional details.</p. These HTML tags are unnecessary.

y3pio commented 3 years ago

Hi @tatianacv, could you link the issue URL as well, like you did for the other issue? I can't seem to find 761588798 in the sample classified result files that I attached to the Asana tasks.

tatianacv commented 3 years ago

Of course, for that one the URL is https://github.com/animesh/deepmind-research/pull/5.

y3pio commented 3 years ago

@tatianacv can you confirm if that's the right URL? I can't seem to find the HTML code that you posted above in the issue you linked: https://github.com/animesh/deepmind-research/pull/5.

For example, searching for strings like tensorflow releases or training kera model via URL... doesn't show any matches.

tatianacv commented 3 years ago

@y3pio You have to click on release notes, changelog and commits for the text to show up. The <code tf.distribute</code introduces experimental support asynchronous training is the first bullet point below Release notes.

y3pio commented 3 years ago

Ah I see it now, thanks! Will look into this.
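
One possible way to approach the removal requested in this issue, as a minimal sketch only: it assumes the raw comment body is still available as an HTML string before any other preprocessing (the fragments quoted above look like tags that were already partly tokenized, which makes them hard to remove afterwards). The names strip_html_tags and _TextExtractor are hypothetical and are not taken from this repository; the sketch relies only on Python's standard html.parser module.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for text between tags; tags and their attributes are skipped.
        self._chunks.append(data)

    def text(self):
        # Join chunks with a space so adjacent elements (e.g. list items)
        # do not run together, then collapse repeated whitespace.
        return " ".join(" ".join(self._chunks).split())


def strip_html_tags(raw_comment: str) -> str:
    """Return raw_comment with HTML tags removed (assumed entry point)."""
    parser = _TextExtractor()
    parser.feed(raw_comment)
    parser.close()  # flush any buffered trailing text
    return parser.text()


if __name__ == "__main__":
    sample = "<p><code>tf.distribute</code> introduces experimental support</p>"
    print(strip_html_tags(sample))
```

Joining text chunks with a space is a deliberate trade-off: it keeps words from neighbouring elements apart for later tokenization, at the cost of inserting a space where inline markup splits a single word.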