ponder-lab / GitHub-Issue-Classifier

Python script to mine for GitHub issues + comments and classify them.
MIT License

Undefined/index out of bound error with the comment line when trying to tokenize QUOTE #32

Closed by y3pio 3 years ago

y3pio commented 3 years ago

Below is the trace for the error:

[COMMENTS URL GET]: https://api.github.com/repos/tensorflow/tensorflow/issues/39645/comments
Traceback (most recent call last):
  File "/Users/helloye/dev/CSCI_FinalProject/GitHub-Issue-Mining/main.py", line 174, in <module>
    CORPUS += gitHubCommentAPI(issues_api_urls)
  File "/Users/helloye/dev/CSCI_FinalProject/GitHub-Issue-Mining/utils/githubAPI.py", line 81, in gitHubCommentAPI
    "commentLine": processComment(line),
  File "/Users/helloye/dev/CSCI_FinalProject/GitHub-Issue-Mining/utils/commentProcessor.py", line 58, in processComment
    if(str(parsed_line[0]) == ">"):
  File "spacy/tokens/doc.pyx", line 461, in spacy.tokens.doc.Doc.__getitem__
  File "spacy/tokens/token.pxd", line 23, in spacy.tokens.token.Token.cinit
IndexError: [E040] Attempt to access token at 0, max length 0.

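When fetching the comments for https://api.github.com/repos/tensorflow/tensorflow/issues/39645/comments and tokenizing them, we split each comment with splitlines() and then access the first token of each line to check whether it is a quote line (one that starts with ">"). When a line produces no tokens, most likely because it is empty, parsed_line[0] does not exist and spaCy raises the IndexError shown in the trace.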

Comment HTML URL: https://github.com/tensorflow/tensorflow/issues/39645#issuecomment-693799608
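
One possible fix is to check the token count before indexing. The sketch below is only an illustration, not the project's actual code: is_quote_line and simple_tokenize are hypothetical names, and simple_tokenize is a plain str.split() stand-in for the spaCy tokenizer so the example runs without spaCy installed. The same len() guard should apply to a spaCy Doc, which also supports len().

```python
from typing import List, Sequence


def is_quote_line(parsed_line: Sequence) -> bool:
    """Return True if the first token of a tokenized line is '>'.

    Hypothetical helper: guards against lines that tokenize to zero
    tokens, which is the case that triggers the IndexError above.
    """
    # An empty line yields no tokens, so check the length before indexing.
    if len(parsed_line) == 0:
        return False
    return str(parsed_line[0]) == ">"


def simple_tokenize(line: str) -> List[str]:
    # Assumption: whitespace splitting stands in for the spaCy tokenizer.
    return line.split()


# A comment body with a quote line, an empty line, and a reply line.
comment = "> quoted text\n\nreply text"
flags = [is_quote_line(simple_tokenize(line)) for line in comment.splitlines()]
print(flags)
```

With a guard like this, the empty line in the middle of the comment is treated as a non-quote line instead of raising an error.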