s4zong / extract_COVID19_events_from_Twitter

Annotated corpus and code for "Extracting COVID-19 Events from Twitter".
GNU General Public License v3.0

Does anyone encounter problems with the tagging tool? #13

Closed wangcongcong123 closed 4 years ago

wangcongcong123 commented 4 years ago

I tried [this tagging tool](https://github.com/aritter/twitter_nlp) as specified in this repository. After testing it on Mac, Windows, and Linux, the following issue still arises:

"module html.parser not found"

I searched for solutions to this issue online, but none of them properly fixed it.

Does anyone else encounter the same issue?

A follow-up question: why is the tagging process needed, given that the character-chunk offsets are already in the dataset?

akhilavh commented 4 years ago

Yes, I have faced this issue too. I couldn't find any lead for solving it online.


s4zong commented 4 years ago

Oh really? I have been using it for a very long time. I will take a look at this.

s4zong commented 4 years ago

> A follow-up question: why is the tagging process needed, given that the character-chunk offsets are already in the dataset?

The input to the model should be the tokenized text.
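
In other words, the character offsets in the dataset still have to be aligned with token boundaries before they can be used as token-level labels. A rough sketch of that alignment step (illustrative only, not the repo's actual code; the function and variable names are made up):

```python
# Map a character-level annotation span onto the indices of the tokens it covers.
def char_span_to_token_span(text, tokens, char_start, char_end):
    """Return the indices of the tokens that overlap [char_start, char_end)."""
    token_spans = []
    cursor = 0
    for tok in tokens:
        start = text.index(tok, cursor)   # locate each token in the original text
        end = start + len(tok)
        token_spans.append((start, end))
        cursor = end
    return [i for i, (s, e) in enumerate(token_spans)
            if s < char_end and e > char_start]

text = "I tested positive for covid today"
tokens = text.split()  # stand-in for the Twitter-specific tokenizer
print(char_span_to_token_span(text, tokens, 2, 17))  # -> [1, 2] ("tested positive")
```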

s4zong commented 4 years ago

@wangcongcong123 Can you provide more information about this issue (e.g., a screenshot, your Python version, your Linux environment)? Everything looks fine on my side.

wangcongcong123 commented 4 years ago

Below is the full error message. I am using Python 2.7.

```
Traceback (most recent call last):
  File "python/ner/extractEntities2_json.py", line 26, in <module>
    import twokenize
  File "/home/congcong/venv27/lib/python2.7/site-packages/twokenize/__init__.py", line 1, in <module>
    from .twokenize import *
  File "/home/congcong/venv27/lib/python2.7/site-packages/twokenize/twokenize.py", line 27, in <module>
    import html.parser as HTMLParser
ImportError: No module named html.parser
```

My Linux environment is: Linux xxx 5.4.0-42-generic #46~18.04.1-Ubuntu SMP Fri Jul 10 07:21:24 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

s4zong commented 4 years ago

So it seems to be a version issue with "twokenize". There is actually a twokenize.py in the tagging tool's repo; maybe you could try using that instead of the pip-installed version. The tagging tool itself does not seem to use "html.parser".
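
One quick way to check which twokenize Python is actually picking up (a small diagnostic sketch, not part of the repo):

```python
# Print the file path of the twokenize module that gets imported.
# If this points into site-packages rather than the tagging tool's checkout,
# the pip-installed package is shadowing the repo's twokenize.py.
import twokenize
print(twokenize.__file__)
```

If the path points into site-packages, putting the tagging repo's directory ahead of it on PYTHONPATH, or uninstalling the pip package, should make Python use the repo's twokenize.py instead.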

Supermaxman commented 4 years ago

I solved this by making sure I ran bash build.sh in the tagging repo and did not pip-install twokenize separately, since that is a different package built for Python 3. Make sure twokenize is using the tagging repo's tokenizing code and not some other pip package.
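
That also explains the traceback above: html.parser only exists in Python 3 (the Python 2 standard-library equivalent is the HTMLParser module), so a package written for Python 3 fails to import under Python 2.7. A common compatibility pattern, shown here only to illustrate the mismatch and not as a patch to the actual package:

```python
# Illustrative shim: prefer the Python 3 module, fall back to the Python 2 one.
try:
    import html.parser as HTMLParser  # Python 3
except ImportError:
    import HTMLParser                 # Python 2
```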

akhilavh commented 4 years ago

Can you please explain? I have run the bash script in the tagging repo. How do I add the NER to our working code?
