wardbradt / HTMLST

A library to extract sentences from HTML
MIT License

Error message while calling HTMLSentenceTokenizer #1

Closed: cli0 closed this issue 4 years ago

cli0 commented 5 years ago

While trying to run your example code, I get this error:

Traceback (most recent call last):
  File "extract.py", line 4, in <module>
    parsed_sentences = HTMLSentenceTokenizer().feed(example_html_one)
TypeError: 'module' object is not callable

It ultimately comes down to this message, which PyCharm shows directly in the IDE:

'HTMLSentenceTokenizer' is not callable
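This TypeError is most likely the usual Python module-versus-class mix-up. The workarounds in this thread call HTMLSentenceTokenizer.HTMLSentenceTokenizer(), which implies a module named HTMLSentenceTokenizer that contains a class of the same name, so "import HTMLSentenceTokenizer" binds the module rather than the class. The self-contained sketch below reproduces the behavior with a stand-in module built via types.ModuleType. The placeholder class and its echoing feed() are hypothetical and are not HTMLST's real code.

```python
import sys
import types


class _StandInTokenizer:
    """Hypothetical placeholder for the real class; feed() just echoes its input."""

    def feed(self, html):
        return [html]


# Register a module named HTMLSentenceTokenizer holding a class of the same name,
# mirroring the layout implied by HTMLSentenceTokenizer.HTMLSentenceTokenizer().
fake_module = types.ModuleType("HTMLSentenceTokenizer")
fake_module.HTMLSentenceTokenizer = _StandInTokenizer
sys.modules["HTMLSentenceTokenizer"] = fake_module

# "import X" binds the module object, so calling it raises the reported TypeError.
import HTMLSentenceTokenizer

error_message = None
try:
    HTMLSentenceTokenizer()
except TypeError as exc:
    error_message = str(exc)

# Reaching through the module to the class works, as in the workarounds.
via_module = HTMLSentenceTokenizer.HTMLSentenceTokenizer().feed("<p>Hi.</p>")

# "from X import X" rebinds the name to the class itself, so it is callable directly.
from HTMLSentenceTokenizer import HTMLSentenceTokenizer

direct = HTMLSentenceTokenizer().feed("<p>Hi.</p>")
print(error_message)
```

So either call the class through the module, as the workarounds do, or import the class itself. With the package, from htmlst.HTMLSentenceTokenizer import HTMLSentenceTokenizer should presumably do the same, assuming HTMLSentenceTokenizer is a submodule of htmlst as gevezex's example suggests.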

conorosully commented 5 years ago

I have the same error. Do y'all know what is going on?

conorosully commented 5 years ago

The following workaround worked for me:

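import HTMLSentenceTokenizer  # the module file HTMLSentenceTokenizer.py, on the import path
tokenizer = HTMLSentenceTokenizer.HTMLSentenceTokenizer()  # the class defined inside that module
with open('example_html_one.html', 'r') as f:
    example_html_one = f.read()
parsed_sentences = tokenizer.feed(example_html_one)
print(parsed_sentences)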

gevezex commented 4 years ago

Without @conorosully's download step, importing from the htmlst package instead:

from htmlst import HTMLSentenceTokenizer  # binds the submodule, not the class
tokenizer = HTMLSentenceTokenizer.HTMLSentenceTokenizer()  # instantiate the class inside it
with open('example_html_one.html', 'r') as f:
    example_html_one = f.read()
parsed_sentences = tokenizer.feed(example_html_one)
print(parsed_sentences)

acrabb commented 4 years ago

Are any other users seeing this tool miss large chunks of text? It looks great, though :)
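One rough way to check whether text is really being dropped is to dump the page's visible text with the standard library's html.parser and list the chunks that never appear in the tokenizer's output. This is a diagnostic sketch, not part of HTMLST: VisibleTextCollector, visible_text and missing_chunks are hypothetical helpers, and it assumes feed() returns a list of sentence strings.

```python
from html.parser import HTMLParser


class VisibleTextCollector(HTMLParser):
    """Collects non-empty text nodes, skipping the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html):
    """Return the stripped text nodes of an HTML string, in document order."""
    collector = VisibleTextCollector()
    collector.feed(html)
    collector.close()
    return collector.chunks


def missing_chunks(html, sentences):
    """Return text chunks that do not occur as a substring of the joined sentences."""
    joined = " ".join(sentences)
    return [chunk for chunk in visible_text(html) if chunk not in joined]


if __name__ == "__main__":
    page = "<p>First sentence.</p><script>var x = 1;</script><div>Second part</div>"
    print(missing_chunks(page, ["First sentence."]))  # ['Second part']
```

In practice you would pass the HTML file's contents and parsed_sentences. The substring check is crude: whitespace normalization or a chunk split across two sentences will show up as false positives, so treat the result as a list of places to inspect rather than proof of a bug.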