minqi / hnatt

Train and visualize Hierarchical Attention Networks
MIT License

TypeError: Argument 'string' has incorrect type (expected unicode, got str) #3

Open kcsmta opened 6 years ago

kcsmta commented 6 years ago

I encountered an error. Does anyone have any suggestions? Thanks a lot!

Using TensorFlow backend.
loading Yelp reviews...
0%| | 0/10000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "main.py", line 10, in <module>
    (train_x, train_y), (test_x, test_y) = yelp.load_data(path=YELP_DATA_PATH, size=1e4, binary=False)
  File "/home/khanhng/Downloads/hnatt-master/util/yelp.py", line 48, in load_data
    df['text_tokens'] = df['text'].progress_apply(lambda x: normalize(x))
  File "/home/khanhng/Downloads/hnatt-master/.venv/local/lib/python2.7/site-packages/tqdm/_tqdm.py", line 612, in inner
    result = getattr(df, df_function)(wrapper, **kwargs)
  File "/home/khanhng/Downloads/hnatt-master/.venv/local/lib/python2.7/site-packages/pandas/core/series.py", line 3194, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/src/inference.pyx", line 1472, in pandas._libs.lib.map_infer
  File "/home/khanhng/Downloads/hnatt-master/.venv/local/lib/python2.7/site-packages/tqdm/_tqdm.py", line 608, in wrapper
    return func(*args, **kwargs)
  File "/home/khanhng/Downloads/hnatt-master/util/yelp.py", line 48, in <lambda>
    df['text_tokens'] = df['text'].progress_apply(lambda x: normalize(x))
  File "/home/khanhng/Downloads/hnatt-master/util/text_util.py", line 11, in normalize
    doc = nlp(text)
  File "/home/khanhng/Downloads/hnatt-master/.venv/local/lib/python2.7/site-packages/spacy/language.py", line 346, in __call__
    doc = self.make_doc(text)
  File "/home/khanhng/Downloads/hnatt-master/.venv/local/lib/python2.7/site-packages/spacy/language.py", line 378, in make_doc
    return self.tokenizer(text)
TypeError: Argument 'string' has incorrect type (expected unicode, got str)
Exception KeyError: KeyError(<weakref at 0x7f1109825f70; to 'tqdm' at 0x7f111a0d7490>,) in <bound method tqdm.__del__ of 0%| | 1/10000 [00:00<11:54, 13.99it/s]> ignored

zhuzhikui commented 5 years ago

Hello, have you solved the problem?

sachinh35 commented 5 years ago

Replace the following line in util/text_util.py.

Original line: doc = nlp(text)

Replacement line: doc = nlp(text.decode('utf8'))
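Some background on why the decode helps: the traceback shows Python 2.7, where a plain str is a byte string, while the spaCy tokenizer only accepts unicode. One caveat is that calling .decode('utf8') on a value that is already unicode can itself fail on non-ASCII text under Python 2, and str has no .decode method at all under Python 3. A slightly more defensive variant might look like the sketch below. Note that to_text and fake_nlp are hypothetical names used only for illustration; neither is part of hnatt or spaCy, and fake_nlp merely stands in for the real nlp object so the snippet runs without spaCy installed.

```python
def to_text(value, encoding="utf-8"):
    """Return `value` as a text (unicode) string.

    Hypothetical helper, not part of hnatt. Byte strings are decoded;
    values that are already text are returned unchanged.
    """
    # On Python 2, `bytes` is an alias for `str`, so this branch catches
    # exactly the values the spaCy tokenizer rejected in the traceback.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def fake_nlp(text):
    """Stand-in for the spaCy pipeline (assumption): rejects byte strings."""
    if isinstance(text, bytes):
        raise TypeError("expected text, got bytes")
    return text.split()


# Decode first, then hand the text to the pipeline.
tokens = fake_nlp(to_text(b"great food"))
```

In util/text_util.py this would correspond to changing the call to doc = nlp(to_text(text)), assuming the Yelp review text is UTF-8 encoded.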