roshan-research / hazm

Persian NLP Toolkit
https://www.roshan-ai.ir/hazm/
MIT License

Part of speech tagger does not work #319

Closed: AlexandderGorodetski closed this issue 4 months ago

AlexandderGorodetski commented 7 months ago

Hello. I have a large text, and I want to run the part-of-speech tagger on it so that every word is shown with its part-of-speech tag beside it.

I wrote the following program:

from hazm import Normalizer, word_tokenize, POSTagger, Stemmer, Lemmatizer

def extract_words_with_pos(farsi_text):
    # Normalize the Farsi text
    normalizer = Normalizer()
    normalized_text = normalizer.normalize(farsi_text)

    # Tokenize the normalized text
    tokens = word_tokenize(normalized_text)

    # Initialize a POS tagger, stemmer, and lemmatizer
    tagger = POSTagger(model='resources-0.5/postagger.model')
    tagger.load_model()

    stemmer = Stemmer()
    lemmatizer = Lemmatizer()

    # Extract words with their POS using stemming and lemmatization
    words_with_pos = [(lemmatizer.lemmatize(stemmer.stem(token), pos), pos) for token, pos in tagger.tag(tokens) if token]

    return words_with_pos

if __name__ == "__main__":
    # Example Farsi text
    farsi_text = "درختان زیبا در گلخانه ما هستند."

    # Extract words with their POS from the Farsi text
    words_with_pos_list = extract_words_with_pos(farsi_text)

    # Print each word and its POS on a new line
    for word, pos in words_with_pos_list:
        print(f"{word}: {pos}")

But I got the following error:

Traceback (most recent call last):
  File "test.py", line 28, in <module>
    words_with_pos_list = extract_words_with_pos(farsi_text)
  File "test.py", line 12, in extract_words_with_pos
    tagger = POSTagger(model='resources-0.5/postagger.model')
  File "/opt/conda/lib/python3.8/site-packages/hazm/pos_tagger.py", line 40, in __init__
    super().__init__(model, data_maker)
  File "/opt/conda/lib/python3.8/site-packages/hazm/sequence_tagger.py", line 67, in __init__
    self.load_model(model)
  File "/opt/conda/lib/python3.8/site-packages/hazm/sequence_tagger.py", line 113, in load_model
    tagger.open(model)
  File "pycrfsuite/_pycrfsuite.pyx", line 571, in pycrfsuite._pycrfsuite.Tagger.open
  File "pycrfsuite/_pycrfsuite.pyx", line 733, in pycrfsuite._pycrfsuite.Tagger._check_model
FileNotFoundError: [Errno 2] No such file or directory: 'resources-0.5/postagger.model'

Your help is more than appreciated.

sir-kokabi commented 6 months ago

Hello. You should download the latest POSTagger model and extract it into the root directory of your script, so that the path you pass to POSTagger points to a file that actually exists.
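As a rough illustration of that advice, the sketch below checks that the model file exists before the tagger is built, and reports the full path it looked at when the file is missing. The helper names `resolve_model_path` and `load_tagger` are hypothetical and not part of hazm. The default path is the one from the question and may differ depending on where you extract the downloaded model.

```python
from pathlib import Path


def resolve_model_path(relative_path, base_dir=None):
    """Return an absolute path to the model file, or raise a descriptive error.

    base_dir defaults to the current working directory, which is where a
    relative path such as 'resources-0.5/postagger.model' is looked up.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    candidate = (base / relative_path).resolve()
    if not candidate.is_file():
        raise FileNotFoundError(
            f"POS tagger model not found at {candidate}; "
            "download the model and extract it there, or pass the correct path."
        )
    return candidate


def load_tagger(relative_path="resources-0.5/postagger.model", base_dir=None):
    """Build a hazm POSTagger only after confirming the model file exists."""
    from hazm import POSTagger  # imported lazily so the path check runs without hazm

    model_path = resolve_model_path(relative_path, base_dir)
    # The traceback in the question shows that the constructor loads the model
    # itself, so no separate load_model() call is made here.
    return POSTagger(model=str(model_path))
```

Calling `load_tagger(base_dir=Path(__file__).parent)` from the script would look for the model next to the script rather than in whatever directory the script happens to be launched from.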