taishi-i / nagisa

A Japanese tokenizer based on recurrent neural networks
https://huggingface.co/spaces/taishi-i/nagisa-demo
MIT License
391 stars, 22 forks

Returning a generator instead of a list in nagisa.postagging #11

Closed: BLKSerene closed this issue 5 years ago

BLKSerene commented 5 years ago

Hi, I'm trying to figure out how to POS-tag a list of tokens that have already been tokenized, and I found #8, which works fine.

I think returning a generator instead of a list would be better for users, since the current behavior creates a long list of POS tags in memory for a large input text. In most cases, the returned POS tags are iterated over (usually only once) to be zipped with the tokens.

Alternatively, you could provide two functions, such as postagging and lpostagging, the former returning a generator and the latter returning a regular list.
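
The two-function idea could look roughly like the sketch below. It is not nagisa code: `toy_tag` is a hypothetical stand-in for a real tagger, and tagging one token at a time is an assumption made only to keep the example small (a neural tagger would normally score a whole sentence at once). Only the names `postagging` and `lpostagging` come from the proposal above.

```python
from typing import Callable, Iterable, Iterator, List


def toy_tag(token: str) -> str:
    """Hypothetical stand-in for a real tagger: digits vs. everything else."""
    return "NUM" if token.isdigit() else "WORD"


def postagging(
    tokens: Iterable[str], tag_one: Callable[[str], str] = toy_tag
) -> Iterator[str]:
    """Yield one POS tag per token; no tag is computed until it is requested."""
    for token in tokens:
        yield tag_one(token)


def lpostagging(
    tokens: Iterable[str], tag_one: Callable[[str], str] = toy_tag
) -> List[str]:
    """Same tags as postagging, but materialized as a list."""
    return list(postagging(tokens, tag_one))
```

With this shape, the common case of pairing tokens with tags becomes `for token, tag in zip(tokens, postagging(tokens)): ...`, which never holds the full tag list in memory. Note that the generator can be consumed only once, which is the reason for offering the list-returning variant alongside it.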

taishi-i commented 5 years ago

Hi BLKSerene,

Thank you for your advice. I'm working on a generator that returns POS tags for a list of tokens that have already been tokenized. Please give me about a week to complete it.

taishi-i commented 5 years ago

I'm sorry it took me so long to fix this issue. I solved the problem by using @property in tagger.py, in a way similar to a generator: https://github.com/taishi-i/nagisa/blob/07de25fd2ca691fb9b98362808f10a01b37e53ef/nagisa/tagger.py#L245-L249

With this method, the specification has changed so that the list of POS tags is not put into memory immediately. You can use this through the nagisa.tagging() function in v0.2.3.
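
For readers unfamiliar with the pattern, here is a minimal sketch of deferring work behind `@property`. It is an illustration only, not the code at the link above: the class name `LazyTagging`, the `toy_tagger` function, and the choice to cache the result after the first access are all assumptions.

```python
from typing import Callable, List, Optional, Sequence


def toy_tagger(words: List[str]) -> List[str]:
    """Hypothetical stand-in for a real tagger: digits vs. everything else."""
    return ["NUM" if w.isdigit() else "WORD" for w in words]


class LazyTagging:
    """Hypothetical result object; the tags are not computed at construction."""

    def __init__(
        self, words: Sequence[str], tag_fn: Callable[[List[str]], List[str]]
    ) -> None:
        self.words = list(words)
        self._tag_fn = tag_fn
        self._postags: Optional[List[str]] = None  # nothing computed yet

    @property
    def postags(self) -> List[str]:
        # Run the tagger on first access only, then reuse the stored list.
        if self._postags is None:
            self._postags = self._tag_fn(self.words)
        return self._postags
```

A caller who only needs `words` never pays for tagging, while a caller who reads `postags` triggers the tagger at that point. Unlike a true generator, this still builds the whole tag list once it is accessed.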

Thank you.

BLKSerene commented 5 years ago

Thanks!