ratsgo / embedding

Korean Embeddings (Sentence Embeddings Using Korean Corpora)
https://ratsgo.github.io/embedding
MIT License

Explanation based on the 2nd printing (question about p. 81) #135

Closed: keemyo closed this issue 2 years ago

keemyo commented 2 years ago

from gensim.corpora import WikiCorpus
from gensim.utils import to_unicode

in_f = "/notebooks/embedding/data/raw/kowiki-latest-pages-articles.xml.bz2"
out_f = "/notebooks/embedding/data/processed/processed_wiki_ko.txt"
output = open(out_f, 'w')

After doing this, running

wiki = WikiCorpus(in_f, tokenizer_func=tokenize)

raises an error saying that tokenize is not defined.

Q1.

Q2.

ㅠㅠㅠㅠㅠ
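The error occurs because the name tokenize must be defined (or imported) before it is passed to WikiCorpus, since Python evaluates the argument before making the call. Below is a minimal sketch that assumes gensim's documented tokenizer_func interface, tokenizer_func(text, token_min_len, token_max_len, lower) -> list of str. The function names tokenize and dump_wiki and the regex cleanup rules are hypothetical illustrations. They are not the book's own preprocessing code.

```python
import re

# Hypothetical cleanup rules for illustration only (not the book's exact rules).
URL_PATTERN = re.compile(r"(?:ftp|https?)://\S+")
EMAIL_PATTERN = re.compile(r"\S+@\S+\.\S+")
WIKI_MARKUP = re.compile(r"'{2,}|={2,}|__TOC__")  # bold/italic quotes, headings, TOC marker
MULTIPLE_SPACES = re.compile(r"\s+")


def tokenize(content, token_min_len=2, token_max_len=100, lower=True):
    """Hypothetical whitespace tokenizer following gensim's tokenizer_func
    interface: (text, token_min_len, token_max_len, lower) -> list of str.

    WikiCorpus passes its own token_min_len/token_max_len (2 and 15 by
    default), so the defaults here only matter for direct calls.
    """
    content = URL_PATTERN.sub(" ", content)      # drop URLs
    content = EMAIL_PATTERN.sub(" ", content)    # drop e-mail addresses
    content = WIKI_MARKUP.sub(" ", content)      # drop leftover wiki markup
    content = MULTIPLE_SPACES.sub(" ", content).strip()
    if lower:
        content = content.lower()
    # Keep only tokens whose length falls within the requested bounds.
    return [
        token for token in content.split(" ")
        if token_min_len <= len(token) <= token_max_len
    ]


def dump_wiki(in_f, out_f):
    """Hypothetical helper: write one space-joined article per line."""
    # Imported here so tokenize() can be tried without gensim installed.
    from gensim.corpora import WikiCorpus

    # A non-None dictionary skips gensim's extra vocabulary-building pass
    # over the whole dump; only the article text is needed here.
    wiki = WikiCorpus(in_f, tokenizer_func=tokenize, dictionary={})
    with open(out_f, "w", encoding="utf-8") as output:
        for tokens in wiki.get_texts():
            output.write(" ".join(tokens) + "\n")
```

With the two paths from the snippet above, dump_wiki(in_f, out_f) would be called from inside an if __name__ == "__main__": block. WikiCorpus tokenizes articles in worker processes, so tokenize should stay a module-level function that can be pickled.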

ratsgo commented 2 years ago

@keemyo, I think the discussion in the following issue will help!

https://github.com/ratsgo/embedding/issues/62