Tesfamariam closed 8 years ago
The issue tracker is for bugs/feature-requests, not support questions – those are better handled at the project discussion list: https://groups.google.com/forum/#!forum/gensim
And, you'd have to provide a lot more context/code/logging-info for us to have any idea what line of your code is triggering that error. So if you ask on the list, please better describe what you're trying to accomplish, and how.
Sounds like a bug report for the DTM wrapper in gensim... but a very incomplete one.
@Tesfamariam, please review the contributing guide. Add relevant information so we know what you're talking about.
Sorry for the incomplete information!

Sample dataset:

    ['lecture', 'notes', 'edited', 'goos', 'hartmanis', 'van', 'leeuwen', 'berlin', 'heidelberg', 'york', 'barcelona', 'hong', 'kong', 'london', 'milan', 'paris', 'singapore', 'tokyo', 'vassil', 'alexandrov', 'jack', 'dongarra', 'benjoe', 'juliano', 'renner', 'kenneth', 'tan', 'eds', 'san', 'francisco', 'usa', 'proceedings', 'volume', 'editors', 'vassil', 'alexandrov', 'university', 'reading', 'school', 'cybernetics', 'electronic', 'engineering', 'whiteknights', 'box', 'reading', 'mail', 'alexandrov', 'rdg', 'jack', 'dongarra']

Then I feed the whole dataset to:

    class DTMcorpus(corpora.textcorpus.TextCorpus):
        def get_texts(self):
            return self.input

        def __len__(self):
            return len(self.input)

    corpus = DTMcorpus(texts)

Then I determined the time slices:

    my_timeslices = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
    model = gensim.models.wrappers.DtmModel('/media/tesfish/data/Topic Modeling/dtm-master/bin/dtm-linux64', corpus, my_timeslices, num_topics=15, id2word=dictionary_text, initialize_lda=True)

Finally, I got the following error:

    TypeError Traceback (most recent call last)
Ping @bhargavvader
@bhargavvader Do you have any thoughts on this?
@tmylk will have a look.
Just a +1 -- also having this error.
A workaround is to change
with utils.smart_open(self.ftimeslices(), 'wb') as fout:
to
with utils.smart_open(self.ftimeslices(), 'w') as fout:
@boomsbloom that is not a good idea, as 'w' mode behaves differently on Windows.
The proper solution is to open in binary mode and store binary strings.
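The platform difference mentioned above comes from newline translation in text mode. Here is a minimal sketch using only the standard library. The `newline='\r\n'` argument is an assumption used to emulate, on any platform, what text mode does by default on Windows, and the helper and file names are hypothetical rather than part of gensim:

```python
import os
import tempfile


def write_text_windows_style(path, lines):
    # Text mode: every '\n' written is translated to the given newline string.
    # newline='\r\n' emulates the default text-mode behavior on Windows.
    with open(path, 'w', newline='\r\n') as fout:
        for line in lines:
            fout.write(line + '\n')


def write_binary(path, lines):
    # Binary mode: the bytes handed to write() are stored unchanged everywhere.
    with open(path, 'wb') as fout:
        for line in lines:
            fout.write((line + '\n').encode('utf8'))


def read_bytes(path):
    with open(path, 'rb') as fin:
        return fin.read()


tmpdir = tempfile.mkdtemp()
text_path = os.path.join(tmpdir, 'text_mode.dat')
bin_path = os.path.join(tmpdir, 'binary_mode.dat')

write_text_windows_style(text_path, ['2', '1', '1'])
write_binary(bin_path, ['2', '1', '1'])

text_bytes = read_bytes(text_path)
bin_bytes = read_bytes(bin_path)
```

The two files end up with different bytes on disk, which matters when an external binary parses the file.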
@piskvorky, could you elaborate a bit on your proposed solution? I tried poking around but am not too sure how to fix this.
I meant simply opening files in binary mode ('rb' or 'wb') and then storing binary strings into it. So, if the input is unicode, convert to e.g. utf8 (see gensim.utils.to_utf8()).
I am not familiar with this particular issue though, maybe it's something different. What is the actual problem, why are we storing unicode strings into binary files in this wrapper?
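The binary-mode approach suggested above can be sketched roughly as follows. This is an illustration only, not the wrapper's actual code: `write_timeslices` is a hypothetical helper, the file layout (a count followed by one slice size per line) is an assumption, and plain `str.encode('utf8')` stands in for `gensim.utils.to_utf8()` so the example runs without gensim installed. The first part reproduces the `TypeError` that Python 3 raises when a `str` is written to a file opened in 'wb' mode.

```python
import os
import tempfile


def write_timeslices(path, time_slices):
    # Hypothetical helper: open in binary mode and write UTF-8 encoded bytes,
    # so the output is identical on every platform and Python version.
    with open(path, 'wb') as fout:
        fout.write((str(len(time_slices)) + '\n').encode('utf8'))
        for size in time_slices:
            fout.write((str(size) + '\n').encode('utf8'))


path = os.path.join(tempfile.mkdtemp(), 'timeslices.dat')

# Reproduce the error: writing a str to a binary-mode file fails on Python 3.
try:
    with open(path, 'wb') as fout:
        fout.write('3\n')
    raised = False
except TypeError:
    raised = True

# The encoded version succeeds.
write_timeslices(path, [1, 1, 1])
with open(path, 'rb') as fin:
    content = fin.read()
```

Encoding explicitly keeps the file in binary mode, which avoids the newline translation that text mode would introduce on Windows.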
Ping @bhargavvader
@Tesfamariam, do have a look at the PR; it will fix the problem. I think this issue can be closed now.
Fixed in #768
Nice blog post that addresses the issue: https://webkul.com/blog/string-and-bytes-conversion-in-python3-x/
I am trying to implement dynamic topic modeling with the Python Anaconda 3.4 distribution on Linux. However, I am getting the following error: TypeError: a bytes-like object is required, not 'str'. Any idea how I could solve this problem?