SundareshPrasanna opened 6 years ago
You may need to add a t.decode('utf-8') like this:
sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
for t in text:
    sents = sent_detector.tokenize(t.decode('utf-8'))
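A bare `t.decode('utf-8')` only works when `t` is `bytes`; under Python 3 a `str` has no `.decode` method and raises `AttributeError`. If the inputs to the loop can be a mix of the two, a small guard handles both. This is a minimal sketch; `ensure_str` is a hypothetical helper, not part of skipthoughts.py, and the sample strings are made up. In the loop above it would be used as `sent_detector.tokenize(ensure_str(t))`.

```python
def ensure_str(t, encoding="utf-8"):
    """Return t as str, decoding it only when it is bytes.

    Hypothetical helper: avoids AttributeError when t is already a str
    (Python 3 str objects have no .decode method).
    """
    if isinstance(t, bytes):
        return t.decode(encoding)
    return t


# toy inputs: one bytes caption (as loaded under Python 3) and one str
texts = [b"two people on a field playing lacrosse", "a tennis player serves"]
cleaned = [ensure_str(t) for t in texts]
```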
Thanks @ErikOmberg, that solved that issue. But now I am facing a new error in search.py:
Traceback (most recent call last):
File "
When I print the ti values, I get the following:
[ 7.65000000e-03 2.50000000e-04 1.45000000e-03 1.70000000e-03 6.45000000e-03 5.85000000e-03 5.16500000e-02 7.80000000e-03 2.60000000e-03 2.65000000e-03 2.70500000e-02 2.15000000e-03 6.00000000e-03 1.94000000e-02 6.10000000e-03 6.49500000e-02 6.42500000e-02 1.79650000e-01 1.56650000e-01 1.04500000e-02 4.28400000e-01 5.38000000e-02 9.15000000e-03 1.93400000e-01 3.92000000e-02 1.71300000e-01 7.10000000e-03 1.25900000e-01 8.68500000e-02 3.80000000e-03 1.88450000e-01 9.03500000e-02 1.37500000e-02 7.23500000e-02 1.84850000e-01 1.14850000e-01 3.13450000e-01 9.05000000e-03 1.20050000e-01 2.30500000e-01 3.24000000e-02 8.71800000e-01 4.88000000e-02 3.75000000e-02 1.22250000e-01 1.01950000e-01 4.62350000e-01 2.75200000e-01 1.11000000e-02 4.95000000e-03]
So the ti values are indeed float64 and cannot be used to index the hyp_samples list. Any idea how I can correct this? Am I missing something?
I think I had to add this f2 thing that casts the indices to int. I'm not entirely sure it's "correct", so don't trust it completely.
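# Under Python 3, ranks_flat / voc_size is true division and yields float64,
# so cast back to int before using the result to index hyp_samples.
# (numpy.int was only an alias for the builtin int and was removed in NumPy 1.24.)
f2 = numpy.vectorize(int)
trans_indices = f2(ranks_flat / voc_size)
word_indices = ranks_flat % voc_size
costs = cand_flat[ranks_flat]

new_hyp_samples = []
new_hyp_scores = numpy.zeros(k-dead_k).astype('float32')
new_hyp_states = []
for idx, [ti, wi] in enumerate(zip(trans_indices, word_indices)):
    new_hyp_samples.append(hyp_samples[ti]+[wi])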
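The float64 indices come from Python 3 turning `/` into true division, whereas Python 2 did integer division on ints. An alternative to casting afterwards is to keep the arithmetic in integers with floor division, for example via `numpy.divmod`, which returns `ranks_flat // voc_size` and `ranks_flat % voc_size` in one call. A minimal sketch, assuming `ranks_flat` holds non-negative integer flat indices into a `(k, voc_size)` score matrix; `split_flat_ranks` is a hypothetical helper and the toy values are made up:

```python
import numpy as np


def split_flat_ranks(ranks_flat, voc_size):
    """Map flat indices into a (k, voc_size) matrix back to
    (hypothesis index, word index) pairs, keeping an integer dtype."""
    ranks_flat = np.asarray(ranks_flat)
    # divmod on integer arrays does floor division and modulo together,
    # so trans_indices stays integer and needs no cast
    trans_indices, word_indices = np.divmod(ranks_flat, voc_size)
    return trans_indices, word_indices


# toy example: 3 live hypotheses over a 5-word vocabulary
voc_size = 5
ranks_flat = np.array([7, 0, 14, 3])
hyp_samples = [[11], [12], [13]]

true_div = ranks_flat / voc_size  # float64 under Python 3, not usable as an index
trans_indices, word_indices = split_flat_ranks(ranks_flat, voc_size)
new_hyp_samples = [hyp_samples[ti] + [wi]
                   for ti, wi in zip(trans_indices, word_indices)]
```

With `//` (or `divmod`) the indices never leave integer space, so there is no float round trip to worry about.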
Thanks @ErikOmberg, that solved the error. But the code now produces almost the same output no matter which image I feed in.
For example, if I feed in image ex3 from the images folder (which I think shows French fries and ketchup), I get:
generate.story(z, './images/ex3.jpg')
NEAREST-CAPTIONS:
b'Person snowboarding down a hill .'
b'some zebras dirt plants and bushes'
b'People are crossing the street in NYC'
b'Black bear is hiding behind tall leaves'
b'two people on a field playing lacrosse'
OUTPUT:
I had a feeling it was never going to be a waste of time , but I could n't keep quiet until she looked at me as I closed my eyes and watched her lean back against him as he pulled out his cell phone . The only thing that mattered to me was that the two of them were just a few minutes away . We were only a few short minutes after the charity event , and I had no idea what to do . By the time I opened the door , I thought I was going to walk through the streets of New York City . She was dressed in a nice , white dress with a gun pointed at the collar of her shirt , and I could see no trace of it . In fact , my mom 's screams and screams filled her ears . My God , he said .
I get the same nearest captions and the same output when I feed it a tennis image:
generate.story(z, './images/tennis.jpg')
NEAREST-CAPTIONS:
b'Person snowboarding down a hill .'
b'some zebras dirt plants and bushes'
b'People are crossing the street in NYC'
b'Black bear is hiding behind tall leaves'
b'two people on a field playing lacrosse'
OUTPUT:
I had a feeling it was never going to be a waste of time , but I could n't keep quiet until she looked at me as I closed my eyes and watched her lean back against him as he pulled out his cell phone . The only thing that mattered to me was that the two of them were just a few minutes away . We were only a few short minutes after the charity event , and I had no idea what to do . By the time I opened the door , I thought I was going to walk through the streets of New York City . She was dressed in a nice , white dress with a gun pointed at the collar of her shirt , and I could see no trace of it . In fact , my mom 's screams and screams filled her ears . My God , he said .
Yes, I see the same behavior. Please let me know if you find a way to get good-quality captions.
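One way to narrow this down, offered as an untested assumption rather than a known cause: if the nearest captions are ranked by a dot-product score between an image feature vector and the caption vectors, then identical top-5 lists for unrelated images suggest the image feature vectors themselves barely differ. The sketch below checks that with cosine similarity on toy vectors; `cosine_similarity`, `top_k_captions` and the `feats_*` names are hypothetical, and how generate.py actually computes and ranks features is not shown in this thread.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (flattened first)."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def top_k_captions(image_vec, caption_vecs, k=5):
    """Indices of the k caption vectors with the highest dot-product score."""
    scores = np.asarray(caption_vecs) @ np.asarray(image_vec)
    # stable sort so tied scores keep their original order
    return np.argsort(-scores, kind="stable")[:k]


# toy data: 4 caption vectors in a 3-d embedding space
caption_vecs = np.array([[1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0],
                         [0.0, 0.0, 1.0],
                         [1.0, 1.0, 0.0]])
feats_fries = np.array([1.0, 0.2, 0.0])
feats_tennis = np.array([2.0, 0.4, 0.0])  # same direction, different scale

similarity = cosine_similarity(feats_fries, feats_tennis)
ranks_fries = top_k_captions(feats_fries, caption_vecs, k=2)
ranks_tennis = top_k_captions(feats_tennis, caption_vecs, k=2)
```

If the real features for ex3.jpg and tennis.jpg come out with a similarity near 1.0, the problem is more likely upstream (image loading or feature extraction) than in the story decoder.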
On running generate.story(z, './images/ex2.jpg'), which is an image of a flower, I get seemingly random nearest-neighbor captions followed by this error:
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Users\Dell\Documents\Neural_storyteller\neural-storyteller-master\generate.py", line 59, in story
    print('')
  File "C:\Users\Dell\Documents\Neural_storyteller\neural-storyteller-master\skipthoughts.py", line 84, in encode
    X = preprocess(X)
  File "C:\Users\Dell\Documents\Neural_storyteller\neural-storyteller-master\skipthoughts.py", line 149, in preprocess
    sents = sent_detector.tokenize(t)
  File "C:\Users\Dell\Anaconda3\envs\tensorflow-gpu\lib\site-packages\nltk\tokenize\punkt.py", line 1237, in tokenize
    return list(self.sentences_from_text(text, realign_boundaries))
  File "C:\Users\Dell\Anaconda3\envs\tensorflow-gpu\lib\site-packages\nltk\tokenize\punkt.py", line 1285, in sentences_from_text
    return [text[s:e] for s, e in self.span_tokenize(text, realign_boundaries)]
  File "C:\Users\Dell\Anaconda3\envs\tensorflow-gpu\lib\site-packages\nltk\tokenize\punkt.py", line 1276, in span_tokenize
    return [(sl.start, sl.stop) for sl in slices]
  File "C:\Users\Dell\Anaconda3\envs\tensorflow-gpu\lib\site-packages\nltk\tokenize\punkt.py", line 1276, in
    return [(sl.start, sl.stop) for sl in slices]
  File "C:\Users\Dell\Anaconda3\envs\tensorflow-gpu\lib\site-packages\nltk\tokenize\punkt.py", line 1316, in _realign_boundaries
    for sl1, sl2 in _pair_iter(slices):
  File "C:\Users\Dell\Anaconda3\envs\tensorflow-gpu\lib\site-packages\nltk\tokenize\punkt.py", line 312, in _pair_iter
    prev = next(it)
  File "C:\Users\Dell\Anaconda3\envs\tensorflow-gpu\lib\site-packages\nltk\tokenize\punkt.py", line 1289, in _slices_from_text
    for match in self._lang_vars.period_context_re().finditer(text):
TypeError: cannot use a string pattern on a bytes-like object
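The last frame shows punkt running a `str` regex (`period_context_re()`) over `text`, while `t` at skipthoughts.py line 149 is `bytes` (consistent with the `b'...'` captions printed above). The standard-library `re` module reproduces the same error on its own, and decoding before matching makes it go away. A minimal sketch; the pattern below is a simplified stand-in, not punkt's actual regex:

```python
import re

# A str pattern, standing in for the one punkt builds in period_context_re()
period_re = re.compile(r"\.\s")

# Captions loaded under Python 3 arrive as bytes (note the b'...' prefix)
caption = b"Person snowboarding down a hill . Black bear is hiding ."

error_message = None
try:
    list(period_re.finditer(caption))
except TypeError as exc:
    # Same message as the last line of the traceback
    error_message = str(exc)

# Decoding to str first lets the same str pattern run normally
matches = [m.start() for m in period_re.finditer(caption.decode("utf-8"))]
```

This is why the `t.decode('utf-8')` change suggested earlier in the thread applies at line 149 of skipthoughts.py as well.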