minimalparts / nonce2vec

Incremental learning of word embeddings with context informativeness.
MIT License

the sentence separator @@ is removed in the provided train and test files #1

Closed: willanxywc closed this issue 6 years ago

willanxywc commented 6 years ago

I notice in the code

 for s in fields[1].split("@@"):
     sentences.append(s.split(' '))

which means that @@ should appear as a sentence separator in the train and test corpora, but I can't find it in the files provided here. Is your processed version, with the sentence separators, available? It is needed to reproduce the results and to run further experiments. By the way, I don't mean the raw files provided by Lazaridou.
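To illustrate the concern, here is a minimal sketch of what the split does with and without the separator. The function name split_contexts and the sample strings are hypothetical, made up for illustration; only the two-line loop comes from the repository code quoted above.

```python
def split_contexts(field, separator="@@"):
    """Split one context field into sentences, then each sentence into tokens.

    Mirrors the quoted loop; `field` stands in for fields[1] (an assumption
    about what that column holds).
    """
    sentences = []
    for s in field.split(separator):
        sentences.append(s.split(' '))
    return sentences


# Hypothetical sample data, not taken from the chimera files.
with_separator = "the cat sat@@a dog barked"
without_separator = "the cat sat a dog barked"

print(split_contexts(with_separator))     # two token lists, one per sentence
print(split_contexts(without_separator))  # a single token list
```

Without @@ in the file, the loop still runs but returns everything as one long sentence, so the sentence boundaries are silently lost rather than raising an error.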

Thanks~

minimalparts commented 6 years ago

You're completely right. We had the wrong version of the chimera data in the repo. Apologies for this. I have now uploaded the correct files to the data/chimeras directory.

Thanks for spotting this!

willanxywc commented 6 years ago

Thanks