machinalis / yalign

A sentence aligner for comparable corpora

Example not working #6

Open pstawinski opened 8 years ago

pstawinski commented 8 years ago

Hi. I'm trying to use yalign. To get started I read the docs and ran:

    sudo pip install yalign

followed by:

    wget https://raw.githubusercontent.com/machinalis/yalign/develop/data/models/0.1/en-es.tar.gz
    tar -xvzf en-es.tar.gz

and finally:

    yalign-align en-es http://en.wikipedia.org/wiki/Antiparticle http://es.wikipedia.org/wiki/Antipart%C3%ADcula

Nothing was written to stdout. Is that expected?

In case it matters: I'm using Python 2.7.9 on Debian.

Thank you for your time.

j0hn commented 8 years ago

Hey, I've been digging a little bit into this.

First of all, I noticed that we don't pin the Scikit-Learn version. This means an install pulls in the most recent release, and our code may be out of date with respect to it.

It also means that the trained model we provide may or may not work with the Scikit-Learn version you're using. I did a clean installation and it didn't work. If this is your case, you'll need to rebuild the model. Fortunately there's a tutorial on how to generate a model.

After that you'll run into another issue. Apparently the code that downloads data from a URL doesn't follow redirects, and Wikipedia now redirects all http traffic to https. You could use the https URL to avoid that problem. Another option is to use plain text files instead of URLs.
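As a rough illustration of the https workaround, here is a minimal sketch that rewrites an http URL to https before handing it to the aligner. The helper name `force_https` is hypothetical and not part of yalign; it only changes the scheme and does not download anything.

```python
try:
    from urllib.parse import urlsplit, urlunsplit  # Python 3
except ImportError:
    from urlparse import urlsplit, urlunsplit  # Python 2


def force_https(url):
    """Return `url` with an http scheme replaced by https.

    Hypothetical helper, not part of yalign. Anything that is not
    an http URL (https URLs, local file paths) is returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


english_url = force_https("http://en.wikipedia.org/wiki/Antiparticle")
spanish_url = force_https("http://es.wikipedia.org/wiki/Antipart%C3%ADcula")
print(english_url)
print(spanish_url)
```

The rewritten URLs can then be passed to yalign-align in place of the http ones. The percent-encoded path is left as it was.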

Even after all that, you might hit another issue. I looked at the tokenizer code in the project and it may be outdated. It wasn't working on my clean installation, but I didn't have enough time to debug it. If this is your case, I've prepared a branch with some fixes that got it running for me. To use that version instead of the one on PyPI, do the following:

Remove your version: pip uninstall yalign

Get the code: git clone -b issue-6-empty-response https://github.com/machinalis/yalign.git

Install this version of the code: pip install -e yalign

Hope this gets you closer to your objective. Let us know how it goes.

simontite-capita-ti commented 8 years ago

Worked for me. Many thanks!

LukeTu commented 4 years ago

Hi there! This is what I got when I ran exactly the same code on Debian 9. Could anyone tell me what went wrong here? Many thanks!

Successfully installed scikit-learn-0.17.1
root@instance-4:~# yalign-align en-es https://en.wikipedia.org/wiki/Antiparticle https://es.wikipedia.org/wiki/Antipart%C3%ADcula
/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py:386: DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and willraise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.
  DeprecationWarning)
Traceback (most recent call last):
  File "/usr/local/bin/yalign-align", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/root/yalign/scripts/yalign-align", line 66, in <module>
    pairs = model.align(document_a, document_b)
  File "/root/yalign/yalign/yalignmodel.py", line 130, in align
    alignments = self.align_indexes(document_a, document_b)
  File "/root/yalign/yalign/yalignmodel.py", line 138, in align_indexes
    alignments = self.document_pair_aligner(document_a, document_b)
  File "/root/yalign/yalign/sequencealigner.py", line 34, in __call__
    node = astar(problem, graph_search=True)
  File "/usr/local/lib/python2.7/dist-packages/simpleai/search/traditional.py", line 121, in astar
    viewer=viewer)
  File "/usr/local/lib/python2.7/dist-packages/simpleai/search/traditional.py", line 156, in _search
    expanded = node.expand()
  File "/usr/local/lib/python2.7/dist-packages/simpleai/search/models.py", line 105, in expand
    for action in self.problem.actions(self.state):
  File "/root/yalign/yalign/sequencealigner.py", line 68, in actions
    w = self.W(a, b)
  File "/root/yalign/yalign/sentencepairscore.py", line 57, in __call__
    score = self.classifier.score(a) * self.sign
  File "/root/yalign/yalign/svm.py", line 51, in score
    return float(self.svm.decision_function(vector))
  File "/usr/local/lib/python2.7/dist-packages/sklearn/svm/base.py", line 542, in decision_function
    dec = self._decision_function(X)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/svm/base.py", line 405, in _decision_function
    dec_func = self._dense_decision_function(X)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/svm/base.py", line 423, in _dense_decision_function
    self._dual_coef_, self._intercept_,
AttributeError: 'SVC' object has no attribute '_dual_coef_'
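
A note on the DeprecationWarning in that output: it asks for a single sample to be passed as a 2-D array of shape (1, n_features). Below is a minimal sketch of that reshape using NumPy. The helper name `as_single_sample` and the feature values are made up for illustration and are not yalign code. This would likely only address the warning; the AttributeError looks like a separate problem, possibly the model and Scikit-Learn version mismatch mentioned earlier in the thread.

```python
import numpy as np


def as_single_sample(vector):
    """Return `vector` as a 2-D array of shape (1, n_features).

    Hypothetical helper: a 1-D input is reshaped into one row,
    which is what the warning's X.reshape(1, -1) advice describes.
    Inputs that are already 2-D are returned without reshaping.
    """
    arr = np.asarray(vector, dtype=float)
    if arr.ndim == 1:
        arr = arr.reshape(1, -1)
    return arr


features = [0.2, 0.7, 1.0]  # made-up feature values for one sentence pair
row = as_single_sample(features)
print(row.shape)
```

An array shaped like `row` is the form the warning recommends passing to calls such as `decision_function` for a single sample.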