machinalis / yalign

A sentence aligner for comparable corpora

Warning while creating model #8

Open sanjanasri opened 7 years ago

sanjanasri commented 7 years ago

I tried to create a model using the yalign-issue6-response package and I am getting the following warning:

DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.

Even so, aligner.pickle and the metadata were created. I am attaching the files for reference.

en-es.zip

I don't know whether I can use the model or not. It would be great if I could get an earnest reply.

simontite-capita-ti commented 7 years ago

Yes, I get this too. It is only a warning and has no effect on the result of the yalign-align program. What I do is suppress the annoying message with something like this: yalign-align ......... doc_A doc_B 2>&1 > aligned-file.txt | grep -v "DeprecationWarning"

However, this is just a shell trick: stdout goes to the file, and stderr is piped through grep, so each stderr line is shown on the screen unless it contains the text "DeprecationWarning".
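The same kind of filtering can also be done from inside Python when the aligner is called from a script rather than from the shell. The following is only a sketch using the standard `warnings` module; `noisy_score` and `call_quietly` are hypothetical names made up for illustration, and `noisy_score` merely stands in for whatever library call emits the warning.

```python
import warnings


def noisy_score(values):
    """Hypothetical stand-in for a library call that emits a DeprecationWarning."""
    warnings.warn("Passing 1d arrays as data is deprecated", DeprecationWarning)
    return sum(values)


def call_quietly(func, *args, **kwargs):
    """Call func with DeprecationWarning silenced; other warnings are untouched."""
    with warnings.catch_warnings():
        # The filter is only active inside this block and is restored on exit.
        warnings.simplefilter("ignore", category=DeprecationWarning)
        return func(*args, **kwargs)


result = call_quietly(noisy_score, [1, 2, 3])
print(result)
```

Like the grep approach, this only hides the message. The deprecated call is still made, so it does not protect against the behaviour turning into an error in a later sklearn release.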

It would be nice if somebody could fix this (I have tried, but I don't really know enough about it to have succeeded), because presumably yalign will fail when the next version of sklearn arrives.

sanjanasri commented 7 years ago

Thanks a lot for your earnest reply. It is working now.

I would like to develop a model for another language that has word boundaries. What is the optimal number of parallel sentences needed to develop the model?

Is 15k parallel sentences a fair amount?

luoyangen commented 7 years ago

@simontite-capita-ti With the new version of sklearn (0.18.1), the DeprecationWarning appears and yalign no longer works. Do you have any tips?

simontite-capita-ti commented 7 years ago

@luoyangen I haven't tried this on the new version of sklearn yet, but this fixes the deprecation warning, so it should probably also fix the ValueError in 0.19:

In yalign/svm.py, after line 51 add the line: vector = vector.reshape(1, -1)
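For context on why this one line helps: the warning says that a 1-D array is ambiguous, and asks for a 2-D array in which a single sample occupies one row. The sketch below shows only the shape change, using NumPy alone; `as_single_sample` is a hypothetical helper written for illustration and is not part of yalign or sklearn, and the feature values are made up.

```python
import numpy as np


def as_single_sample(vector):
    """Return vector as a 2-D array of shape (1, n_features).

    Hypothetical helper: a 1-D input is treated as one sample with
    several features; a 2-D input is returned unchanged.
    """
    arr = np.asarray(vector, dtype=float)
    if arr.ndim == 1:
        # -1 lets NumPy infer the number of features from the data.
        arr = arr.reshape(1, -1)
    return arr


features = np.array([0.2, 1.5, -0.3])  # made-up feature vector
sample = as_single_sample(features)
print(features.shape)  # (3,)
print(sample.shape)    # (1, 3)
```

The alternative `reshape(-1, 1)` mentioned in the warning would instead produce three samples with one feature each, which is not what a single sentence-pair vector represents.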

The whole function now looks like this:

    def score(self, data):
        """
        The score is positive for an alignment.
        """
        self._SVC_hack()
        vector = self._vectorize(data)
        vector = vector.reshape(1, -1)
        return float(self.svm.decision_function(vector))

anthobio23 commented 3 years ago

@simontite-capita-ti Hi, which version of sklearn do you use for this? I have version 0.17.1 and I get the same error when training a model.