vinhkhuc / JFastText

Java interface for fastText

Add Java binding for `getSentenceVector` #50

Open bxshi opened 5 years ago

bxshi commented 5 years ago

This PR adds a Java binding for getSentenceVector. This method can return subword-based embeddings for out-of-vocabulary (OOV) words. Compared to getWordVector, getSentenceVector can still compute an embedding from in-vocabulary subwords even when its input is OOV.

I also modified the test cases slightly to test output embeddings for OOV words.

This is a useful method in my use case, so I'm submitting a PR in case others also want this. Feel free to comment. Thanks!
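To make the subword idea in the comment above concrete, here is a minimal toy sketch in plain Java. It is not JFastText's API and not fastText's actual implementation (fastText hashes character n-grams into buckets of a trained model). The class name `SubwordSketch`, the helper names, and the tiny hand-written n-gram table are all hypothetical and exist only for illustration. It shows how a word that is missing from a vocabulary can still get a vector by averaging the vectors of its known character n-grams.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration only: all names and values here are made up.
class SubwordSketch {
    // Hypothetical subword table: character n-gram -> 2-dimensional vector.
    static final Map<String, float[]> SUBWORDS = new HashMap<>();
    static {
        SUBWORDS.put("<ca", new float[] {1.0f, 0.0f});
        SUBWORDS.put("cat", new float[] {0.0f, 1.0f});
        SUBWORDS.put("ts>", new float[] {1.0f, 1.0f});
    }

    // Character n-grams of length n over the word wrapped in '<' and '>' boundary markers.
    static List<String> ngrams(String word, int n) {
        String wrapped = "<" + word + ">";
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= wrapped.length(); i++) {
            grams.add(wrapped.substring(i, i + n));
        }
        return grams;
    }

    // Average the vectors of the n-grams found in the table; all zeros if none match.
    static float[] embed(String word, int n, int dim) {
        float[] sum = new float[dim];
        int hits = 0;
        for (String gram : ngrams(word, n)) {
            float[] v = SUBWORDS.get(gram);
            if (v == null) {
                continue; // this n-gram is unknown, skip it
            }
            for (int d = 0; d < dim; d++) {
                sum[d] += v[d];
            }
            hits++;
        }
        if (hits > 0) {
            for (int d = 0; d < dim; d++) {
                sum[d] /= hits;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // "cats" is not a key in the table, but three of its 3-grams are.
        float[] v = embed("cats", 3, 2);
        System.out.println(v[0] + " " + v[1]);
    }
}
```

In this sketch a word with no known n-grams gets an all-zero vector, which is a simplifying choice made for the example rather than a statement about how the library behaves.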

carschno commented 5 years ago

I am trying to merge all the pull requests on my fork. @bx If you like, please add a pull request there. However, I noticed that a check has failed.

bxshi commented 5 years ago

Hi @carschno, based on the CI error, it seems that the CI environment is trying to install Java 8, whereas the install script only supports feature releases 9 to 13.

Installing oraclejdk8
$ export JAVA_HOME=~/oraclejdk8
$ export PATH="$JAVA_HOME/bin:$PATH"
$ ~/bin/install-jdk.sh --target "/Users/travis/oraclejdk8" --workspace "/Users/travis/.cache/install-jdk" --feature "8" --license "BCL"
install-jdk.sh 2019-01-18 II
Expected feature release number in range of 9 to 13, but got: 8
The command "~/bin/install-jdk.sh --target "/Users/travis/oraclejdk8" --workspace "/Users/travis/.cache/install-jdk" --feature "8" --license "BCL"" failed and exited with 3 during .

bxshi commented 5 years ago

Hi @carschno, after I updated the Java version in the Travis CI config, all the tests pass. Thank you!