pommedeterresautee / fastrtext

R wrapper for fastText
https://pommedeterresautee.github.io/fastrtext/
Other
101 stars 15 forks source link

Add convenience functions for training models #28

Closed olsgaard closed 5 years ago

olsgaard commented 6 years ago

As discussed in #26, here is my take on adding functions to fastrtext to make it simpler for a user to train a model.

These functions essentially give the user access to see all available arguments of fasttext from within R-studio, as well as writing documents to a tempfile with proper labeling and calling the execute statement.

It would be nice if we could somehow stream the text directly into fasttext instead of building a tempfile, but I don't know how to do that.

pommedeterresautee commented 6 years ago

@olsgaard tks for your contribution. Can you fix the CI? seems related to issues in the doc

olsgaard commented 6 years ago

Everything builds on my end without errors or warnings. However, Travis times-out on the code coverage check, causing the build to fail. I really don't know what causes the code coverage to take so long.

Output of running "Check" in R-studio:

R CMD check results
0 errors | 0 warnings | 1 note 
checking top-level files ... NOTE
Non-standard files/directories found at top level:
  'my_model.bin' 'my_model.vec'

R CMD check succeeded
olsgaard commented 5 years ago

I tried adding more time to the code coverage, which made travis succeed on my fork [1]. I don't understand why it timed out after 50 minutes on your branch. It should have allowed for over 2 hours for running the code coverage.

[1] https://travis-ci.org/olsgaard/fastrtext/jobs/427340587

pommedeterresautee commented 5 years ago

Hi, I added your code to the last version of fastrtext manually. I took too much time to acccept your PR and I needed to make some modification as your current PR is not up to date with the source code. It has been released by Cran. Thank you for your code and sorry for not being able to validate your PR before.

olsgaard commented 5 years ago

That's great news! Thank you for including it.

How did you fix the timeout issue?

pommedeterresautee commented 5 years ago

Long time ago, Cran forced us to have only 1 thread, so I blocked the test code to 1 thread. But now it seems to be removed. In the test there are several trains... that was the issue... So now tests are done with all Cores included. FYI I updated the C++ code to the last version of fasttext, there was several things I had to reimplement in the past because facebook code was doing the work and printing the results. New facebook code is of better quality, separating the real work from printing, etc. So I removed my own reimplementation of some stuffs. I didn't bind to all their new functions as I don t think they are super useful in real life (like getting subwords vectors) but if you have some ideas... I would react more rapidly than I did in the past :-)