Closed chuangys closed 5 years ago
Hi, can you provide some code? (one I can test, with some data)
I just tried
library(fastrtext)
data("train_sentences")
data("test_sentences")
# prepare data
tmp_file_model <- tempfile()
train_labels <- paste0("__label__", train_sentences[,"class.text"])
train_texts <- tolower(train_sentences[,"text"])
train_to_write <- paste(train_labels, train_texts)
train_tmp_file_txt <- tempfile()
writeLines(text = train_to_write, con = train_tmp_file_txt)
test_labels <- paste0("__label__", test_sentences[,"class.text"])
test_texts <- tolower(test_sentences[,"text"])
test_to_write <- paste(test_labels, test_texts)
# learn model
execute(commands = c("supervised", "-input", train_tmp_file_txt,
"-output", tmp_file_model, "-dim", 200, "-lr", 1,
"-epoch", 20, "-wordNgrams", 2, "-verbose", 1))
model <- load_model(tmp_file_model)
predict(model, sentences = test_sentences[1, "text"])
And had no issue...
Can you try -verbose 1 in your command line?
@pommedeterresautee Your code runs well in my environment, so I have to correct my problem description. Using the same example data, but adding a pre-trained vector, I can reproduce the hang at model output.
Source code below:
library(fastrtext)
data("train_sentences")
data("test_sentences")
tmp_file_model <- tempfile(); print(tmp_file_model)
train_labels <- paste0("__label__", train_sentences[,"class.text"])
train_texts <- tolower(train_sentences[,"text"])
train_to_write <- paste(train_labels, train_texts)
train_tmp_file_txt <- tempfile(); print(train_tmp_file_txt)
writeLines(text = train_to_write, con = train_tmp_file_txt)
execute(commands = c("supervised", "-input", train_tmp_file_txt,
"-output", tmp_file_model, "-dim", 300, "-lr", 1,
"-epoch", 300, "-wordNgrams", 2, "-verbose", 1,
"-pretrainedVectors", "e:/baproject/data/pretrainedword2vec/wiki-news-300d-1M.vec"))
The wiki-news-300d-1M.vec was downloaded from the facebookresearch pre-trained vectors page: https://fasttext.cc/docs/en/english-vectors.html
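One quick sanity check before suspecting RAM: the first line of a fastText `.vec` file holds the vocabulary size and the vector dimension, and the value passed as `-dim` to `execute()` has to match that second field. A minimal sketch (the file here is a tiny stand-in written to a temp path, not the real wiki-news-300d-1M.vec):

```shell
# Write a toy .vec file: header "vocab_size dim", then one word per line
# followed by its space-separated vector values.
printf '2 300\nthe 0.1 0.2\ncat 0.3 0.4\n' > /tmp/tiny.vec

# Read the dimension back from the header; for the real file this should
# print 300, matching the "-dim 300" argument in the execute() call above.
head -n 1 /tmp/tiny.vec | awk '{print "dim =", $2}'
# prints: dim = 300
```

If the header dimension and `-dim` disagree, fastText rejects the vectors outright, so a clean match here points the investigation back toward memory.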
It may be related to a RAM issue. Did you fix it?
Hi - I'm having the same issue as @chuangys; it seems to hang on the larger .vec file. I have 16 GB of RAM.
Do you have some test code? Did you check the RAM (models trained by Facebook are quite big)?
I do.
execute(commands = c("supervised", "-input", "C:/Users/xxx/R/fasttext_test/train.txt",
"-output", "C:/Users/xxx/R/fasttext_test/train.bin",
"-lr", 1, "-epoch", 50, "-wordNgrams", 2, "-verbose", 1))
This worked (while the Facebook one would not) - however I'm using pre-trained vectors: https://github.com/jazzyarchitects/fasttext-node/raw/master/train.txt
Here is the RAM size:
memory.limit()
[1] 16204
Would you know of a larger example I could try with fastrtext that you know works with a pretrained .vec from an external source? It may help clarify whether it's my environment or not.
Hi, I have a question: does the pretrainedVectors argument not support .vec files produced by gensim? Thanks.
@datalee what is the feature you are referring to?
@pommedeterresautee classification.
pretrainedVectors is the text file produced by fastText when you learn a model, whatever it is. I don't know gensim's format, but it should not be hard to convert (one word per line followed by its vector, with each value separated by a space).
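For reference, my understanding of the text layout fastText expects for pretrainedVectors (the word2vec text format, which gensim's plain-text export also follows) is a header line with the vocabulary size and dimension, then one word per line with space-separated values:

```
<vocab_size> <dim>
the 0.12 -0.05 0.33 ...
cat 0.07 0.21 -0.14 ...
```

The word counts, dimension, and values above are illustrative only; a gensim export that matches this shape should load without conversion.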
Everything is okay with the default parameter settings. But when I raise the dimension of the word vectors to 200 or 300, model training is still fast but it hangs at model output. Could you help check it?