pommedeterresautee / fastrtext

R wrapper for fastText
https://pommedeterresautee.github.io/fastrtext/

Save fastrtext trained model #32

Closed 86mm86 closed 5 years ago

86mm86 commented 5 years ago

Hi,

I would like to save a trained model (more specifically a supervised model for text classification) on disk for later re-use (so it should not be a temporary file).

I am using fastrtext on Microsoft Azure Machine Learning Studio for a project and it would be ideal if trained models could be saved as ".rds" files. Is this possible, and if not what would you suggest as a workaround?

Thanks!

pommedeterresautee commented 5 years ago

Models are managed at the C++ level. You can save one from R through the command line (the -output option).

Kind regards, Michaël
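
The suggestion above can be sketched as follows. This is a minimal illustration, assuming the `execute()`, `load_model()` and `predict()` calls and the bundled `train_sentences` data as shown in the fastrtext README; the `models` directory and the `fastrtext_classifier` file name are hypothetical. fastText writes the model to the `-output` prefix with a `.bin` extension, so the file stays on disk and can be loaded from any later R session.

```r
# Hypothetical persistent location (not a tempfile, so it survives the session).
model_dir    <- "models"
model_prefix <- file.path(model_dir, "fastrtext_classifier")

if (requireNamespace("fastrtext", quietly = TRUE)) {
  library(fastrtext)
  dir.create(model_dir, showWarnings = FALSE)

  # Training data in fastText format: "__label__<class> <text>" per line.
  data("train_sentences")
  train_lines <- paste0("__label__", train_sentences[, "class.text"], " ",
                        tolower(train_sentences[, "text"]))
  train_file <- tempfile(fileext = ".txt")
  writeLines(train_lines, train_file)

  # "-output" is the path prefix; fastText appends ".bin" to it.
  execute(commands = c("supervised",
                       "-input", train_file,
                       "-output", model_prefix,
                       "-dim", 20, "-lr", 1, "-epoch", 20,
                       "-wordNgrams", 2, "-verbose", 1))

  # In any later R session: load the file from disk and predict.
  model <- load_model(paste0(model_prefix, ".bin"))
  predictions <- predict(model, sentences = "this is a short test sentence")
}
```

Only the file on disk is portable; the `model` object itself still has to be recreated with `load_model()` in each new session.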

86mm86 commented 5 years ago

Hi Michaël,

Thanks for the prompt reply! After loading the model that the execute function wrote to "tmp_file_model", as done below,

model <- load_model(tmp_file_model)

the variable "model" is an R "Environment" object, which unfortunately cannot be saved and reloaded successfully with base R functions like "saveRDS()" and "readRDS()" for later re-use. This matters to me because I have many Azure experiments, which force me to save and read fastrtext trained models multiple times.

Any thought regarding this?

Kind regards, Marco

pommedeterresautee commented 5 years ago

Basically, the model object is just a wrapped pointer to the real C++ object, because of the way fastText is built. So there is no easy way for R to manipulate it or save it directly. What I don't understand is why saving through the command line does not work for you. On Azure, is readRDS() the only way to load something?

Kind regards, Michaël


86mm86 commented 5 years ago

Hi Michaël,

I can successfully use both the "execute()" and "load_model()" fastrtext functions for training and testing models on a single Azure ML Studio experiment, because each experiment has its own R session.

In my project I have many Azure experiments: one for model training, one for model deployment and one for web-service integration with the deployed model. As a consequence any R environment variable within the training Azure experiment (including the fastrtext model itself) would be invisible to the model deployment and web-service integration experiments, because they all have different R sessions.

The reason I have multiple experiments is that, for my use case, a web-service call should output a prediction on real-time incoming data in about 1 or 2 seconds (clearly I cannot just retrain my model in each experiment). Models trained with "native" Azure tools can easily be saved and re-loaded across different experiments, but the choice is limited. Since fastrtext gave me significantly higher prediction accuracy on my local machine, I would like to use it on Azure ML Studio as well. To do so, I need to be able to save and re-load fastrtext trained models across multiple Azure experiments with different R sessions. Standard R functions like save() and saveRDS() do not accomplish this (saveRDS() does not actually raise any error, but when I read the object back with readRDS() and use it for predictions I get an error).

Sorry for the long reply; I hope it at least explains the problem I am facing. Let me know if you can think of a workaround!

Kind regards, Marco
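
One possible workaround for the ".rds" requirement, sketched under assumptions: since the model already exists as a `.bin` file on disk, its raw bytes can be stored in an `.rds` file and written back to a temporary `.bin` file in the other session before calling `load_model()`. The helper names `pack_model()` and `unpack_model()` are hypothetical and not part of fastrtext; the demonstration uses a small stand-in file rather than a real model.

```r
# Store the bytes of a model file inside an .rds file (base R only).
pack_model <- function(model_bin_path, rds_path) {
  bytes <- readBin(model_bin_path, what = "raw", n = file.size(model_bin_path))
  saveRDS(bytes, rds_path)
  invisible(rds_path)
}

# Write the bytes back to a .bin file and return its path.
unpack_model <- function(rds_path, model_bin_path = tempfile(fileext = ".bin")) {
  writeBin(readRDS(rds_path), model_bin_path)
  model_bin_path
}

# Demonstration with a stand-in file instead of a real fastText model.
fake_bin <- tempfile(fileext = ".bin")
writeBin(as.raw(c(1, 2, 3, 255)), fake_bin)

rds_file <- tempfile(fileext = ".rds")
pack_model(fake_bin, rds_file)
restored <- unpack_model(rds_file)

same_bytes <- identical(
  readBin(fake_bin, what = "raw", n = file.size(fake_bin)),
  readBin(restored, what = "raw", n = file.size(restored))
)

# With a real model file, the restored path would then be loaded with:
# model <- fastrtext::load_model(restored)
```

This keeps the C++ object out of the `.rds` file entirely: only the serialized model file travels between sessions, and the pointer is rebuilt on load.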

RezaSadeghiWSU commented 5 years ago

> Models are managed at the C++ level. You can save one from R through the command line (the -output option).
>
> Kind regards, Michaël

Thank you for your great package. I faced the same problem (objects with null pointers). I found a complementary explanation at the following link: https://stackoverflow.com/questions/54797062/saving-a-model-which-utilizes-external-pointers-to-c-in-r

Regards, Reza

pommedeterresautee commented 5 years ago

Hi, a new function from @olsgaard, integrated in the latest version, makes the output path explicit. Let me know if this is not enough. I am closing the issue for now, but feel free to reopen it if anything is unclear. https://github.com/pommedeterresautee/fastrtext/blob/master/R/API.R#L530
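
A minimal sketch of how such an explicit-path call might look, assuming the function referred to is `build_supervised()` with `documents`, `targets` and `model_path` arguments as described in the package documentation, and that `model_path` is given without a file extension. The `models` directory and file name are hypothetical; check the installed version's help page before relying on the exact signature.

```r
# Hypothetical persistent location for the trained model.
model_path <- file.path("models", "fastrtext_supervised")

if (requireNamespace("fastrtext", quietly = TRUE)) {
  library(fastrtext)
  dir.create(dirname(model_path), showWarnings = FALSE)
  data("train_sentences")

  # Assumed signature: trains a supervised model and writes it to model_path
  # (fastText appends ".bin").
  build_supervised(documents  = tolower(train_sentences[, "text"]),
                   targets    = train_sentences[, "class.text"],
                   model_path = model_path,
                   dim = 20, lr = 1, epoch = 20, wordNgrams = 2)

  # A different R session can load the same file later.
  model <- load_model(paste0(model_path, ".bin"))
}
```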