recommenders-team / recommenders

Best Practices on Recommendation Systems
https://recommenders-team.github.io/recommenders/intro.html
MIT License
18.89k stars, 3.07k forks

[ASK] xDeepFM - Help on saving model checkpoint to Azure ML output directory #970

Closed: ghost closed this issue 4 years ago

ghost commented 4 years ago

Description

I've taken the xDeepFM deep dive notebook and adapted it so that it can run in Azure Machine Learning Service. I would like Azure to capture the model checkpoints and associated files so that I can download the best run, visualize training in TensorBoard, and restore the model at a later point in time. Currently, none of the model files are captured (AML Service needs these to be in the outputs directory).

image

It appears that MODEL_DIR is used in a concatenation above. Should MODEL_DIR be passed in the form of a string such as './outputs' or as an os.path.join type construct?

image

When I tried the above on my local machine, the summaries were placed in the summaries directory under the outputs directory, as I would expect. However, the model files were placed directly in the outputs directory, with "model" prepended to their file names.

image
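A minimal sketch of how this symptom can arise, assuming the checkpoint path is built by plain string concatenation of MODEL_DIR and a file prefix. The helper name `concat_checkpoint_path`, the `epoch_` prefix, and the `./outputs/model` value are assumptions for illustration, not taken from the notebook:

```python
import os


def concat_checkpoint_path(model_dir: str, epoch: int) -> str:
    # Hypothetical helper: joins the directory and the file prefix by plain
    # string concatenation, so no path separator is added between them.
    return model_dir + "epoch_" + str(epoch)


# Assumed MODEL_DIR values, with and without a trailing separator.
no_sep = concat_checkpoint_path("./outputs/model", 1)
with_sep = concat_checkpoint_path("./outputs/model/", 1)

for path in (no_sep, with_sep):
    print("directory:", os.path.dirname(path), "| file prefix:", os.path.basename(path))
```

Without the trailing separator, the last component of MODEL_DIR fuses with the file prefix, so the checkpoint lands one level up with "model" prepended to its name. With the separator, it lands inside the intended directory.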

When I run this in Azure ML Service, summary files and model files are not available for download. My hunch is that the relative directory must be off.

Any tips on how to set up MODEL_DIR correctly in order to get the files placed in the outputs directory and how to set this up for running in Azure ML Service would be welcome.
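One possible way to set this up, sketched under the assumption that Azure ML collects files written to `./outputs` relative to the run's working directory. The helper name `prepare_output_dirs` and the `model` subdirectory are hypothetical; the `summaries` subdirectory mirrors the layout described above:

```python
import os


def prepare_output_dirs(base_dir: str = "./outputs"):
    # Hypothetical helper: creates the model and summaries subdirectories
    # under base_dir and returns both paths.
    model_dir = os.path.join(base_dir, "model")          # assumed subdirectory name
    summaries_dir = os.path.join(base_dir, "summaries")  # layout described in the issue
    os.makedirs(model_dir, exist_ok=True)
    os.makedirs(summaries_dir, exist_ok=True)
    return model_dir, summaries_dir
```

Calling `prepare_output_dirs()` before training and passing the returned model directory as MODEL_DIR keeps both kinds of files under one outputs tree. Whether the path needs a trailing separator depends on how the model code builds its save path.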

Other Comments

miguelgfierro commented 4 years ago

maybe @eedeleon can help here?

elogicaadith commented 4 years ago

@miguelgfierro: I have a solution to this problem. It involves adding a "/" to the save_path string as shown below:

image

This code is located in base_model.py. I have validated that this works on the Azure Machine Learning service.

image
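The screenshot of the change is not preserved here. As a rough sketch of the same idea, the save path can be built so that a separator always sits between the directory and the file prefix. This version uses `os.path.join` rather than a literal "/", which is a swapped-in alternative and not necessarily the exact change made in base_model.py. The function name `checkpoint_prefix` and the `epoch_` prefix are assumptions:

```python
import os


def checkpoint_prefix(model_dir: str, epoch: int) -> str:
    # Hypothetical helper: os.path.join inserts a separator only when
    # model_dir does not already end with one.
    return os.path.join(model_dir, "epoch_" + str(epoch))


# Both forms of the (assumed) MODEL_DIR resolve to the same location.
for model_dir in ("./outputs/model", "./outputs/model/"):
    path = checkpoint_prefix(model_dir, 1)
    print(os.path.normpath(path))
```

Using `os.path.join` makes the result independent of whether the caller passes MODEL_DIR with or without a trailing separator, and it picks the separator appropriate to the platform.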

I'm in the process of validating that the change works on local notebooks as well. Will you accept a pull request for this?

elogicaadith commented 4 years ago

@miguelgfierro: I just finished testing this change on the local xDeepFM notebook in 00_quick_start:

image

Contents of the local folder on my C: drive:

image

miguelgfierro commented 4 years ago

Thanks, @elogicaadith. I added a comment; can you please take a look?