Open ghost opened 5 years ago
Adding @WessZumino for advice.
@atimesastudios let me see if I got your question right: you create the user/affinity (UA) matrix (for train and test sets) locally on your machine and then upload these to your remote DSVM for training (or hyperparameter tuning). After training, you would like to get back the pandas df format from the UA-matrix in order to use the ranking metrics.
Yes, in the current version of sparse.py the load feature for the .npy files is missing; thanks for catching this. I will add the feature so that you can use your workflow.
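Until that lands, the mapping files can be read back by hand. A minimal sketch, assuming the *_dict.npy files are plain Python dicts pickled by np.save (the file names match those mentioned in this thread, but the dict contents here are illustrative):

```python
import os
import tempfile
import numpy as np

save_path = tempfile.mkdtemp()  # stand-in for DATA_DIR

# raw user id -> matrix row index, and the inverse mapping
user_dict = {10: 0, 42: 1, 7: 2}
user_back_dict = {v: k for k, v in user_dict.items()}

# np.save pickles a plain dict into an object array
np.save(os.path.join(save_path, "user_dict.npy"), user_dict)
np.save(os.path.join(save_path, "user_back_dict.npy"), user_back_dict)

# On the remote machine, allow_pickle=True and .item() are needed
# to recover the dict from the pickled object array.
loaded = np.load(os.path.join(save_path, "user_dict.npy"),
                 allow_pickle=True).item()
assert loaded == user_dict
```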
In the meantime, you could generate the UA-matrix directly on the DSVM; the pandas df and the UA-matrix should be comparable in size, if I am not mistaken, so there is no advantage in uploading the latter instead of the former.
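For illustration, building a small user/affinity matrix from a long-format pandas df can be sketched with plain pandas/numpy (this is not the library's AffinityMatrix internals; the column names follow the MovieLens-style header used in this thread):

```python
import numpy as np
import pandas as pd

# hypothetical long-format ratings data
df = pd.DataFrame({
    "userID": [1, 1, 2, 3],
    "movieID": [10, 20, 10, 30],
    "rating": [4.0, 3.0, 5.0, 2.0],
})

# map raw ids to contiguous row/column indices (the role of
# user_dict / item_dict in sparse.py)
users = {u: i for i, u in enumerate(sorted(df["userID"].unique()))}
items = {m: j for j, m in enumerate(sorted(df["movieID"].unique()))}

# fill the dense user x item affinity matrix
X = np.zeros((len(users), len(items)))
for _, row in df.iterrows():
    X[users[row["userID"]], items[row["movieID"]]] = row["rating"]
```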
Also, note that top_k_1m is the UA-matrix of only the top k items per user. If you want the full UA-matrix you should use the predict() method of the rbm class instead of the recommend_k_items() one.
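The relation between the two outputs can be sketched with plain numpy: starting from a full score matrix (a predict()-style output), keep only the top k entries per user (a recommend_k_items()-style output). This is an illustrative sketch, not the rbm class implementation:

```python
import numpy as np

# hypothetical full user x item score matrix, 4 users x 6 items
rng = np.random.default_rng(0)
scores = rng.random((4, 6))
k = 2

# indices of the k largest scores in each row (unordered)
idx = np.argpartition(scores, -k, axis=1)[:, -k:]
rows = np.arange(scores.shape[0])[:, None]

# zero everywhere except the top-k positions per user
top_k = np.zeros_like(scores)
top_k[rows, idx] = scores[rows, idx]
```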
Thanks @WessZumino, I appreciate the addition of the load feature in sparse.py. In the meantime I will follow your advice.
@WessZumino, predict takes two arguments, but the second one doesn't seem to get used. The documentation string only contains information about x.
def predict(self, x, maps):
Description
I am trying to implement AzureML Hyperdrive based hyperparameter tuning of the RBM algorithm using the example notebooks. I have a working RBM notebook with my dataset and I am using svd_training.py as a template for building my rbm_training.py file. As part of the RBM process an affinity matrix is created, and the training and test sets are built from the stratified sampler. I looked at the code and there is an optional parameter save_path that stores 4 numpy output files: item_back_dict.npy, item_dict.npy, user_back_dict.npy and user_dict.npy, when invoked as follows
am1m = AffinityMatrix(DF = data, **header, save_path = DATA_DIR)
I am uploading the train and validate pkl data files to the default datastore from my local machine.
During evaluation the following code requires the affinity matrix:
top_k_df_1m = am1m.map_back_sparse(top_k_1m, kind = 'prediction')
test_df_1m = am1m.map_back_sparse(Xtst_1m, kind = 'ratings')
How do I regenerate the affinity matrix object in the script that will be run remotely (rbm_training.py)? I was hoping to be able to use the four numpy files to enable map_back_sparse. I hope I don't have to upload the entire dataset and then regenerate an AffinityMatrix object remotely.
The AffinityMatrix code in sparse.py mentions that the numpy files can be used with a trained model, but I am not sure how to load these 4 files to regenerate an AffinityMatrix object as the remote script executes.
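One possible workaround, sketched here with plain numpy/pandas rather than the library's map_back_sparse, is to load the two *_back_dict.npy mappings (index -> raw id) on the remote node and rebuild the long dataframe from the sparse top-k matrix directly; the ids and scores below are made up:

```python
import numpy as np
import pandas as pd

# stand-ins for np.load("user_back_dict.npy", allow_pickle=True).item()
# and the matching item file on the remote machine
user_back_dict = {0: 10, 1: 42}            # row index -> raw user id
item_back_dict = {0: 100, 1: 200, 2: 300}  # col index -> raw item id

# hypothetical top-k prediction matrix (zeros = not recommended)
top_k = np.array([[0.0, 4.5, 0.0],
                  [3.0, 0.0, 0.0]])

# map each nonzero entry back to (raw user, raw item, score)
rows, cols = top_k.nonzero()
df = pd.DataFrame({
    "userID": [user_back_dict[r] for r in rows],
    "movieID": [item_back_dict[c] for c in cols],
    "prediction": top_k[rows, cols],
})
```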
Other Comments