Closed jacobmunson closed 4 years ago
Edit: simply do a prediction on each line item in the 80/20 split
Prediction on each line of an 80/20 split is up and running, but it's still brutally slow in R on the 1M dataset. The 100k dataset takes something like 1.4 hours (with 3 similarity measures computed "at once").
Runtimes for 100k are now about 57 minutes (a considerable speed increase). Running on several variants of similarity measures.
Upon recommendation: instead of finding all pairwise comparisons and carrying them around, find the pairwise comparisons per user, apply the similarity measure of choice, select the top k (the highest values under that measure), and carry only those around.
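A minimal sketch of the per-user top-k idea, written in Python for illustration rather than in the project's R code. The names `cosine` and `top_k_neighbors`, and the `{user: {item: rating}}` layout, are assumptions made for this example; cosine similarity stands in for whichever measure is actually being evaluated.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two users' {item: rating} dicts."""
    common = a.keys() & b.keys()
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no ratings is similar to nobody
    return dot / (norm_a * norm_b)

def top_k_neighbors(ratings, k, similarity=cosine):
    """For each user, keep only the k most similar other users.

    ratings: {user: {item: rating}} (hypothetical layout).
    Returns {user: [(similarity, neighbor), ...]}, highest similarity first.
    """
    neighbors = {}
    for u, ru in ratings.items():
        # Score every other user, but retain only the k best scores.
        scored = ((similarity(ru, rv), v) for v, rv in ratings.items() if v != u)
        neighbors[u] = heapq.nlargest(k, scored)
    return neighbors
```

Every pair still has to be scored once per user, so the saving is in what gets stored and passed along: at most k entries per user instead of the full pairwise table.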
New: for the 100k dataset and k = 15, 610 users * 15 similarities/user = 9,150 similarities carried around. Old: for the 100k dataset with 610 users, 164,054 similarities to be computed.
Definitely run a timing test, since computing the similarity measure itself will take time.
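One way such a timing test could look, sketched in Python for illustration rather than in R. The helper `time_call` is a hypothetical name for this example; it reports the best wall-clock time over a few repeats so that one-off noise matters less.

```python
import time

def time_call(fn, *args, repeats=3, **kwargs):
    """Return the best wall-clock time in seconds over `repeats` calls of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best

# Example: time a stand-in workload; swap in each similarity routine to compare them.
elapsed = time_call(sorted, list(range(10000, 0, -1)))
print(f"best of 3: {elapsed:.6f} s")
```

Timing each similarity measure separately, rather than all of them "at once", would show which one dominates the total runtime.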