zenogantner / MyMediaLite

recommender system library for the CLR (.NET)

Rating prediction could stream test-file to save memory #432

Open jkleint opened 10 years ago

jkleint commented 10 years ago

I'm making predictions with a command like this:

rating_prediction \
    --recommender="$method" \
    --rating-type=byte \
    --training-file="$1" \
    --test-file="$2" \
    --test-no-ratings

Memory use grows with the size of the test file: with a large --test-file it uses a lot of memory, and with a small one it uses little. I'm not familiar with the internals of MyMediaLite, but can't each prediction be made independently, so that the test file could be read as a stream? That would be faster and would also allow larger data sets.
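To illustrate the idea, here is a minimal Python sketch of streaming prediction. It is not MyMediaLite code. The names `stream_pairs`, `predict_stream`, and `constant_predictor` are hypothetical, and the input format (one whitespace-separated user/item pair per line) is an assumption. Only one test line is held in memory at a time.

```python
from typing import Callable, Iterable, Iterator, TextIO, Tuple


def stream_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (user, item) pairs lazily, skipping lines with fewer than two fields."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            yield fields[0], fields[1]


def predict_stream(
    lines: Iterable[str],
    predict: Callable[[str, str], float],
    out: TextIO,
) -> int:
    """Write one prediction per test line as it is read; return the number written."""
    count = 0
    for user, item in stream_pairs(lines):
        out.write(f"{user}\t{item}\t{predict(user, item):.4f}\n")
        count += 1
    return count


def constant_predictor(user: str, item: str) -> float:
    """Stand-in for a trained model (hypothetical); always predicts 3.5."""
    return 3.5
```

Passing an open file object (or `sys.stdin`) as `lines` keeps memory use independent of the test-file size, because each line is consumed and written out before the next one is read.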

It would also be awesome if I could read from a process or fifo (i.e., streaming read without seeking) so I could store my data compressed and uncompress it on the fly.
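As a rough illustration of what forward-only reading looks like, the Python sketch below decompresses a gzip byte stream incrementally and yields text lines. It shows the general technique, not anything MyMediaLite currently does. The function name `iter_lines_gz` is hypothetical, and gzip is just an assumed compression format.

```python
import gzip
from typing import BinaryIO, Iterator


def iter_lines_gz(raw: BinaryIO) -> Iterator[str]:
    """Yield decompressed text lines from a gzip byte stream.

    The stream is only read sequentially here, so `raw` can be a pipe
    or FIFO as well as a regular file.
    """
    with gzip.open(raw, mode="rt", encoding="utf-8") as text:
        for line in text:
            yield line.rstrip("\n")
```

In a script, `iter_lines_gz(sys.stdin.buffer)` would consume compressed data piped in from another process, one line at a time.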

zenogantner commented 10 years ago

This makes sense for very large data sets. I would implement this after some general changes, which will come with 4.0.

Of course I will not keep anyone from working on this, so if there are patches, we could have it in 3.10 or 4.0.

How large are your test files?

jkleint commented 10 years ago

A few GB.