recommenders / rival

RiVal recommender system evaluation toolkit
rival.recommenders.net
Apache License 2.0
150 stars 40 forks source link

Not generalized CrossValidationSplitter class #102

Closed aleSuglia closed 9 years ago

aleSuglia commented 9 years ago

When you have to manage a DataModel that has different kind of identifiers for users and items from the standard integer one, you need to take advantage of the type parameters that this class has.

For this reason, each splitter defined in Rival should be able to run for every type of DataModel that the user can use. All of them need a generalized version, but only the CrossValidationSplitter can be easily generalized as suggested by the PR https://github.com/recommenders/rival/pull/94.

Other strategies should be adopted in order to generalize all the other kind of splitters that heavily depend on the type of the identifiers (see RandomSplitter which uses the Long.parseLong() method in order to convert a string to Long).

alansaid commented 9 years ago

Thanks @aleSuglia, I think this is something definitely worth pursuing. Since you've been working on this, could you also create one/several issues for the other splitters? If we target this for 0.3 we should have generalized all splitters for that release.

aleSuglia commented 9 years ago

You're welcome.

I'll read the whole code in the splitter package in order to understand better how can we generalize it. As soon as possibile I'll create seperate issues with a possibile solution to each of them.

abellogin commented 9 years ago

Thank you, Alessandro, but it is not necessary that you also provide a solution. Just create an issue for each Splitter that should be generalized.

And by the way, the problem you mention with Long.parseLong in RandomSplitter should be easy to address using the class Pair that was recently isolated from the strategies classes.

aleSuglia commented 9 years ago

If I have the possibility to propose a solution I'll do it without any problem.

I've never used the class Pair so I'll see it in order to understand how can be used in that case.

alansaid commented 9 years ago

as @abellogin said, we can split the workload of generalizing the other splitters, but just for bookkeeping it would be good to have issues associated with every splitter that needs to be fixed.

aleSuglia commented 9 years ago

@alansaid Fine. I'll do it in that way :+1: Should I use a specific title for each issue in order to remember that belong to the same problem?

alansaid commented 9 years ago

@aleSuglia pick something reasonable

alansaid commented 9 years ago

fixed in c95a904c392334890767c91b14ab1580fab6b9bb