Open dgrapov opened 9 years ago
@dgrapov Yes, the pain point of using RCA is that you have to define the chunks yourself. Maybe there's an automatic way of doing it, but I am not an expert in that. Maybe @road2stat has some ideas about this?
I suggest you check out the lfda package, with which you can train a metric to pre-process the data before training a classification model; the trained metric can also be applied to the test set. There's also another package called dml that contains a wider range of metric learning algorithms. So feel free to give it a try.
Thanks @terrytangyuan your packages look very promising!
There is empirical evidence that using RCA as a feature learning method can sometimes give a small performance boost. I have three comments on this: 1. it's generally difficult to find prior knowledge for defining the "chunklets", although one could do this by grouping subsets of samples with the same class label; 2. if we do 1, then the validation procedure must be designed carefully, to avoid "leaking" information from the test sets into the training sets; 3. because of 1 and 2, one may just use more straightforward things like latent factor-based models or autoencoders for feature learning. -Nan
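To make points 1 and 2 concrete, here is a minimal NumPy sketch, not the dml or lfda API. It assumes chunklets are formed by splitting each class's training samples into small random groups (the helper `chunklets_from_labels` and the `chunk_size` parameter are hypothetical), fits the RCA whitening transform on the training split only, and then applies the same transform to the held-out split. The data is synthetic and only there to make the example runnable.

```python
import numpy as np

def chunklets_from_labels(y, chunk_size=5, seed=0):
    """Hypothetical helper: split each class's indices into small random groups."""
    rng = np.random.default_rng(seed)
    chunks = []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        for start in range(0, len(idx), chunk_size):
            group = idx[start:start + chunk_size]
            if len(group) >= 2:  # a single point has no within-chunklet scatter
                chunks.append(group)
    return chunks

def fit_rca(X, chunks, eps=1e-8):
    """Return W = C^(-1/2), where C is the average within-chunklet covariance."""
    d = X.shape[1]
    C = np.zeros((d, d))
    n = 0
    for idx in chunks:
        centered = X[idx] - X[idx].mean(axis=0)
        C += centered.T @ centered
        n += len(idx)
    C /= n
    evals, evecs = np.linalg.eigh(C)
    evals = np.maximum(evals, eps)  # guard against a singular covariance
    return evecs @ np.diag(evals ** -0.5) @ evecs.T

# Synthetic three-class data; the third feature is high-variance noise.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
scales = np.array([1.0, 1.0, 5.0])
y = np.repeat(np.arange(3), 40)
X = means[y] + rng.normal(size=(120, 3)) * scales

# Split first, so the test labels never influence the chunklets.
perm = rng.permutation(len(y))
train, test = perm[:90], perm[90:]

chunks = chunklets_from_labels(y[train], chunk_size=5)  # indices into X[train]
W = fit_rca(X[train], chunks)

X_train_rca = X[train] @ W  # W is symmetric, so right-multiplication is enough
X_test_rca = X[test] @ W    # same transform, no refitting on test data
```

Within cross-validation, the same pattern would be repeated inside every fold: build the chunklets and fit `W` on the training part of the fold only, then transform the validation part with that `W`.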
Hello,
Is it possible to use rca as a preprocessing step for classification problems? For example, as mentioned in the following manuscripts:
https://www.aaai.org/Papers/AAAI/2008/AAAI08-095.pdf http://papers.nips.cc/paper/2164-distance-metric-learning-with-application-to-clustering-with-side-information.pdf
When I test with the iris data (see here), it seems that rca might be usable, but defining the chunks seems arbitrary. Can you please advise?
Thank you.