using pca in pipeline - Githubissues

AdoHaha commented 7 years ago

Hi, can PCA be used in a pipeline as a feature extractor? It appears in the Feature Extraction examples, but when I try to add it to the pipeline via pipeline.addFeatureExtractionModule(pca); I get this error: no known conversion for argument 1 from ‘GRT::PrincipalComponentAnalysis’ to ‘const GRT::FeatureExtraction&’.

Also, can PCA be trained on TimeSeriesClassificationData, or only on matrix-type data?

nickgillian commented 7 years ago

Hey,

At this time, in the master branch, PrincipalComponentAnalysis can't be used directly as a feature extraction module. To use it for feature extraction, you need to run it outside of the pipeline and then pass its output to the pipeline as input.
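To make the "run it outside the pipeline" workaround concrete, here is a rough sketch of the idea in plain C++. It does not use GRT: SimplePCA and its methods are hypothetical stand-ins, and it finds only the first principal component by power iteration, which is a swapped-in simplification and not necessarily how PrincipalComponentAnalysis works internally. The point is just the order of operations: train on the raw data, project it, and hand the projected values to whatever comes next.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // one row per sample

// Hypothetical stand-in (not a GRT class): keeps only the first component.
struct SimplePCA {
    Vec mean;       // per-dimension mean of the training data
    Vec component;  // unit-length first principal direction

    void train(const Mat& data, int iterations = 100) {
        assert(!data.empty() && !data[0].empty());
        const std::size_t n = data.size(), d = data[0].size();

        // 1) Mean of each dimension.
        mean.assign(d, 0.0);
        for (const Vec& row : data)
            for (std::size_t j = 0; j < d; ++j) mean[j] += row[j] / n;

        // 2) Covariance matrix of the centred data.
        Mat cov(d, Vec(d, 0.0));
        for (const Vec& row : data)
            for (std::size_t a = 0; a < d; ++a)
                for (std::size_t b = 0; b < d; ++b)
                    cov[a][b] += (row[a] - mean[a]) * (row[b] - mean[b]) / n;

        // 3) Power iteration from a fixed start vector (1, 2, 3, ...), normalised.
        //    Caveat: this fails if the start happens to be orthogonal to the
        //    principal direction.
        component.assign(d, 0.0);
        double startNorm = 0.0;
        for (std::size_t j = 0; j < d; ++j) {
            component[j] = static_cast<double>(j + 1);
            startNorm += component[j] * component[j];
        }
        for (double& v : component) v /= std::sqrt(startNorm);

        for (int it = 0; it < iterations; ++it) {
            Vec next(d, 0.0);
            for (std::size_t a = 0; a < d; ++a)
                for (std::size_t b = 0; b < d; ++b)
                    next[a] += cov[a][b] * component[b];
            double norm = 0.0;
            for (double v : next) norm += v * v;
            norm = std::sqrt(norm);
            if (norm == 0.0) break;  // degenerate data: keep the previous direction
            for (std::size_t a = 0; a < d; ++a) component[a] = next[a] / norm;
        }
    }

    // Project one sample onto the first principal component.
    double project(const Vec& sample) const {
        assert(sample.size() == mean.size());
        double out = 0.0;
        for (std::size_t j = 0; j < sample.size(); ++j)
            out += (sample[j] - mean[j]) * component[j];
        return out;
    }

    // Project a whole dataset; the result is what would be fed to the pipeline.
    Vec projectAll(const Mat& data) const {
        Vec out;
        for (const Vec& row : data) out.push_back(project(row));
        return out;
    }
};
```

With this shape, the calling code would train SimplePCA on the raw matrix, call projectAll on both the training and the live data, and pass only the reduced values to the pipeline's classifier.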

To help improve this, I've added a new PCA module to the toolkit which allows the PrincipalComponentAnalysis algorithm to be used directly within a pipeline as a feature extraction module.
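For anyone wondering why the original call failed to compile: a function that takes a const reference to a base type only accepts objects derived from that base, so a standalone algorithm class needs a thin wrapper. The sketch below shows that general adapter pattern with made-up types. FeatureExtraction, StandalonePCA, PCAModule and Pipeline here are hypothetical stand-ins written for this example, not the real GRT classes, and the actual PCA module in the dev branch may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical base type that the pipeline accepts (NOT the real GRT class).
struct FeatureExtraction {
    virtual ~FeatureExtraction() = default;
    virtual std::vector<double> computeFeatures(const std::vector<double>& in) = 0;
    virtual std::unique_ptr<FeatureExtraction> clone() const = 0;
};

// Hypothetical standalone algorithm with no common base class.
struct StandalonePCA {
    std::vector<double> weights;  // a single projection vector, for brevity
    double project(const std::vector<double>& in) const {
        assert(in.size() == weights.size());
        double sum = 0.0;
        for (std::size_t i = 0; i < in.size(); ++i) sum += in[i] * weights[i];
        return sum;
    }
};

// Adapter: derives from the base type and forwards to the wrapped algorithm.
struct PCAModule : FeatureExtraction {
    StandalonePCA pca;
    explicit PCAModule(StandalonePCA p) : pca(std::move(p)) {}
    std::vector<double> computeFeatures(const std::vector<double>& in) override {
        return { pca.project(in) };
    }
    std::unique_ptr<FeatureExtraction> clone() const override {
        return std::make_unique<PCAModule>(*this);
    }
};

struct Pipeline {
    std::vector<std::unique_ptr<FeatureExtraction>> modules;
    void addFeatureExtractionModule(const FeatureExtraction& m) {
        modules.push_back(m.clone());  // the pipeline keeps its own copy
    }
};

// pipeline.addFeatureExtractionModule(StandalonePCA{}) would not compile:
// StandalonePCA does not derive from FeatureExtraction, so there is no
// conversion to const FeatureExtraction&. Wrapping it in PCAModule works.
```

In this sketch, passing PCAModule(pca) to addFeatureExtractionModule compiles, while passing the bare StandalonePCA produces the same kind of "no known conversion" error quoted in the question.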

You can find this new PCA module in the dev branch: https://github.com/nickgillian/grt/tree/dev/GRT/FeatureExtractionModules/PCA

You can find an example of how to use this here: https://github.com/nickgillian/grt/blob/dev/examples/FeatureExtractionModules/PCAPipelineExample/PCAPipelineExample.cpp

I still need to test this fully, so there may be bugs/issues (which is why it is still in the dev branch and not merged with master).

One note: there is currently a hack in how you need to use the PCA module. You need to train the PCA module before you can use it, so you have to add the PCA module to the pipeline, then get a pointer to the PCA module from the pipeline, then train the PCA model with your dataset. You can see this hack in the example above. This is bad for two reasons:

1. It means you can't have any module before the PCA module in the pipeline (because the data will not be pumped through that module when training the PCA module).
2. The coding flow is rather ugly (as you need to add the module, then get a pointer to it, then manually train it).

I'm working on improving this so that you can add multiple modules to the pipeline before PCA and a classifier after it. Then, when you call pipeline.train(data), the pipeline will automatically iterate through all the modules, pipe the data through each stage in turn, train the feature modules (like PCA), and finally train the classifier at the end of the pipeline. For now, you will need to do this manually.
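As a rough illustration of that planned behaviour, the sketch below trains each stage on the output of the stages before it. Everything in it is hypothetical (Stage, MeanCentering, Scale and StagedPipeline are names invented for this example, not GRT classes), and MeanCentering merely stands in for a trainable feature module such as PCA. The final classifier is left out; lastStageOutput marks the data it would be trained on.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

using Sample = std::vector<double>;
using Dataset = std::vector<Sample>;

// Hypothetical stage interface (not the GRT API).
struct Stage {
    virtual ~Stage() = default;
    virtual void train(const Dataset& data) = 0;  // fit on the incoming data
    virtual Sample transform(const Sample& in) const = 0;
};

// Trainable stage: learns the per-dimension mean and subtracts it.
struct MeanCentering : Stage {
    Sample mean;
    void train(const Dataset& data) override {
        assert(!data.empty());
        mean.assign(data[0].size(), 0.0);
        for (const Sample& s : data)
            for (std::size_t j = 0; j < s.size(); ++j)
                mean[j] += s[j] / data.size();
    }
    Sample transform(const Sample& in) const override {
        Sample out(in);
        for (std::size_t j = 0; j < out.size(); ++j) out[j] -= mean[j];
        return out;
    }
};

// Stage with nothing to learn: multiplies every value by a constant.
struct Scale : Stage {
    double factor;
    explicit Scale(double f) : factor(f) {}
    void train(const Dataset&) override {}
    Sample transform(const Sample& in) const override {
        Sample out(in);
        for (double& v : out) v *= factor;
        return out;
    }
};

struct StagedPipeline {
    std::vector<std::unique_ptr<Stage>> stages;
    Dataset lastStageOutput;  // what a final classifier would be trained on

    void train(Dataset data) {
        for (auto& stage : stages) {
            stage->train(data);  // fit this stage on the data it will actually see
            for (Sample& s : data) s = stage->transform(s);  // then push data onward
        }
        lastStageOutput = std::move(data);
    }
};
```

Because each stage is fitted on already-transformed data, a module placed before the trainable one is no longer a problem: with Scale(2.0) followed by MeanCentering, the mean is learned from the scaled values and not from the raw ones.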

AdoHaha commented 7 years ago

Nice, thanks! Training it beforehand makes sense.

I will try it soon.

On Sun, Mar 26, 2017 at 8:39 PM, Nicholas Gillian notifications@github.com wrote:

[quoted copy of the comment above]

nickgillian / grt

using pca in pipeline #116