Closed HeuristicLab-Trac-Bot closed 11 years ago
Please also add a view that shows the result of the Jarque-Bera test (normality test, implemented in alglib) for each variable.
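For reference, a minimal sketch of what the Jarque-Bera statistic measures, written in Python for illustration only. The function name `jarque_bera` is hypothetical, and the p-value uses the asymptotic chi-square approximation with 2 degrees of freedom; the alglib routine mentioned above may compute its p-value differently, especially for small samples.

```python
import math

def jarque_bera(values):
    """Return (statistic, asymptotic p-value) of the Jarque-Bera test.

    Assumption: the p-value comes from the chi-square distribution with
    2 degrees of freedom, whose survival function is exp(-x / 2).
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(values) / n
    # Central moments (biased, i.e. divided by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0.0:
        # A constant variable has no defined skewness or kurtosis.
        return math.nan, math.nan
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    statistic = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value
```

A view could call such a function once per variable and list the statistic and p-value side by side.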
r7969: added HoeffdingsDependenceCalculator to calculate the non-parametric Hoeffding's dependency. Ideally it should be possible to show either Pearson's R², Spearman's rank correlation, or Hoeffding's dependency in the heat-map.
r8034: create branch to show correlation of dataset features
r8035: branch project for implementing HeatMap to show correlation of dataset features
r8036: branch another project for implementing HeatMap to show correlation of dataset features
- completed branch creation
- first simple implementation of a HeatMap, which shows the correlation of the dataset features
Please just use the alglib function for calculating Spearman's rank correlation. Rename method 'Spear'.
- SpearmansRankCorrelationCoefficientCalculator now uses the alglib function
- strings in ExtendedHeatMap have been made constant
- added cloning method and constructor to ExtendedHeatMap
- renamed a variable in ExtendedHeatMapView
- added backwards compatibility code in DataAnalysisProblemData
- Don't calculate the absolute value in Spearman's rank correlation.
- Please add a property R or Correlation that simply returns the correlation coefficient in the Pearson's correlation calculator.
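To illustrate why the sign matters, here is a small Python sketch (not the HeuristicLab or alglib code; all function names are hypothetical). It computes Spearman's rank correlation as Pearson's R on midranks and keeps the sign, so a decreasing relationship yields a negative value. R² is then just the square of the plain coefficient.

```python
import math

def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Signed Pearson correlation coefficient; NaN if a variable is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Signed Spearman rank correlation (no absolute value taken)."""
    return pearson_r(midranks(xs), midranks(ys))
```

With this sketch, `spearman_rho([1, 2, 3, 4], [10, 8, 5, 1])` is negative, which an absolute value would hide.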
- fixed bugs in HoeffdingsDependenceCalculator
- added test cases for HoeffdingsDependenceCalculator
- Renamed ExtendedHeatMap to FeatureCorrelation
- deleted old CorrelationHeatMapView
- added FeatureCorrelationView
r8525: Added bin directory and resharper files to list of SVN excluded files.
r8526: Corrected build configurations in DatasetCorrelation branch.
- BackgroundWorker is now reused in FeatureCorrelation
- renamed some variables
- ComboBoxes are now DropDownLists
- FeatureCorrelation doesn't calculate the elements in the constructor anymore
- small changes in the views
r8538: Merged trunk changes in preparation of the branch reintegration.
r8542: Integrated correlation analysis of datasets in the trunk.
The following things must be implemented:
- Views of the same object are not synchronized
- The default constructor doesn't assign a problem data to the feature correlation, which could lead to exceptions
- Use start and end values to calculate the correlation instead of strings declaring which partition should be used.
- Remove the obsolete branch when all changes are implemented.
r8543: Removed the feature correlation from the data analysis problem data, as the implementation is not yet finished and otherwise it could lead to persistence breaks.
r8559: removed the default constructor for FeatureCorrelation as it simply runs into a NullReferenceException (the default ctor is not used anywhere and is senseless). This fixes the unit test failure for the meta-optimization branch on the builder.
- added ProblemDataView which has a button to open the feature correlation
- added abstract base class for feature correlations
- added caches for the feature correlation
- created own class for calculation of feature correlation
- changed SelectedItemChanged to SelectionChangeCommitted events, so the correlation is only calculated if the user changes the selection
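The caching idea from this commit can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the committed code: the class `CorrelationCache` and its key layout (calculator name plus start and end row of the partition) are hypothetical.

```python
class CorrelationCache:
    """Remembers correlation matrices so repeated selections do not recompute them.

    Assumption: a result is fully determined by the calculator name and the
    start/end rows of the partition it was computed on.
    """

    def __init__(self):
        self._store = {}

    def get_or_compute(self, calculator_name, start, end, compute):
        """Return the cached result for the key, calling compute() only on a miss."""
        key = (calculator_name, start, end)
        if key not in self._store:
            self._store[key] = compute()
        return self._store[key]

    def reset(self):
        """Drop all cached results, e.g. when the underlying dataset changes."""
        self._store.clear()
```

Combined with reacting only to user-committed selection changes, a repeated selection of the same calculator and partition becomes a dictionary lookup instead of a recalculation.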
r8579 (not migrated): deleted obsolete branch
If possible, I suggest limiting the correlation analysis to only the allowed input variables plus the target variable. That way you can apply some filtering, and it could help you iteratively refine your input variables.
The correlation analysis throws an exception if too few values were added to the used calculator.
Replying to [comment:33 abeham]:
If possible, I suggest limiting the correlation analysis to only the allowed input variables plus the target variable. That way you can apply some filtering, and it could help you iteratively refine your input variables.
This is a good point and should be implemented.
- NaN values are used, if the calculation is invalid (e.g. missing values, infinity etc.)
- Variables can now be filtered. Initially allowed input variables and target variable are shown, but with a right click a dialog can be opened to select variables, which shall be shown
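The two changes above could look roughly like this. It is a Python sketch with hypothetical names (`pearson_or_nan`, `correlation_matrix`), not the actual HeuristicLab code: invalid inputs produce NaN instead of an exception, and only the variables selected for display end up in the matrix.

```python
import math

def pearson_or_nan(xs, ys):
    """Pearson's R, or NaN when the calculation would be invalid."""
    if len(xs) != len(ys) or len(xs) < 2:
        return math.nan  # too few values
    if not all(math.isfinite(v) for v in xs) or not all(math.isfinite(v) for v in ys):
        return math.nan  # missing values (NaN) or infinity
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return math.nan  # constant variable
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns, shown):
    """Correlate only the variables in `shown`, keeping the dataset's column order.

    columns: dict mapping variable name to a list of values
    shown:   set of variable names to display (e.g. allowed inputs plus target)
    """
    names = [name for name in columns if name in shown]
    matrix = [[pearson_or_nan(columns[a], columns[b]) for b in names] for a in names]
    return names, matrix
```

A view can then render NaN cells distinctly instead of failing when a calculator receives too few values.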
I have a few remarks:
- I would restrict Pearson's R² to only use green-yellow-red colors. It's a bit confusing that in Pearson's R green means no correlation, but in R² it means medium correlation while red still retains its meaning.
- Hoeffding's dependence doesn't have 1s in the diagonal (why?)
- Numbers are not easily readable if they're on a dark-blue background
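One possible way to address the first and third remark, sketched in Python with hypothetical function names. It assumes that 0 maps to green, 0.5 to yellow and 1 to red, that NaN is drawn gray, and that the text color is chosen from a simple luma estimate of the background.

```python
def r2_to_rgb(r2):
    """Map an R² value in [0, 1] to a green-yellow-red color (assumed scale)."""
    if r2 != r2:
        return (128, 128, 128)  # NaN: neutral gray (assumption)
    t = min(max(r2, 0.0), 1.0)
    if t <= 0.5:
        # Green to yellow: red channel rises from 0 to 255.
        return (round(255 * t * 2), 255, 0)
    # Yellow to red: green channel falls from 255 to 0.
    return (255, round(255 * (1.0 - t) * 2), 0)

def text_color(background):
    """Pick black or white text depending on the background brightness."""
    red, green, blue = background
    luma = 0.299 * red + 0.587 * green + 0.114 * blue
    return "black" if luma >= 128 else "white"
```

With such a rule, numbers on a dark-blue cell would be drawn in white, while yellow or green cells keep black text.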
Replying to [comment:37 abeham]:
I have a few remarks:
- Hoeffding's dependence doesn't have 1s in the diagonal (why?)
This is correct behaviour when the variable contains duplicate values.
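The effect can be reproduced with a small Python sketch of Hoeffding's D. This uses the commonly documented midrank formulation with the scaling factor 30, so that a perfect dependence without ties gives 1; it is an assumption that HoeffdingsDependenceCalculator uses the same formulation and scaling. The function name `hoeffding_d` is hypothetical.

```python
def hoeffding_d(xs, ys):
    """Hoeffding's D with midranks for ties (O(n^2) reference sketch)."""
    n = len(xs)
    if n != len(ys) or n < 5:
        raise ValueError("need two samples of equal length with at least 5 values")

    def midrank(values, i):
        # 1 + number of smaller values + half the number of other tied values.
        smaller = sum(1 for v in values if v < values[i])
        tied = sum(1 for v in values if v == values[i]) - 1
        return 1.0 + smaller + 0.5 * tied

    def phi(a, b):
        # 1 if a < b, 1/2 if tied, 0 otherwise.
        if a < b:
            return 1.0
        return 0.5 if a == b else 0.0

    d1 = d2 = d3 = 0.0
    for i in range(n):
        r = midrank(xs, i)
        s = midrank(ys, i)
        # Bivariate rank: points below in both coordinates count 1,
        # tied in one coordinate 1/2, tied in both 1/4.
        q = 1.0 + sum(phi(xs[j], xs[i]) * phi(ys[j], ys[i])
                      for j in range(n) if j != i)
        d1 += (q - 1.0) * (q - 2.0)
        d2 += (r - 1.0) * (r - 2.0) * (s - 1.0) * (s - 2.0)
        d3 += (r - 2.0) * (s - 2.0) * (q - 1.0)

    numerator = (n - 2) * (n - 3) * d1 + d2 - 2 * (n - 2) * d3
    denominator = n * (n - 1) * (n - 2) * (n - 3) * (n - 4)
    return 30.0 * numerator / denominator
```

Under these assumptions, `hoeffding_d(x, x)` is 1 for `x = [1, 2, 3, 4, 5, 6]` but only about 0.776 for `x = [1, 1, 2, 3, 4, 5]`, so a variable with duplicate values does not reach 1 on the diagonal.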
r8729: Moved FeatureCorrelation specific classes from Problems.DataAnalysis to Problems.DataAnalysis.Views.
Currently the TimeFrameCorrelationView is displayed by default instead of the "normal" CorrelationView. Furthermore, we should discuss the source code in detail.
Issue migrated from trac ticket # 1292
milestone: HeuristicLab 3.3.8 | component: Problems.DataAnalysis.Views | priority: medium | resolution: done
2010-11-22 15:57:45: @mkommend created the issue