hosseinmoein / DataFrame

C++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types and contiguous memory storage
https://hosseinmoein.github.io/DataFrame/
BSD 3-Clause "New" or "Revised" License

How can I convert the `Eigen Matrix` to `DataFrame` or `DataFrame` to `Eigen Matrix`? #281

Closed shaojunjie0912 closed 6 months ago

shaojunjie0912 commented 6 months ago

Hi, I am new to DataFrame.

I'd like to convert the Eigen Matrix to DataFrame or DataFrame to Eigen Matrix.

How should I do that?

Thank you very much!!!

hosseinmoein commented 6 months ago

Matrix and DataFrame are very different data structures, used for different kinds of analysis. If you simply want to transfer the data from one structure to the other, it is very easy. Both Eigen and DataFrame keep their data in contiguous memory; DataFrame stores each column as a std::vector. I believe an Eigen matrix requires special alignment, and DataFrame gives you the tools for that as well. Read the documentation here: https://htmlpreview.github.io/?https://github.com/hosseinmoein/DataFrame/blob/master/docs/HTML/DataFrame.html.

But if you intend to substitute DataFrame for an Eigen matrix and run matrix operations on the DataFrame, that’s a very bad idea. In other words, a DataFrame is not a matrix and a matrix is not a DataFrame.

shaojunjie0912 commented 6 months ago

Thanks for your reply. Here's my situation.

I'm currently implementing the Principal Component Analysis (PCA) algorithm in C++, and I've implemented a basic version with the Eigen library.

Now I want to use the DataFrame library to implement it, because DataFrame provides a lot of data analysis functions (thanks for your work), but in the PCA algorithm I need to calculate the eigenvalues and eigenvectors of the whole data covariance matrix, or the SVD decomposition of the whole data matrix.

However, I noticed that the DataFrame library does not have similar functions, so I want to combine the DataFrame library and the Eigen library, use the Eigen library to solve the covariance matrix, and then convert it to the DataFrame for further analysis.
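To make the covariance step concrete, here is a minimal sketch in plain C++ that builds the sample covariance matrix from a set of columns. It assumes each column has already been copied out of the DataFrame as a `std::vector<double>`. The function name `covariance_matrix` and the flat result layout are hypothetical and not part of DataFrame or Eigen. In the workflow described above, a matrix library would take over from here for the eigen decomposition or SVD.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not a DataFrame or Eigen API).
// Computes the sample covariance matrix (divides by n - 1) of p columns
// that each hold n observations. The p x p result is returned as a flat
// vector where entry (i, j) lives at index j * p + i.
std::vector<double>
covariance_matrix(const std::vector<std::vector<double>>& columns)
{
    const std::size_t p = columns.size();
    if (p == 0) return {};

    const std::size_t n = columns.front().size();
    if (n < 2) throw std::invalid_argument("need at least two observations");

    // Mean of every column.
    std::vector<double> mean(p, 0.0);
    for (std::size_t i = 0; i < p; ++i) {
        if (columns[i].size() != n)
            throw std::invalid_argument("all columns must have the same length");
        for (double x : columns[i]) mean[i] += x;
        mean[i] /= static_cast<double>(n);
    }

    // Fill the upper triangle and mirror it, since the matrix is symmetric.
    std::vector<double> cov(p * p, 0.0);
    for (std::size_t i = 0; i < p; ++i) {
        for (std::size_t j = i; j < p; ++j) {
            double sum = 0.0;
            for (std::size_t r = 0; r < n; ++r)
                sum += (columns[i][r] - mean[i]) * (columns[j][r] - mean[j]);
            const double c = sum / static_cast<double>(n - 1);
            cov[j * p + i] = c;
            cov[i * p + j] = c;
        }
    }
    return cov;
}
```

Because the result is symmetric, the flat buffer reads the same in row-major and column-major order, so it can be handed to a matrix library without worrying about storage order.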

I don't know if my idea of implementing PCA complicates a simple problem, or if I don't need to use the functionality of the DataFrame library at all.

Thank you!

hosseinmoein commented 6 months ago

Yes, DataFrame doesn't have eigenvalues, eigenvectors, or SVD, because the data in a DataFrame is not laid out efficiently for those calculations. I have another matrix library (https://github.com/hosseinmoein/Tiger) that has those and more. You may want to look at it.

In any case, moving data between a matrix and DataFrame should be very easy.
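As a rough illustration of that data movement, the sketch below packs equally sized columns into one contiguous column-major buffer and splits such a buffer back into columns. It uses only the standard library. The helper names `pack_column_major` and `unpack_column_major` are hypothetical, and it assumes the DataFrame columns are available as `std::vector<double>`. Column-major is Eigen's default storage order, so a buffer like this could presumably be wrapped with `Eigen::Map<Eigen::MatrixXd>` on the Eigen side.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not a DataFrame or Eigen API).
// Packs equally sized columns into one column-major buffer:
// element (row r, column c) ends up at index c * n_rows + r.
std::vector<double>
pack_column_major(const std::vector<std::vector<double>>& columns)
{
    if (columns.empty()) return {};

    const std::size_t n_rows = columns.front().size();
    std::vector<double> buffer;
    buffer.reserve(n_rows * columns.size());

    for (const auto& col : columns) {
        if (col.size() != n_rows)
            throw std::invalid_argument("all columns must have the same length");
        buffer.insert(buffer.end(), col.begin(), col.end());
    }
    assert(buffer.size() == n_rows * columns.size());
    return buffer;
}

// Hypothetical helper: the inverse of pack_column_major. Splits a
// column-major buffer into n_cols vectors of n_rows elements each,
// ready to be loaded back as individual columns.
std::vector<std::vector<double>>
unpack_column_major(const std::vector<double>& buffer,
                    std::size_t n_rows, std::size_t n_cols)
{
    if (buffer.size() != n_rows * n_cols)
        throw std::invalid_argument("buffer size does not match dimensions");

    std::vector<std::vector<double>> columns;
    columns.reserve(n_cols);
    for (std::size_t c = 0; c < n_cols; ++c) {
        const auto first =
            buffer.begin() + static_cast<std::ptrdiff_t>(c * n_rows);
        columns.emplace_back(first, first + static_cast<std::ptrdiff_t>(n_rows));
    }
    return columns;
}
```

Both directions copy the data once. That keeps the DataFrame and the matrix independent of each other, at the cost of extra memory for large data sets.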

shaojunjie0912 commented 6 months ago

Thanks a lot, I will have a try.