jstriaukas / midasml

The midasml package is dedicated to running predictive high-dimensional mixed data sampling models

Question of MIDAS with panel data #8

Closed Yuanyuan77-wang closed 2 years ago

Yuanyuan77-wang commented 2 years ago

Hello, I am trying to use the MIDAS model to analyze panel data, but I have run into some problems and I am wondering if you could give me some advice. Does your midasml package have a function for panel data? If so, could you share some relevant code or examples?

jstriaukas commented 2 years ago

Hello,

Sure. Please write me an email with specific questions you have.

FYI, functions that handle panel data are cv.panel.sglfit and ic.panel.sglfit. Please check the description file also.

Yuanyuan77-wang commented 2 years ago

Hello, I have emailed you. Looking forward to your reply. Thanks!

jstriaukas commented 2 years ago

Please email me at either striaukas@gmail.com or jonas.striaukas@gmail.com. I haven't received your email.

Yuanyuan77-wang commented 2 years ago

Hello, I emailed you again. Did you receive it?

jstriaukas commented 2 years ago

Can you paste your question here? I still haven't received it.

Yuanyuan77-wang commented 2 years ago

I'm very sorry that the email failed to send again. I have read your paper, Machine Learning Panel Data Regressions with an Application to Nowcasting Price Earnings Ratios, and I want to ask some questions about the data used in the paper and the implementation in R.

I am trying to reproduce your work, but while collecting the data I found that a lot of the data measuring the companies' financials are missing. Do you have any suggestions for filling in missing values in panel data, other than removing them?

About 'cv.panel.sglfit' and 'ic.panel.sglfit': do these functions have to use the LASSO penalty? In another paper of yours, you use it for high-dimensional series data, but for low-dimensional panels would variable selection with some significance test yield better results?

About the function 'mixed_freq_data' (Creates a MIDAS data structure for a single high-frequency covariate and a single low-frequency dependent variable): does it return a data frame? I am confused about how to combine all the covariates to form a complete dataset. If there are multiple variables, can they be aggregated together? Could you provide more examples? Thanks.

jstriaukas commented 2 years ago

Q1. "a lot of data which measure the company's financials are missing" - missing in our analysis? We list all the series in the appendix. As for missing data, for earnings and earnings forecasts, we include in the analysis only firms that have full time series without missing entries. ML with missing data is an interesting research area, but we did not cover it in that paper. Please check the appendix, where we detail how we deal with textual data.
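For readers who want to apply the same complete-case rule to their own panel, here is a minimal sketch in plain Python. It is not code from the paper or from midasml; the long-format `(firm, period, value)` layout, the function name `complete_firms`, and the toy data are all assumptions made for illustration.

```python
# Hypothetical long-format panel: (firm, period, value); None marks a missing entry.
def complete_firms(rows, periods):
    """Return the firms that have a non-missing value in every required period."""
    seen = {}
    for firm, period, value in rows:
        if value is not None:
            seen.setdefault(firm, set()).add(period)
    required = set(periods)
    return sorted(firm for firm, have in seen.items() if required <= have)


rows = [
    ("A", 1, 10.0), ("A", 2, 11.0), ("A", 3, 12.5),
    ("B", 1, 7.0), ("B", 2, None), ("B", 3, 8.0),   # missing value in period 2
    ("C", 1, 3.0), ("C", 3, 3.5),                     # period 2 row absent
]

keep = complete_firms(rows, periods=[1, 2, 3])
balanced = [r for r in rows if r[0] in keep]  # balanced panel of complete firms
print(keep)
```

Dropping incomplete firms keeps the panel balanced at the cost of sample size, so it is worth checking how many firms survive before estimating anything.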

Q2. "Do these functions have to use the LASSO penalty?" - sg-LASSO is a convex combination of LASSO and group LASSO. The 'gamma' parameter, which lies in [0,1], determines the relative weight between the two norms. Setting gamma = 1.0 leads to LASSO.
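To make the role of gamma concrete, the sketch below evaluates the penalty gamma * ||b||_1 + (1 - gamma) * sum_g ||b_g||_2 for a small coefficient vector. It is an illustration in plain Python, not the package's implementation: the function name `sg_lasso_penalty` is made up, the group norms are assumed to be unweighted, and any scaling the package applies internally is not reproduced.

```python
import math


def sg_lasso_penalty(beta, gindex, gamma):
    """gamma * ||beta||_1 + (1 - gamma) * sum over groups of ||beta_g||_2.

    Assumes unweighted group norms; gindex[i] is the group label of beta[i].
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if len(beta) != len(gindex):
        raise ValueError("beta and gindex must have the same length")
    l1 = sum(abs(b) for b in beta)
    squares = {}
    for b, g in zip(beta, gindex):
        squares[g] = squares.get(g, 0.0) + b * b
    group = sum(math.sqrt(s) for s in squares.values())
    return gamma * l1 + (1.0 - gamma) * group


beta = [3.0, -4.0, 0.0, 0.0]
gindex = [1, 1, 2, 2]  # two groups of two coefficients each

lasso_only = sg_lasso_penalty(beta, gindex, gamma=1.0)  # pure L1 norm
group_only = sg_lasso_penalty(beta, gindex, gamma=0.0)  # pure group norm
mixed = sg_lasso_penalty(beta, gindex, gamma=0.5)       # halfway between
print(lasso_only, group_only, mixed)
```

With gamma = 1.0 only the L1 term remains, which is the LASSO case mentioned above; with gamma = 0.0 only the group term remains.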

Q3. "In another paper of yours, you use it for high-dimensional series data, but for low-dimensional panels would variable selection with some significance test yield better results?" - I am not really following the question. But if I understand it correctly, our Granger causality procedure leads to more accurate inference when one uses pooled data. We report this in the MC simulations of our panel paper.

Q4. "About the function 'mixed_freq_data' (Creates a MIDAS data structure for a single high-frequency covariate and a single low-frequency dependent variable), does it return a data frame?" - 'mixed_freq_data' constructs MIDAS data structures. Please see the example code and the documentation for the output type. The 'mixed_freq_data_single' function might be more useful for manipulating high-dimensional panels.
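On the question of how several covariates end up in one dataset, the general idea behind a MIDAS design matrix can be sketched as follows. This is a plain Python illustration and does not reproduce the output of 'mixed_freq_data': the helper names `midas_lags` and `combine`, the integer time index, and the fixed number of high-frequency observations per low-frequency period are simplifying assumptions, whereas the package works with dates.

```python
def midas_lags(x_hf, m, n_lags):
    """Stack the n_lags most recent high-frequency values per low-frequency period.

    x_hf holds m high-frequency observations per low-frequency period.
    Returns (kept, rows): kept lists the low-frequency periods with enough
    history, and rows[i] is [x_end, x_end-1, ..., x_end-n_lags+1] for kept[i].
    """
    n_lf = len(x_hf) // m
    kept, rows = [], []
    for t in range(n_lf):
        end = (t + 1) * m - 1  # last high-frequency index inside period t
        if end - n_lags + 1 < 0:
            continue  # not enough history for this period yet
        rows.append([x_hf[end - k] for k in range(n_lags)])
        kept.append(t)
    return kept, rows


def combine(blocks):
    """Column-bind the lag blocks of several covariates and build a group index."""
    n = len(blocks[0])
    if any(len(b) != n for b in blocks):
        raise ValueError("all blocks must cover the same low-frequency periods")
    design = [sum((b[i] for b in blocks), []) for i in range(n)]
    gindex = [g + 1 for g, b in enumerate(blocks) for _ in b[0]]
    return design, gindex


# Toy data: 12 months (4 quarters) for two monthly covariates, one quarterly target.
x1 = list(range(1, 13))
x2 = [10 * v for v in x1]
y = [0.1, 0.2, 0.3, 0.4]

kept, lags1 = midas_lags(x1, m=3, n_lags=4)
_, lags2 = midas_lags(x2, m=3, n_lags=4)
design, gindex = combine([lags1, lags2])
y_aligned = [y[t] for t in kept]  # drop quarters without enough monthly history
print(design[0], gindex, y_aligned)
```

Each covariate contributes one block of lag columns, and the group index records which columns belong to which covariate, which is the kind of grouping a group penalty such as sg-LASSO relies on.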

Hope this helps.

Yuanyuan77-wang commented 2 years ago

Thanks for your time! I have benefited a lot and will keep working on it!