Closed averellschmidt closed 2 years ago
hi @averellschmidt, when did you download the package? I tried to incorporate some changes to help address this back in January, but it's likely there are still some bottlenecks...
Could you give a sense of how large your data set is? How many matched sets do you expect to have and how large do you anticipate each set to be? Also, could you post the error message you're getting?
R in general isn't fantastic for handling massive data sets, but this is definitely something we want to fix in future versions. You might be able to sidestep some of these issues by getting your matched sets, then removing all of the units that aren't included in those sets and then re-running PanelMatch on that smaller data set. It's a bit hacky, but I can lay out more precise steps to try that out if there's not an obvious quick fix.
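In the meantime, here's a rough sketch of that two-pass approach. This is not an official workflow, just an illustration: the column names (`unit`, `year`, `treat`, `y`) and the data frame `full.data` are placeholders you'd swap for your own, and you should double-check the unit-id extraction against the structure of your matched sets.

```r
library(PanelMatch)

# Pass 1: cheap matching on the full panel, no refinement.
# (Placeholder column names -- substitute your own.)
pm.full <- PanelMatch(lag = 4, time.id = "year", unit.id = "unit",
                      treatment = "treat", refinement.method = "none",
                      data = full.data, match.missing = TRUE,
                      size.match = 5, qoi = "att",
                      outcome.var = "y", lead = 0:4,
                      forbid.treatment.reversal = FALSE)

msets <- pm.full$att

# Collect every unit id that appears in a matched set: treated units
# are encoded in the set names as "unitid.time", controls are the
# elements of each set.
treated.ids <- as.numeric(sub("\\..*$", "", names(msets)))
control.ids <- as.numeric(unlist(msets))
keep.ids    <- unique(c(treated.ids, control.ids))

# Pass 2: run the expensive refined matching on the reduced panel only.
small.data <- full.data[full.data$unit %in% keep.ids, ]
pm.small <- PanelMatch(lag = 4, time.id = "year", unit.id = "unit",
                       treatment = "treat",
                       refinement.method = "mahalanobis",
                       data = small.data, match.missing = TRUE,
                       size.match = 5, qoi = "att",
                       outcome.var = "y", lead = 0:4,
                       forbid.treatment.reversal = FALSE)
```

The idea is that the first pass only determines set membership, so units that never appear in any set can be dropped before the memory-hungry refinement step.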
Hey @adamrauh, thanks for your quick reply. I updated my version of PanelMatch within the past few weeks, so more recently than January. My dataset is a 342.5 MB CSV file. It has 1,612,302 observations of 36,218 units over 74 periods. It includes 58 variables, but I am only using ~15 of those in my analysis for the time being. I was setting size.match = 10, but have reduced it to 5. I have also begun doing some exact matching, which seems to help, but I still cannot run my analysis on the full dataset.
The error messages I'm getting are either "Error: vector memory exhausted (limit reached?)" or "Error: cannot allocate vector of size 1.2 Mb", which seems quite small. I have also had RStudio suddenly terminate after showing me the dreaded bomb with a burning fuse...
I'd appreciate more precise steps for your hack if you have them available.
Thanks again for your help, Avery
I think this is a Windows machine issue. See if you can find some advice online about raising R's memory limits.
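One caveat: the "vector memory exhausted (limit reached?)" message is the error R raises on macOS when the `R_MAX_VSIZE` cap is hit, so it may not be Windows-specific. If that's the setup here, raising the cap in `~/.Renviron` sometimes helps (the value below is just an example; size it to your machine's RAM):

```
# in ~/.Renviron, then restart R
R_MAX_VSIZE=32Gb
```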
I'm running into issues with the limited memory of my computer when running PanelMatch. Do you have any suggestions for running PanelMatch on large datasets? Thanks!