Synthetic data is out of scope because creating realistic synthetic panel data is non-trivial. I also deliberately avoided the term "simulated" data because, at least to me, it implies some meaningful structure according to an economic model. I hence decided to create "dummy" data: it is really just random data that lets users at least run the code, but not replicate the results.
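For illustration, here is a minimal sketch of what I mean by dummy data. It is not the actual code from the chunks; the function name `make_dummy_panel` and the columns `permno`, `month`, and `ret` are hypothetical and only mimic the shape of a stock-month panel.

```python
import random


def make_dummy_panel(n_ids=5, n_months=12, seed=42):
    """Build a balanced id-by-month panel filled with random numbers.

    The values carry no economic meaning: they only let code run end to end.
    All names here are hypothetical placeholders, not real CRSP fields.
    """
    rng = random.Random(seed)  # seeded so every user gets the same dummy data
    rows = []
    for permno in range(10000, 10000 + n_ids):
        for m in range(n_months):
            year, month = 2020 + m // 12, m % 12 + 1
            rows.append(
                {
                    "permno": permno,               # fake security identifier
                    "month": f"{year}-{month:02d}",  # year-month label
                    "ret": rng.gauss(0.0, 0.1),      # pure noise, no model behind it
                }
            )
    return rows


panel = make_dummy_panel()
```

The fixed seed keeps the output identical across runs, so examples stay reproducible even though the numbers themselves mean nothing.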
I know it would be great to have actually meaningful data pulled from a source other than CRSP, but I believe that is hard either from a legal perspective (e.g., by tapping simfin) or from an effort perspective (e.g., extracting the information from raw data). I also don't dare ask ChatGPT for simulated data because who knows whether it actually steals the data from somewhere.
Please let me know whether you agree with the direction and whether I should continue writing some text around the code chunks.