gslab-econ / gslab_python

Python tools for GSLab
MIT License

SaveData improve .log readability #182

Open zhizhongpu opened 2 months ago

zhizhongpu commented 2 months ago

Currently there are a few readability issues with SaveData() compared to R::SaveData():

  1. Does not report the data dimensions
    • In contrast, R::SaveData() shows the number of columns through column indexing.
    • Our goal here is to add a dim row to the log file preamble
  2. Does not round to 3 digits after the decimal point, e.g.:
              type  count  unique           mean           std     min      25%      50%       75%        max
    year  int64   2564             2000.99259      14.32621  1974.0   1990.0   2002.0    2014.0     2022.0
    var2  int64   2564          115445.092824  281232.47416  1218.0  26348.0  50742.5  78948.25  3474706.0
  3. Does not use thousands separators for large numbers (see the example above).
  4. Displays decimal places for integer variables (see the example above).
  5. Under certain circumstances, summary statistics are displayed in scientific notation:
              type  count          mean           std           min           25%           50%           75%           max
    year  int64     48  1.997500e+03  1.400000e+01  1.974000e+03  1.985750e+03  1.997500e+03  2.009250e+03  2.021000e+03
    var3  int64     48  3.758074e+09  6.650852e+09  0.000000e+00  1.222425e+08  5.195290e+08  3.636557e+09  2.956582e+10
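
To make the target format concrete, here is a minimal sketch of how items 1 to 5 could be handled. The helpers `format_stat` and `dim_row` are hypothetical names used only for illustration (they are not part of gslab_python), and the exact layout of the dim row is an assumption rather than an agreed format.

```python
def format_stat(value, integer_column=False, decimals=3):
    """Format one summary statistic for the log file.

    Hypothetical helper for illustration; not part of gslab_python.
    """
    # Whole-number statistics of integer columns (e.g. min, max) print without decimals.
    if integer_column and float(value).is_integer():
        return f"{int(value):,}"
    # Fixed-point formatting avoids scientific notation; "," adds thousands separators.
    return f"{value:,.{decimals}f}"


def dim_row(n_rows, n_cols):
    """Build a preamble line reporting the data dimensions (assumed layout)."""
    return f"dim: {n_rows:,} x {n_cols:,}"


# Values taken from the log excerpts above.
print(format_stat(115445.092824))                       # mean of var2
print(format_stat(3474706.0, integer_column=True))      # max of var2
print(format_stat(2.956582e10, integer_column=True))    # max of var3
print(dim_row(2564, 2))                                 # column count here is made up
```

A formatter like this could be applied cell by cell to the summary table before it is written to the log. Note that it would also put a separator into year values, so identifier-like integer columns may need to be exempted.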
zhizhongpu commented 2 months ago

Personally, I'm very inclined to remove the 25% and 75% statistics (to align Python::SaveData with R::SaveData): I don't use them much, and they take up space in GitHub diffs.

If @veli-m-andirin @liaochris @ShiqiYang2022 disagree, please raise your concerns. Thanks.
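
For reference, a minimal sketch of what dropping the quartiles could look like, assuming the summary is built with pandas `DataFrame.describe()` (I have not checked how SaveData actually constructs it). The data below is made up for illustration.

```python
import pandas as pd

# Hypothetical data for illustration; not taken from the logs above.
df = pd.DataFrame({"year": [1974, 1990, 2002, 2014, 2022]})

# Requesting only the 0.5 percentile keeps the median and omits 25% and 75%.
summary = df.describe(percentiles=[0.5]).T
print(list(summary.columns))
```

This keeps count, mean, std, min, the median, and max, so each variable still fits on one row but with two fewer columns.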

liaochris commented 2 days ago

Updated from master in 1524a57c2c48019ddc2af4b619045adc84a01725