phenology / springtime

Spatiotemporal phenology research with interpretable models
https://springtime.readthedocs.io
Apache License 2.0

phenocam pick dataframe columns and order #146

Closed sverhoeven closed 1 year ago

sverhoeven commented 1 year ago

Refs #140

Site

```python
In [1]: from springtime.datasets import PhenocamrSite
   ...:
   ...: dataset = PhenocamrSite(
   ...:     site="harvard$",
   ...:     years=(2019, 2020),
   ...: )
   ...: dataset.download()
   ...: df = dataset.load()
   ...: df
Out[1]:
        datetime                    geometry  roi_id_number  midday_r  midday_g  midday_b  midday_gcc  ...  smooth_ci_gcc_75  smooth_ci_gcc_90  smooth_ci_rcc_mean  smooth_ci_rcc_50  smooth_ci_rcc_75  smooth_ci_rcc_90  int_flag
   0  2019-01-01  POINT (-72.17150 42.53780)           0001       NaN       NaN       NaN         NaN  ...           0.00315           0.00367             0.00726           0.00802           0.00702           0.00662       NaN
   1  2019-01-02  POINT (-72.17150 42.53780)           0001  62.17641  74.71778  61.94605     0.37577  ...           0.00317           0.00370             0.00731           0.00808           0.00707           0.00667       NaN
   2  2019-01-03  POINT (-72.17150 42.53780)           0001       NaN       NaN       NaN         NaN  ...           0.00315           0.00367             0.00726           0.00802           0.00702           0.00662       NaN
   3  2019-01-04  POINT (-72.17150 42.53780)           0001       NaN       NaN       NaN         NaN  ...           0.00311           0.00363             0.00717           0.00792           0.00693           0.00654       NaN
   4  2019-01-05  POINT (-72.17150 42.53780)           0001  31.51179  40.71661  42.90367     0.35365  ...           0.00310           0.00361             0.00714           0.00789           0.00690           0.00651       NaN
 ...         ...                         ...            ...       ...       ...       ...         ...  ...               ...               ...                 ...               ...               ...               ...       ...
1457  2020-12-27  POINT (-72.17150 42.53780)           1000  63.69476  77.64401  63.56541     0.37893  ...           0.00323           0.00377             0.00744           0.00822           0.00720           0.00679       NaN
1458  2020-12-28  POINT (-72.17150 42.53780)           1000       NaN       NaN       NaN         NaN  ...           0.00321           0.00374             0.00738           0.00815           0.00714           0.00674       NaN
1459  2020-12-29  POINT (-72.17150 42.53780)           1000       NaN       NaN       NaN         NaN  ...           0.00316           0.00369             0.00728           0.00804           0.00705           0.00665       NaN
1460  2020-12-30  POINT (-72.17150 42.53780)           1000  42.90064  57.88290  53.29560     0.37567  ...           0.00314           0.00366             0.00724           0.00799           0.00700           0.00661       NaN
1461  2020-12-31  POINT (-72.17150 42.53780)           1000       NaN       NaN       NaN         NaN  ...           0.00316           0.00369             0.00728           0.00804           0.00705           0.00665       NaN

[1462 rows x 47 columns]

In [2]: df.info()
Int64Index: 1462 entries, 0 to 1461
Data columns (total 47 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   datetime              1462 non-null   datetime64[ns]
 1   geometry              1462 non-null   geometry
 2   roi_id_number         1462 non-null   object
 3   midday_r              476 non-null    float64
 4   midday_g              476 non-null    float64
 5   midday_b              476 non-null    float64
 6   midday_gcc            476 non-null    float64
 7   midday_rcc            476 non-null    float64
 8   r_mean                476 non-null    float64
 9   r_std                 476 non-null    float64
10   g_mean                476 non-null    float64
11   g_std                 476 non-null    float64
12   b_mean                476 non-null    float64
13   b_std                 476 non-null    float64
14   gcc_mean              476 non-null    float64
15   gcc_std               476 non-null    float64
16   gcc_50                476 non-null    float64
17   gcc_75                476 non-null    float64
18   gcc_90                476 non-null    float64
19   rcc_mean              476 non-null    float64
20   rcc_std               476 non-null    float64
21   rcc_50                476 non-null    float64
22   rcc_75                476 non-null    float64
23   rcc_90                476 non-null    float64
24   max_solar_elev        476 non-null    float64
25   snow_flag             0 non-null      float64
26   outlierflag_gcc_mean  1462 non-null   int64
27   outlierflag_gcc_50    1462 non-null   int64
28   outlierflag_gcc_75    1462 non-null   int64
29   outlierflag_gcc_90    1462 non-null   int64
30   smooth_gcc_mean       1462 non-null   float64
31   smooth_gcc_50         1462 non-null   float64
32   smooth_gcc_75         1462 non-null   float64
33   smooth_gcc_90         1462 non-null   float64
34   smooth_rcc_mean       1462 non-null   float64
35   smooth_rcc_50         1462 non-null   float64
36   smooth_rcc_75         1462 non-null   float64
37   smooth_rcc_90         1462 non-null   float64
38   smooth_ci_gcc_mean    1462 non-null   float64
39   smooth_ci_gcc_50      1462 non-null   float64
40   smooth_ci_gcc_75      1462 non-null   float64
41   smooth_ci_gcc_90      1462 non-null   float64
42   smooth_ci_rcc_mean    1462 non-null   float64
43   smooth_ci_rcc_50      1462 non-null   float64
44   smooth_ci_rcc_75      1462 non-null   float64
45   smooth_ci_rcc_90      1462 non-null   float64
46   int_flag              0 non-null      float64
dtypes: datetime64[ns](1), float64(40), geometry(1), int64(4), object(1)
memory usage: 548.2+ KB
```

bbox

```python
In [3]: from springtime.datasets import PhenocamrBoundingBox
   ...:
   ...: dataset = PhenocamrBoundingBox(
   ...:     area={
   ...:         "name": "harvard",
   ...:         "bbox": [-73, 42, -72, 43],
   ...:     },
   ...:     years=(2019, 2020),
   ...: )
   ...: dataset.download()
   ...: df = dataset.load()
   ...: df
Out[3]:
         datetime                    geometry  roi_id_number  midday_r   midday_g   midday_b  midday_gcc  ...  smooth_ci_gcc_75  smooth_ci_gcc_90  smooth_ci_rcc_mean  smooth_ci_rcc_50  smooth_ci_rcc_75  smooth_ci_rcc_90  int_flag
    0  2019-01-01  POINT (-72.17436 42.53508)           1000       NaN        NaN        NaN         NaN  ...           0.00328           0.00328             0.00940           0.00963           0.00981           0.00956       NaN
    1  2019-01-02  POINT (-72.17436 42.53508)           1000  84.86659   85.59192   85.59225     0.33428  ...           0.00318           0.00319             0.00911           0.00934           0.00952           0.00927       NaN
    2  2019-01-03  POINT (-72.17436 42.53508)           1000       NaN        NaN        NaN         NaN  ...           0.00319           0.00320             0.00916           0.00938           0.00956           0.00931       NaN
    3  2019-01-04  POINT (-72.17436 42.53508)           1000       NaN        NaN        NaN         NaN  ...           0.00332           0.00333             0.00952           0.00975           0.00994           0.00968       NaN
    4  2019-01-05  POINT (-72.17436 42.53508)           1000  60.30106   67.32647   78.02750     0.32738  ...           0.00341           0.00342             0.00978           0.01002           0.01021           0.00994       NaN
  ...         ...                         ...            ...       ...        ...        ...         ...  ...               ...               ...                 ...               ...               ...               ...       ...
17284  2020-12-27  POINT (-72.18957 42.53556)           1000  89.97984  107.61050  119.11322     0.33978  ...           0.00401           0.00364             0.01080           0.01040           0.00953           0.00944       NaN
17285  2020-12-28  POINT (-72.18957 42.53556)           1000       NaN        NaN        NaN         NaN  ...           0.00399           0.00362             0.01074           0.01048           0.00948           0.00939       NaN
17286  2020-12-29  POINT (-72.18957 42.53556)           1000       NaN        NaN        NaN         NaN  ...           0.00395           0.00359             0.01064           0.01058           0.00939           0.00930       NaN
17287  2020-12-30  POINT (-72.18957 42.53556)           1000  85.32648   94.34117   92.74101     0.34632  ...           0.00394           0.00357             0.01060           0.01069           0.00935           0.00926       NaN
17288  2020-12-31  POINT (-72.18957 42.53556)           1000       NaN        NaN        NaN         NaN  ...           0.00395           0.00359             0.01064           0.01080           0.00939           0.00930       NaN

[17289 rows x 47 columns]

In [4]: df.info()
Int64Index: 17289 entries, 0 to 17288
Data columns (total 47 columns):
 #   Column                Non-Null Count  Dtype
---  ------                --------------  -----
 0   datetime              17289 non-null  datetime64[ns]
 1   geometry              17289 non-null  geometry
 2   roi_id_number         17289 non-null  object
 3   midday_r              5220 non-null   float64
 4   midday_g              5220 non-null   float64
 5   midday_b              5220 non-null   float64
 6   midday_gcc            5220 non-null   float64
 7   midday_rcc            5220 non-null   float64
 8   r_mean                5220 non-null   float64
 9   r_std                 5220 non-null   float64
10   g_mean                5220 non-null   float64
11   g_std                 5220 non-null   float64
12   b_mean                5220 non-null   float64
13   b_std                 5220 non-null   float64
14   gcc_mean              5220 non-null   float64
15   gcc_std               5220 non-null   float64
16   gcc_50                5220 non-null   float64
17   gcc_75                5220 non-null   float64
18   gcc_90                5220 non-null   float64
19   rcc_mean              5220 non-null   float64
20   rcc_std               5220 non-null   float64
21   rcc_50                5220 non-null   float64
22   rcc_75                5220 non-null   float64
23   rcc_90                5220 non-null   float64
24   max_solar_elev        5220 non-null   float64
25   snow_flag             0 non-null      float64
26   outlierflag_gcc_mean  17289 non-null  float64
27   outlierflag_gcc_50    17289 non-null  float64
28   outlierflag_gcc_75    17289 non-null  float64
29   outlierflag_gcc_90    17289 non-null  float64
30   smooth_gcc_mean       17289 non-null  float64
31   smooth_gcc_50         17289 non-null  float64
32   smooth_gcc_75         17289 non-null  float64
33   smooth_gcc_90         17289 non-null  float64
34   smooth_rcc_mean       17289 non-null  float64
35   smooth_rcc_50         17289 non-null  float64
36   smooth_rcc_75         17289 non-null  float64
37   smooth_rcc_90         17289 non-null  float64
38   smooth_ci_gcc_mean    17289 non-null  float64
39   smooth_ci_gcc_50      17289 non-null  float64
40   smooth_ci_gcc_75      17289 non-null  float64
41   smooth_ci_gcc_90      17289 non-null  float64
42   smooth_ci_rcc_mean    17289 non-null  float64
43   smooth_ci_rcc_50      17289 non-null  float64
44   smooth_ci_rcc_75      17289 non-null  float64
45   smooth_ci_rcc_90      17289 non-null  float64
46   int_flag              1440 non-null   float64
dtypes: datetime64[ns](1), float64(44), geometry(1), object(1)
memory usage: 6.3+ MB
```

SarahAlidoost commented 1 year ago

Should this class have an argument like `variables`, so that it does not return all of those columns? Shall I open an issue for it?

sverhoeven commented 1 year ago

> Should this class have an argument like `variables`, so that it does not return all of those columns? Shall I open an issue for it?

Yep, good idea. There are quite a few columns. It seems the columns starting with `smooth_` are generated by phenocamr, so we could add a toggle for those.
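
To make the idea concrete, here is a minimal sketch of how a `variables` argument and a `smooth_` toggle could select and order columns. It is not springtime code: the helper `pick_columns`, the `include_smooth` flag, and the choice to always keep `datetime`, `geometry` and `roi_id_number` first are all assumptions made for illustration. It works on plain column names, so it runs without pandas.

```python
from typing import Iterable, List, Optional, Sequence

# Assumption: these identifying columns are always kept, in this order.
INDEX_COLUMNS = ("datetime", "geometry", "roi_id_number")


def pick_columns(
    columns: Iterable[str],
    variables: Optional[Sequence[str]] = None,
    include_smooth: bool = True,
) -> List[str]:
    """Return the column names to keep, index columns first.

    Hypothetical helper, not part of springtime.
    """
    available = list(columns)
    if variables is None:
        # No explicit selection: keep every non-index column as-is.
        wanted = [c for c in available if c not in INDEX_COLUMNS]
    else:
        missing = [v for v in variables if v not in available]
        if missing:
            raise ValueError(f"Unknown variables: {missing}")
        # Keep the order in which the caller listed the variables.
        wanted = [v for v in variables if v not in INDEX_COLUMNS]
    if not include_smooth:
        # Drop smooth_ columns, even ones listed in `variables`.
        wanted = [c for c in wanted if not c.startswith("smooth_")]
    index = [c for c in INDEX_COLUMNS if c in available]
    return index + wanted
```

With a loaded dataframe, the result could be applied as `df[pick_columns(df.columns, variables=["gcc_90"], include_smooth=False)]`. Whether the toggle belongs in `load()` or in the class constructor is left open here.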