pmorissette / ffn

ffn - a financial function library for Python
pmorissette.github.io/ffn
MIT License
1.9k stars 284 forks

NaN values in multi-column data set are replaced in calc_stats calculation #155

Open rs0013 opened 3 years ago

rs0013 commented 3 years ago

Given a multi-column data set such as:

                   SYM1         SYM2
Date
2020-01-01  1000.000000  1000.000000
2020-02-01  1000.000000  1000.000000
2020-02-15          NaN  1005.000000
2020-03-01  1010.000000  1015.050000
2020-04-01  1010.000000  1015.050000
2020-05-01  1020.100000  1025.200500
2020-06-01  1020.100000  1025.200500
2020-07-01  1030.301000  1035.452505
2020-08-01  1030.301000  1035.452505
2020-09-01  1040.604010  1045.807030
2020-10-01  1040.604010          NaN
2020-11-01  1051.010050          NaN
2020-12-01  1051.010050          NaN
2021-01-01  1061.520151  1056.265100
2021-02-01  1061.520151  1056.265100

calc_stats() reports a Sharpe ratio of 2.92 for SYM1, when it should be 3.08 (the value I get when running calc_stats() on a data set containing only that column). On closer inspection, it appears that the SYM1 calculation is using, or being combined with, values from SYM2 in rows where NaN values exist. Can you please shed some light on this? If none of the rows in either SYM1 or SYM2 is NaN, the Sharpe ratio is correct for both columns. If I drop either of the two columns, calc_stats() is also correct for the remaining column.
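For anyone trying to reproduce this, here is a minimal sketch that rebuilds the data set above with pandas and splits it into one NaN-free series per column, which is the single-column case where the result looks correct. It uses only pandas and NumPy. The names `per_column` and `joint` are illustrative, and the sketch makes no claim about how calc_stats() treats NaN internally; it only shows that the two ways of dropping NaN leave different rows for SYM1.

```python
import numpy as np
import pandas as pd

# Dates and prices copied from the table in this issue.
dates = pd.to_datetime([
    "2020-01-01", "2020-02-01", "2020-02-15", "2020-03-01", "2020-04-01",
    "2020-05-01", "2020-06-01", "2020-07-01", "2020-08-01", "2020-09-01",
    "2020-10-01", "2020-11-01", "2020-12-01", "2021-01-01", "2021-02-01",
])
sym1 = [1000.0, 1000.0, np.nan, 1010.0, 1010.0, 1020.1, 1020.1,
        1030.301, 1030.301, 1040.60401, 1040.60401, 1051.01005,
        1051.01005, 1061.520151, 1061.520151]
sym2 = [1000.0, 1000.0, 1005.0, 1015.05, 1015.05, 1025.2005, 1025.2005,
        1035.452505, 1035.452505, 1045.80703, np.nan, np.nan, np.nan,
        1056.2651, 1056.2651]

prices = pd.DataFrame({"SYM1": sym1, "SYM2": sym2}, index=dates)
prices.index.name = "Date"

# Option A: drop NaN independently per column (the single-column case).
per_column = {name: prices[name].dropna() for name in prices.columns}

# Option B: keep only rows where every column has a value.
joint = prices.dropna(how="any")

# SYM1 keeps a different number of observations under each option.
print(len(per_column["SYM1"]), len(joint))
```

Each series in `per_column` can then be passed to calc_stats() on its own and compared against the multi-column result.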

I really appreciate your help in advance.