pmorissette / ffn

ffn - a financial function library for Python
pmorissette.github.io/ffn
MIT License

Code improvement _erc_weights_slsqp > fitness function #200

Closed quant12345 closed 3 months ago

quant12345 commented 1 year ago

Update: 26.01.2024

The `_erc_weights_slsqp` function (core.py) contains an inner `fitness` function. I would like to speed up this part of it:

sse = 0.0
for i in range(n):
    for j in range(n):
        # switched from squared deviations to absolute deviations to avoid numerical instability
        sse += np.abs(trc[i] - trc[j])
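
The double loop sums `|trc[i] - trc[j]|` over all ordered pairs `(i, j)`. A minimal sketch of the same sum written with NumPy broadcasting, using a made-up three-element `trc` (the values are illustrative only and do not come from ffn):

```python
import numpy as np

# Hypothetical risk contributions, chosen only for illustration.
trc = np.array([1.0, 2.0, 4.0])

# Loop version: sum of absolute differences over all ordered pairs (i, j).
n = len(trc)
sse_loop = 0.0
for i in range(n):
    for j in range(n):
        sse_loop += np.abs(trc[i] - trc[j])

# Broadcast version: a length-n row minus an (n, 1) column yields the full
# n x n matrix of pairwise differences in a single step.
sse_vec = np.sum(np.abs(trc - trc.reshape((-1, 1))))

print(sse_loop, sse_vec)
```

For this input the pairs contribute 2 * (1 + 3 + 2) = 12, and both versions return that value.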

Results of the new and old algorithms:

| array size n x n | time_old (s) | time_new (s) | difference in time (old / new) | comparison result | value old | value new |
|---|---|---|---|---|---|---|
| 10 | 0.00036 | 0.000176 | 2.05 | True | 255.93815578030916 | 255.93815578030916 |
| 30 | 0.003223 | 0.000167 | 19.3 | True | 4276.353089714127 | 4276.353089714127 |
| 50 | 0.007529 | 0.000223 | 33.76 | True | 14904.788635822428 | 14904.788635822428 |
| 100 | 0.061517 | 0.000888 | 69.28 | True | 89653.3603241806 | 89653.3603241806 |
| 200 | 0.131131 | 0.001226 | 106.96 | True | 488742.73384341365 | 488742.73384341365 |
| 500 | 0.516586 | 0.007304 | 70.73 | True | 4475768.791279719 | 4475768.791279719 |
| 1000 | 1.486356 | 0.01767 | 84.12 | True | 24854285.368962307 | 24854285.368962307 |
| 3000 | 14.744084 | 0.285297 | 51.68 | False | 401282111.94892 | 401282111.94892 |
| 7000 | 69.508812 | 3.476536 | 19.99 | False | 3293751144.8250494 | 3293751144.8250494 |
| 10000 | 143.482593 | 10.734743 | 13.37 | False | 8143248453.59402 | 8143248453.59402 |

For each size I print the dimension of the covar array, the time each version takes, whether the two results are equal after rounding to five decimal places, and the values themselves.

The results of the old and new algorithms can differ slightly. After rounding to five decimal places they are the same for n up to 1000; for n of 3000 and above the comparison returns False. The test code used for the comparison is below.
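
Rounding to five decimal places is an absolute tolerance, and it gets very strict once the sums reach 1e8 or more: the two versions add the same terms in a different order, so the last few digits can differ. A relative comparison such as `np.isclose` is one alternative. The sketch below uses a small random input, helper names (`pairwise_abs_loop`, `pairwise_abs_vec`) and a tolerance of `rtol=1e-9` that are all assumptions for illustration, not part of ffn or of the test code:

```python
import numpy as np


def pairwise_abs_loop(trc):
    # Sequential accumulation, as in the original double loop.
    total = 0.0
    for a in trc:
        for b in trc:
            total += abs(a - b)
    return total


def pairwise_abs_vec(trc):
    # Same sum via broadcasting; np.sum may add the terms in another order.
    return np.sum(np.abs(trc - trc.reshape((-1, 1))))


rng = np.random.default_rng(0)          # fixed seed, illustrative only
trc = rng.uniform(-5, 5, size=200)      # hypothetical risk contributions

old = pairwise_abs_loop(trc)
new = pairwise_abs_vec(trc)

# Relative check: tolerance scales with the magnitude of the values.
same_rel = bool(np.isclose(old, new, rtol=1e-9, atol=0.0))
print(old, new, same_rel)
```

With a relative tolerance the check does not become stricter as n, and with it the size of the sum, grows.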

**test code:**

```
import datetime

import numpy as np


def fitness_old(weights, covar):
    # total risk contributions
    # trc = weights*np.matmul(covar,weights)/np.sqrt(np.matmul(weights.T,np.matmul(covar,weights)))
    # instead of using the true definition for trc we will use the optimization on page 5
    trc = weights * np.matmul(covar, weights)

    n = len(trc)
    # sum of absolute differences of total risk contributions
    sse = 0.0
    for i in range(n):
        for j in range(n):
            # switched from squared deviations to absolute deviations to avoid numerical instability
            sse += np.abs(trc[i] - trc[j])
    # minimizes metric
    return sse


def fitness_new(weights, covar):
    # total risk contributions
    # trc = weights*np.matmul(covar,weights)/np.sqrt(np.matmul(weights.T,np.matmul(covar,weights)))
    # instead of using the true definition for trc we will use the optimization on page 5
    trc = weights * np.matmul(covar, weights)

    # sum of absolute differences of total risk contributions:
    # broadcasting builds the n x n matrix of pairwise differences
    sse = np.sum(np.abs(trc - trc.reshape((-1, 1))))
    # minimizes metric
    return sse


for n in [10, 30, 50, 100, 200, 500, 1000, 3000, 7000, 10000]:
    weights = np.full(shape=n, fill_value=0.5)
    covar = np.random.uniform(low=-5, high=5, size=(n, n))

    # time the loop version
    now = datetime.datetime.now()
    result_old = fitness_old(weights, covar)
    time_old = datetime.datetime.now() - now

    # time the vectorized version
    now = datetime.datetime.now()
    result_new = fitness_new(weights, covar)
    time_new = datetime.datetime.now() - now

    word = ('array size n x n {0} time_old {1} time_new {2}'
            ' difference in time {3} comparison result array {4}'
            ' value old {5}, value new {6}'
            .format(n, time_old.total_seconds(), time_new.total_seconds(),
                    round(time_old / time_new, 2),
                    np.all(round(result_new, 5) == round(result_old, 5)),
                    result_old, result_new))

    print(word)
```
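
One thing to keep in mind with the broadcast version is memory: `trc - trc.reshape((-1, 1))` materializes an n x n float64 array, which for n = 10000 is roughly 800 MB per temporary. A different technique, not proposed in this issue, avoids the matrix by sorting: for sorted values the sum over ordered pairs equals `2 * sum((2*k - n + 1) * x[k])`. The function name `pairwise_abs_sum_sorted` below is hypothetical, and this is only a sketch of that identity, not ffn code:

```python
import numpy as np


def pairwise_abs_sum_sorted(trc):
    """Sum of |trc[i] - trc[j]| over all ordered pairs, in O(n log n)."""
    x = np.sort(trc)
    n = x.size
    # In sorted order, x[k] is added once for each of the k smaller elements
    # and subtracted once for each of the n - 1 - k larger ones.
    coeff = 2 * np.arange(n) - n + 1
    # Factor 2 because the double loop counts both (i, j) and (j, i).
    return 2.0 * np.dot(coeff, x)


# Check against the broadcast version on a small random input.
rng = np.random.default_rng(1)          # fixed seed, illustrative only
x = rng.uniform(-5, 5, size=300)
reference = np.sum(np.abs(x - x.reshape((-1, 1))))
print(pairwise_abs_sum_sorted(x), reference)
```

The result can differ from the other two versions in the last digits for the same floating-point reasons, so a tolerance-based comparison is the safer way to check it.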