jundongl / scikit-feature

open-source feature selection repository in python
GNU General Public License v2.0
1.51k stars 447 forks source link

Speed up recomendation for `cfs` function #46

Open caph1993 opened 5 years ago

caph1993 commented 5 years ago

In the code for cfs(X, y), you are calling repeatedly the function merit_calculation(X, y), which it self calls repeatedly the function su_calculation, sometimes with exactly the same feature(s) as in previous rounds.

To avoid repeatedly computing su_calculation(fi, y) with the same feature fi, it would be ideal to save the computation results into a list or a dictionary when they are called the first time, and to load those values instead of recomputing them when they are called afterwards. That would ensure the linear complexity of the algorithm and improve its speed.

This could be the code to achieve that:

def merit_calculation(X, y, F, memo):
    rff = 0
    rcf = 0
    for i in F:
        if i not in memo:
            fi = X[:, i]
            memo[i] = su_calculation(fi, y)
        rcf += memo[i]
        for j in F:
            if j > i:
                if (i,j) not in memo:
                    fj = X[:, j]
                    memo[(i,j)] = su_calculation(fi, fj)
                rff += memo[(i,j)]
    rff *= 2
    merits = rcf / np.sqrt(len(F) + rff)
    return merits

And the usage, supplying the indices and the memory dictionary on each call.

...
memo = {}
...
t = merit_calculation(X, y, F, memo)
F = ...
t = merit_calculation(X, y, F, memo)
F = ...
...