sktime / sktime-neuro

time series machine learning for neurological data
9 stars 3 forks source link

Algorithm for finding max-cliques if statistical testing is converse-transitive #14

Closed fkiraly closed 3 years ago

fkiraly commented 3 years ago

From our discussion in the stand-up - I quickly mocked up an algorithm that should be correct in finding max-cliques if the post-hoc test is converse-transitive, which should hold for Friedman post-hoc, unpaired Wilcoxon, but probably not for paired Wilcoxon (needs to be checked).

Here, converse-transitive means that: if $A=_T C$ and $A\le_R B\le_R C$, then $A=_T B$ and $B=_T C$, where $=_T$ means "post-hoc/pairwise test finds no significant difference" and $\le_R$ means "overall rank is smaller-equal".

def find_max_cliques_ordered(adj):
    """Find max cliques in adj.

    only correct if adj has the following property:
        if adj[i,j]=1, and i<j then adj[i,j-1]=1 and adj[i+1,j]=1
        (often satisfied by indicators of post-hoc tests after ranking)

    Arguments
    ---------
    adj: 2D np.ndarray, symmetric or upper right diagonal adjacency matrix

    Returns
    -------
    cliqind: 2D np.ndarray, (no.cliques x 2) array of integers
        each row i indicates a max clique
            vertices in clique are range(cliqind[i,0], cliqind[i,1]+1)

    Example
    -------
    >>> adj = np.array([[1,1,0],[1,1,1],[0,1,1]])
    >>> find_max_cliques_ordered(adj)
    """
    arrp = np.pad(adj, ((0,1), (1,0)))
    arrl = 1-np.pad(adj, (0,1))
    arrr = 1-np.pad(adj, (1,0))
    cliq = (arrp*arrr*arrl)[:-1,1:]
    cliqind = np.argwhere(cliq)

    return cliqind

Explanation: the algorithm is a vectorized version of "find all 1s that have a 0 above and to the right" (where "outside the matrix" is considered a 0)

fkiraly commented 3 years ago

The condition on adj should be equivalent to: the max-cliques are all complete graphs with vertex set ${i : a\le i \le b}$ for certain integers $a, b$ (varying by max-clique), with vertices numbered in the same order as rows/columns in the adjacency matrix.

TonyBagnall commented 3 years ago

so the way we do it uses paired wilcoxon with the Holm adjustment for multiple testing. I was planning on waiting until te statistical test module was in place before porting it over.

fkiraly commented 3 years ago

@TonyBagnall, do you know whether the paired Wilcoxon test (aka signed rank test) has the property?

I.e., if you have three samples A, B, C, with location(A)<=location(B) and location(B)<=location(C) and differences in location A vs C not significant imply that A vs B and B vs C are both not significant (at same level)?

TonyBagnall commented 3 years ago

This issue should be on sktime, tests are going there.