jameschapman19 / scikit-palm

Implementing permutation methods for multiview learning in python
BSD 3-Clause "New" or "Revised" License

skpalm.permutations.quickperms is just the same as np.random.shuffle? #5

Open JohannesWiesner opened 1 year ago

JohannesWiesner commented 1 year ago

Describe the bug

Hi James, I would like to use skpalm.permutations.quickperms to create permuted ids that respect the covariance structure of the HCP data. I always assumed that the resampled arrays would be drawn with replacement, but in such a way that the covariance structure in the resampled samples matches the original covariance structure. But right now, I just end up with shuffled subject ids (for which I could just have used np.random.shuffle(subject_ids)). My current guess is that I must have misunderstood something?
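
To illustrate what I mean by respecting the structure, here is a toy sketch (plain numpy with made-up family blocks, not the skpalm API; restricted permutations may additionally swap whole blocks of the same size, which I leave out here):

import numpy as np

rng = np.random.default_rng(0)

subject_ids = np.array([101, 102, 103, 104, 105, 106])
families = np.array([1, 1, 2, 2, 3, 3])  # hypothetical exchangeability blocks

# unrestricted shuffle: any ordering is allowed
unrestricted = rng.permutation(subject_ids)

# restricted shuffle: only reorder subjects within their own family block
restricted = subject_ids.copy()
for fam in np.unique(families):
    idx = np.where(families == fam)[0]
    restricted[idx] = subject_ids[rng.permutation(idx)]

print(unrestricted)  # family members can end up anywhere
print(restricted)    # family members stay at their family's positions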

To Reproduce

import os
import sys
import numpy as np
import pandas as pd

# make sure that octave is installed first, and that you can call it in the console with 'octave'
# otherwise you can of course also use MATLAB (oct2py just makes it convenient to run
# the MATLAB/Octave code within a python session)
from oct2py import octave 

# skpalm is not pip-installable yet so we need to show Python where it can find it on our hard drive
sys.path.append('../repos/skpalm')
import skpalm

# load the restricted hcp file (you must download this first; you can only download
# it if you have been granted access to the restricted HCP data)
hcp_df = pd.read_csv('../data/hcp_restricted.csv')

# provide everything that hcp2blocks.m needs to run in order to create the exchangeability blocks
palm_df = hcp_df.loc[:,["Subject","Mother_ID","Father_ID","ZygositySR","ZygosityGT","Age_in_Yrs"]]
palm_df.to_csv('../data/palm_df.csv',index=False)

# generate exchangeability blocks and save the ids because some subjects can be dropped by hcp2blocks.m
# Make sure that the current working directory contains the files: hcp2blocks.m and strcsvread.m
# More on this here: https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/PALM/ExchangeabilityBlocks
octave.hcp2blocks('../data/palm_df.csv','../data/EB.csv')
eb = np.loadtxt('../data/EB.csv',delimiter=',',dtype=int)
eb_ids = eb[:,-1] # get original ids

# after hcp2blocks has run we don't need the intermediate .csv files anymore because eb is already in memory
os.remove('../data/palm_df.csv')
os.remove('../data/EB.csv')

# create permutations (we request size + 1 permutations because we will drop the first column)
size = 100
perms = skpalm.permutations.quickperms(design_matrix=None,exchangeability_blocks=eb,perms=size+1)
perms = perms[0] # only the first entry in the returned tuple is relevant
perms = perms[:,1:] # drop first column (this is the original order)
perms = perms - 1 # convert MATLAB indices to python indices
ids_resampled = eb[:,-1][perms] # get subject ids, not indices (upcoming dataframes can be in different order)

# check whether the resampled ids are just the original ids in a different order
# (shouldn't they be different?). If every permutation contains exactly the same set
# of ids, how is that different from just randomly shuffling the original ids?
unique_subjects = hcp_df.loc[hcp_df['Subject'].isin(eb_ids),'Subject'].unique()
for ids in ids_resampled.T:
    print(set(ids) == set(unique_subjects))
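
To be fair, the set comparison above is True for any permutation without replacement, so by itself it cannot distinguish a restricted permutation from a plain shuffle; what the exchangeability blocks should constrain is the order. A rough check of the ordering (only a sketch; for illustration I assume Mother_ID is a good enough family label, although hcp2blocks actually builds a finer multi-level structure) could look like this:

fam_of = hcp_df.set_index('Subject')['Mother_ID']  # hypothetical family label per subject
orig_families = fam_of.loc[eb_ids].to_numpy()

def fills_family_slots(perm_ids):
    # the positions originally occupied by one family should be filled by members
    # of exactly one family (the same family for within-block shuffles, another
    # family of the same structure for whole-block swaps)
    permuted_families = fam_of.loc[perm_ids].to_numpy()
    return all(len(np.unique(permuted_families[orig_families == fam])) == 1
               for fam in np.unique(orig_families))

print(all(fills_family_slots(ids) for ids in ids_resampled.T))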

BTW, I think I already mentioned the hierarch package, a very nice python implementation of hierarchical resampling. Right now, the package is still tied to very specific tests and thus not usable for CCA, but I already asked the maintainer to separate the resampling methods from the statistical tests that one wants to conduct.

jameschapman19 commented 1 year ago

Slow reply, apologies. I think the above is entirely possible and I obviously mothballed the project to focus on other priorities.

I agree with the principle though, and I think in some ways it was a design problem of the MATLAB PALM package that it wasn't particularly modular, so pulling bits out was tough!

jameschapman19 commented 1 year ago

In fact, I am going to archive this repo to avoid confusion. My mistake, fwiw, was trying to replicate PALM when I should have tried to rebuild the useful functionality in a modular (pythonic) way. Hopefully hierarch is achieving that :)