neurodata / hyppo

Python package for multivariate hypothesis testing
https://hyppo.neurodata.io/

incorporate block permutation into K-sample tests #155

Open sampan501 opened 3 years ago

sampan501 commented 3 years ago

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Describe alternatives you've considered

Additional context (e.g. screenshots)

sampan501 commented 3 years ago

Basically, like in this test, but for all the tests in the independence and k-sample module

ghost commented 2 years ago

@sampan501 Can I do this one?

ghost commented 2 years ago

Basically, like in this test, but for all the tests in the independence and k-sample module

@sampan501 What should be done here? The description is a little vague. I checked the module but am still not sure about the required modifications.

sampan501 commented 2 years ago

So, Dcorr supports block permutation using the perm_blocks parameter. I want that same functionality, but for K-sample Dcorr. So basically add perm_blocks like auto here: https://github.com/neurodata/hyppo/blob/135cfcd6b13a986b09ec7629931bce8676ea5547/hyppo/ksample/ksamp.py#L299
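
For readers unfamiliar with the idea, block permutation restricts a permutation test so that samples are only shuffled with other samples in the same block. The snippet below is a minimal NumPy sketch of that idea only. The function name `permute_within_blocks` and the single-level block labels are assumptions made for illustration; hyppo's own `perm_blocks` handling may differ (for instance in how it treats nested blocks).

```python
import numpy as np


def permute_within_blocks(block_labels, rng):
    """Return an index permutation that only swaps samples sharing a block label.

    Hypothetical illustration; this is not hyppo's implementation.
    """
    block_labels = np.asarray(block_labels)
    order = np.arange(block_labels.shape[0])
    for label in np.unique(block_labels):
        # Positions of the samples that belong to this block
        idx = np.flatnonzero(block_labels == label)
        # Shuffle those positions among themselves only
        order[idx] = rng.permutation(idx)
    return order


rng = np.random.default_rng(0)
blocks = np.array([0, 0, 0, 1, 1, 2, 2, 2])
perm = permute_within_blocks(blocks, rng)

# Permuting y with these indices never moves a sample out of its block
y = np.arange(8) * 10
y_perm = y[perm]
```

In a permutation test, `y_perm` would be paired with the unpermuted `x` to build the null distribution of the test statistic, one restricted shuffle per replication.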

ghost commented 2 years ago

@sampan501 You mean this part: https://github.com/neurodata/hyppo/blob/135cfcd6b13a986b09ec7629931bce8676ea5547/hyppo/independence/dcorr.py#L222-L258

for all the tests in the independence and k-sample module?

sampan501 commented 2 years ago

That and https://github.com/neurodata/hyppo/blob/135cfcd6b13a986b09ec7629931bce8676ea5547/hyppo/independence/dcorr.py#L253 this for k-sample tests https://github.com/neurodata/hyppo/blob/135cfcd6b13a986b09ec7629931bce8676ea5547/hyppo/independence/base.py#L138 and modify this unit test to test your changes https://github.com/neurodata/hyppo/blob/5e0fe5e4337d5997f2fb1b313cd8e94540aa8807/hyppo/tools/tests/test_common.py#L265

ghost commented 2 years ago

That and

https://github.com/neurodata/hyppo/blob/135cfcd6b13a986b09ec7629931bce8676ea5547/hyppo/independence/dcorr.py#L253

this for k-sample tests https://github.com/neurodata/hyppo/blob/135cfcd6b13a986b09ec7629931bce8676ea5547/hyppo/independence/base.py#L138

and modify this unit test to test your changes https://github.com/neurodata/hyppo/blob/5e0fe5e4337d5997f2fb1b313cd8e94540aa8807/hyppo/tools/tests/test_common.py#L265

I don't have extensive knowledge of unit testing. I need a bit more instruction on these, or maybe a link to learn more? @sampan501

sampan501 commented 2 years ago

https://docs.pytest.org/en/6.2.x/ https://docs.pytest.org/en/6.2.x/parametrize.html#pytest-mark-parametrize-parametrizing-test-functions

c-adittya commented 2 years ago

Can you assign me this issue please? And where do I call the test_permtest function?

sampan501 commented 2 years ago

The test_permtest function is called by pytest. I think the above links explain how the process works. Let me know if you have any questions
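
As a rough illustration of what the linked pytest pages describe: pytest collects functions whose names start with `test_` and calls them itself, and `pytest.mark.parametrize` makes it call the same test once per parameter set. The helper `n_blocks` and the test below are hypothetical and are not taken from hyppo's test suite.

```python
import pytest


def n_blocks(perm_blocks):
    """Hypothetical helper under test: count the distinct block labels."""
    return len(set(perm_blocks))


@pytest.mark.parametrize(
    "perm_blocks, expected",
    [
        ([0, 0, 1, 1], 2),
        ([0, 1, 2, 3], 4),
        ([5, 5, 5], 1),
    ],
)
def test_n_blocks(perm_blocks, expected):
    # pytest runs this function three times, once per tuple above
    assert n_blocks(perm_blocks) == expected
```

Saving this in a file whose name starts with `test_` and running `pytest` on that file is enough; nothing in your own code needs to call `test_n_blocks`.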

Muhammedsinanck commented 1 year ago

Hi, I am new to open source. Is this issue still open?

ghost commented 1 year ago

I want to work on this issue.

ghost commented 1 year ago

@sampan501 Thanks, sir, for assigning me. I am working on it.

d-zg commented 1 year ago

hey, can I help implement this for some of the tests in the independence folder?

sampan501 commented 1 year ago

Yes please

d-zg commented 1 year ago

thanks!

I'm new to open source and I had some issues setting up hyppo and running the tests locally. I installed the dependencies in dev-requirements.txt and tried running pytest, but it had errors collecting a bunch of files.

Am I missing some dependencies? If possible, could you point me to some resources for getting started running Hyppo locally? I tried installing the docs dependencies and building them but I got this error:

ImportError: cannot import name 'Union' from 'types' (/usr/lib/python3.10/types.py)

d-zg commented 1 year ago

Hi, just wanted to bump to ask again about running Hyppo in editable mode.

I cloned the repo, installed the dependencies in requirements.txt, dev-requirements.txt, and docs/requirements.txt, ran pip install -e . , then ran pytest. I got errors finding files in benchmarks/ and an error importing 'Union' from 'types' in python3.10.

I also wasn't able to build the doc files. I installed the doc requirements.txt to my virtual environment then ran make html, getting the same error importing 'Union' from 'types' in python3.10.

Is there something wrong with my workflow? Or should I use a different python version? I tried python3 as well with the same error.

Thanks again!
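
For context on the ImportError quoted above: in released Python versions, `Union` is exported by the `typing` module, not by `types`, so any code (in the project or in one of its dependencies) that runs `from types import Union` fails this way. The check below is a small diagnostic sketch, not a fix for hyppo's setup; it does not identify which package triggers the import.

```python
import types
import typing

# `Union` is part of `typing` in released Python versions
has_typing_union = hasattr(typing, "Union")

# `types` does not export a name called `Union`
has_types_union = hasattr(types, "Union")

print("typing.Union available:", has_typing_union)
print("types.Union available:", has_types_union)
```

The traceback printed with the ImportError shows which file contains the failing import, which is usually the quickest way to tell which package needs attention.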

sampan501 commented 1 year ago

Hi, sorry for not getting back to you sooner, was a little bit busy. Usually I run pytest on just the "hyppo" folder in the repo and not the entire package

sampan501 commented 1 year ago

Really, pytest is only used when running the unit tests in the package, which are all contained within the tests folders in each module

d-zg commented 1 year ago

gotcha! Thanks for getting back to me. Should I just test the independence folder after making changes? Or is there a way for me to test the entire package? Sorry, not sure about best practices for contributing.

The tests in independence passed successfully for me 🥳, so I'll get started

sampan501 commented 1 year ago

I would test the methods and modules that you are changing and create a new test when you add your code. CircleCI will also automatically build the code when you push a commit to a PR

Aditi840 commented 1 year ago

Hello everyone, I'm new to open source, if this issue is still open, can you assign it to me?

mahimairaja commented 12 months ago

Is this issue solved?

sampan501 commented 12 months ago

@mahimairaja Not yet, I have not had a PR open about it yet

danbramos commented 10 months ago

Hi. Could you please assign me this issue? I would like to contribute to this and I'm sure I can help you out.

danbramos commented 10 months ago

Have you tried this? I've been going through the links you provided and I'm just analysing them.

  1. Import the necessary libraries: numpy and numba.
  2. Check the dimensions of the permutation blocks using the check_perm_blocks_dim function.
  3. Implement the chi2_approx function, which computes an approximate chi-square statistic.
  4. Compute the distance between the samples within each data matrix using the compute_dist function.
  5. Create a class called Dcorr that inherits from IndependenceTest.
  6. Implement the __init__ method of the Dcorr class, where you can specify the compute_distance parameter.
  7. Implement the statistic method of the Dcorr class, which calculates the Dcorr test statistic.
  8. Implement the p_value method of the Dcorr class, which computes the p-value for the test statistic.
  9. Lastly, try to create an IndependenceTestOutput class to store the results of the independence test.

sampan501 commented 10 months ago

Basically, perm_blocks has been implemented for one test (Dcorr) and we want it implemented the same way for multiple tests. I believe the things you commented are already in the package

danbramos commented 10 months ago

If you have already performed those steps, then to implement the block permutation for multiple tests, you can follow a similar approach as you did for the Dcorr test. For each test, you can follow the same instructions you used for the Dcorr test as I mentioned above. Just make sure to adapt the code to the specific requirements of each test. By repeating the steps for each test, you'll be able to implement the block permutation effectively to multiple tests

danbramos commented 10 months ago

To modify the code accordingly, you might want to change the distance metric used in the compute_distance function or add additional functions specific to each test.

  1. Update the class name and any relevant method names to reflect the specific test being implemented.
  2. Test the modified code thoroughly for each test to ensure it produces accurate results.

You might consider modifying the distance metric in the compute_distance function:


    import math


    def compute_distance(data_point1, data_point2, distance_metric="euclidean"):
        # Hypothetical standalone helper (not hyppo's API): distance between two points
        if distance_metric == "euclidean":
            # Calculate Euclidean distance between data_point1 and data_point2
            distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(data_point1, data_point2)))
        elif distance_metric == "manhattan":
            # Calculate Manhattan distance between data_point1 and data_point2
            distance = sum(abs(x - y) for x, y in zip(data_point1, data_point2))
        else:
            raise ValueError("Invalid distance metric")

        return distance

sampan501 commented 10 months ago

The stuff you are linking seems completely unrelated to this issue. Please take a look through the code I provided above in the issue and see if that makes sense to you. If it doesn't, lmk

danbramos commented 10 months ago

The link I've accessed is the first one you provided which contains all the packages you imported and Dcorr

Basically, perm_blocks has been implemented for one test (Dcorr) and we want it implemented the same way for multiple tests. I believe the things you commented are already in the package

So from your last comment, I understood that you wanted to implement perm_blocks for multiple tests, right?
Am I missing out on anything?

sampan501 commented 10 months ago

The link I provided is the implementation of Dcorr within hyppo. I just want a similar implementation to that for all other independence tests within hyppo. Does that make sense?

danbramos commented 10 months ago

Could you provide me with more details about the specific independence tests you're interested in? There are several other independence tests within hyppo that could benefit from a similar implementation. Some of these tests include the CCA test, HHG test, and MGC test, among others. By following a similar approach as the Dcorr implementation, we can ensure consistency across these different tests.

I just want to make sure that it's exactly what you mean.

sampan501 commented 10 months ago

Let's start with one, maybe CCA, and then add more to the PR when that gets done

danbramos commented 10 months ago

Great. I feel like it's better to break it down and then we can move on to other tests. Since you want to focus on CCA for now, what has your approach been like?

Have you tried anything like this?


    import numpy as np
    from hyppo.independence import CCA

    # Generate your data (placeholder: independent Gaussian samples).
    # X and Y should be numpy arrays with the same number of rows.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 1))
    Y = rng.normal(size=(100, 1))

    # Create an instance of the CCA test
    cca_test = CCA()
    # Perform the CCA independence test
    test_statistic, p_value = cca_test.test(X, Y)
    # Print the results
    print("Test Statistic:", test_statistic)
    print("P-value:", p_value)

I'm just importing the `CCA` class from `hyppo.independence` and creating an instance of it. Then, you can perform the CCA independence test by calling the `test` method on the test instance, passing in your data `X` and `Y`. The test will return the test statistic and p-value, which you can then use or print as needed. 

Please lmk if any changes are required and what method you've tried so far