manodeep / Corrfunc

⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
https://corrfunc.readthedocs.io
MIT License
163 stars 50 forks source link

Add find_fastest_DDtheta_mocks_bin_refs() function to Corrfunc/mocks/DDtheta_mocks #216

Closed kakirastern closed 4 years ago

kakirastern commented 4 years ago

Addresses #97 partially. As the title would suggest, to add the find_fastest_wp_bin_refs() function from Corrfunc/theory/wp.py#L17 to Corrfunc/mocks/DDtheta_mocks.py.

kakirastern commented 4 years ago

Encountered an error while testing, and am not sure why. When running the code as follows, L534 of Corrfunc/mocks/DDtheta_mocks.py with the snippet

isa=integer_isa

in the argument, locally I get the error

TypeError: cannot unpack non-iterable NoneType object

Here is what I have done:

from Corrfunc.mocks.DDtheta_mocks import find_fastest_wp_bin_refs
from Corrfunc.io import read_catalog

X, Y, Z = read_catalog() 

boxsize = 420.0
nthreads = 2
pimax = 40.0
binfile = np.logspace(np.log10(0.1), np.log10(10.0), 15)

find_fastest_wp_bin_refs(boxsize, pimax, mthreads, binfile, X, Y, Z)

Any suggestions as to why and how I should debug? I have checked by default isa has been set, and is isa=r'fastest'.

kakirastern commented 4 years ago

Looks like this PR passes the Travis CI tests. I could continue copying and pasting to relevant modules if everything is okay.

manodeep commented 4 years ago

Hi @kakirastern - thanks for taking a stab. From what I can tell, you have simply copy-pasted the code from the wp script into the DDtheta_mocks script. The idea here is to create a similar function (but replace wp with DDtheta_mocks in the function name) but adapt the interface to match that of DDtheta_mocks here. For instance, there should be only RA/DEC in this new function and no mention of X/Y/Z

(I am on parental leave for the next week and responses will continue to be slow. Thanks for your patience)

kakirastern commented 4 years ago

Hi @manodeep Thanks for the reply! Sorry I took the hint in the original issue to "copy and paste" a bit too literally. I will give it another try soon before the end of the coming weekend.

pep8speaks commented 4 years ago

Hello @kakirastern! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 328:80: E501 line too long (80 > 79 characters) Line 579:80: E501 line too long (84 > 79 characters) Line 584:80: E501 line too long (86 > 79 characters) Line 585:80: E501 line too long (86 > 79 characters) Line 603:80: E501 line too long (82 > 79 characters)

Line 119:80: E501 line too long (85 > 79 characters)

Comment last updated at 2020-04-13 01:43:27 UTC
kakirastern commented 4 years ago

Hi @manodeep I have made the changes as suggested. It seemed to have worked, at least on the surface. I have been able to compile and install then test the code successfully. Though I am not sure it is 100% correct. Any feedback would be greatly appreciated.

kakirastern commented 4 years ago

Fixed code to pass Codacy/PR Quality Review.

manodeep commented 4 years ago

Thanks - good work! :)

I have a couple of suggestions - ideally, the find_fastest* should be accept all of the parameters to the underlying function (DDtheta_mocks). For instance, have a think about how to pass the RA2, DEC2 parameters; and how to test for link_in_ra=False (i.e., only linking in declination), or link_in_dec=False (i.e., a brute-force calculation).

Also, you should add your name to the "author" list in the file.

kakirastern commented 4 years ago

Interesting, am seeing the An error occurred while generating the build script. error output for the clang CI tests... But the gcc CI tests are fine and are passing.

kakirastern commented 4 years ago

Just added my name to the authors' list of the Corrfunc/mocks/DDtheta_mocks file as well, plus re-triggering the Travis CI checks.

manodeep commented 4 years ago

@kakirastern Could you please re-trigger that sub-build that failed? Must have been a transient issue

kakirastern commented 4 years ago

Hi @manodeep Sure, have just re-triggered the build.

kakirastern commented 4 years ago

I have checked the Travis CI checks are indeed all passing, but for some reason that fact has not been reflected on this page as of now...

kakirastern commented 4 years ago

Just tested more thoroughly again and fixed the bug in the now L555 to print out some info statement for Case 2: (link_in_dec=True, link_in_ra=False). Previously was combining the two conditional if statements with the & logical operator (should have been the and logical operator in Python, but & worked nonetheless). But after separating the logic into layers like the following:

    if link_in_dec is True:
        if link_in_ra is False:
            msg3 = "Info: Gridding in the declination only, as link_in_dec " \
                   "is set to True while link_in_ra is set to False." \
                   "Thus looping is only needed over the range of " \
                   "(min, max) bin, with refinements in the declination."
            print(msg3)

The condition was able to be satisfied when met and the print statement was successfully outputted. For consistency's sake have changed & into and as the Python logical operator.

manodeep commented 4 years ago

Thanks @kakirastern.

What I meant for the link_in_dec=True case was that (rather than returning an error) we should be able to provide an implementation that loops and returns the fastest dec_bin_ref (possibly None for ra_bin_ref)

kakirastern commented 4 years ago

What I meant for the link_in_dec=True case was that (rather than returning an error) we should be able to provide an implementation that loops and returns the fastest dec_bin_ref (possibly None for ra_bin_ref)

Hi @manodeep Understood. Currently for the link_in_dec=True case both some additional info and an implementation that loops through and outputs the fastest dec_bin_ref are provided. (But currently I have set link_in_ra=False as the second checking condition in addition to link_in_dec=True.) I can certainly make the appropriate changes using None for ra_bin_ref soon to render the code self-consistent.

kakirastern commented 4 years ago

I have tried for the (link_in_dec=True, link_in_ra=False) case to setra_bin_ref=0, I did get an output but also the warning "Warning: bin refine factor along axis = 0 *must* be >=1. Instead found bin refine factor =0". I have also tried setting ra_bin_ref=1 for this case which makes a bit more sense intuitively, and that worked fine it seemed to my untrained eyes.

I might have misunderstood something, if that's the case please let me know.

In summary, I did something like the following:

>>> from __future__ import print_function
>>> import numpy as np
>>> import time
>>> from math import pi
>>> from os.path import dirname, abspath, join as pjoin
>>> import Corrfunc
>>> from Corrfunc.mocks.DDtheta_mocks import find_fastest_DDtheta_mocks_bin_refs
>>> binfile = pjoin(dirname(abspath(Corrfunc.__file__)), "../mocks/tests/", "angular_bins")
>>> N = 100000
>>> nthreads = 4
>>> seed = 42
>>> np.random.seed(seed)
>>> RA1 = np.random.uniform(0.0, 2.0*pi, N)*180.0/pi
>>> cos_theta = np.random.uniform(-1.0, 1.0, N)
>>> DEC1 = 90.0 - np.arccos(cos_theta)*180.0/pi
>>> autocorr = 1
>>> find_fastest_DDtheta_mocks_bin_refs(autocorr, nthreads, binfile, RA1, DEC1, link_in_dec=True, link_in_ra=False, return_runtimes=False) 
Info: Gridding in the declination only, as link_in_dec is set to True while link_in_ra is set to False. Thus looping is only needed over the range of (min, max) bin, with refinements in the declination.
Out: (1, 3)
manodeep commented 4 years ago

@kakirastern Btw, since we dropped the nmatcher project from OpenAstronomy GSoC, it is totally fine if you do not resources to commit to this PR. If so, please let me know - I will merge this PR into a separate branch and make the necessary changes. That way, the PR is merged in, and you still get credit for the time and effort you have already committed.

kakirastern commented 4 years ago

@kakirastern Btw, since we dropped the nmatcher project from OpenAstronomy GSoC, it is totally fine if you do not resources to commit to this PR. If so, please let me know - I will merge this PR into a separate branch and make the necessary changes. That way, the PR is merged in, and you still get credit for the time and effort you have already committed.

Not a problem! I would like to continue my work on this PR until it is merged, since I am enjoying the process and am learning a lot from the experience. I will work on it a bit more according to your suggestions and get back to you soon.

manodeep commented 4 years ago

Thank you - your continued efforts on this PR are appreciated!

kakirastern commented 4 years ago

Added changelog entry for PR.

kakirastern commented 4 years ago

@kakirastern I requested one final change, but I am approving the PR regardless. Thank you so much for all your efforts!

@lgarrison Did you want to check if I have missed anything?

@manodeep You are more than welcome!

manodeep commented 4 years ago

@kakirastern Thank you for your updates. I have made a couple of changes, and once the tests pass, I will merge this PR in.

@kakirastern Thanks again for all your efforts on this PR. You are very welcome to look at other sections of the code-base that you might want to contribute to. This PR highlighted that we need a proper contributing guideline. A related (but somewhat advanced) idea would be to create a decorator function that can be applied on individual pair-counters to automagically generates this helper routine to return the faster combination. This decorator would load the pair-counting function source, and create a nested triple loop over x/y/z-binref, and call the "decorated" function (passing all the parameters through), and finally return the runtimes as requested. Such a decorator could be immediately applied to all remaining clustering statistics within Corrfunc (even the void probability functions, see #97)

Regardless of your choice, thank you again for your contribution!

kakirastern commented 4 years ago

@kakirastern Thanks again for all your efforts on this PR. You are very welcome to look at other sections of the code-base that you might want to contribute to. This PR highlighted that we need a proper contributing guideline. A related (but somewhat advanced) idea would be to create a decorator function that can be applied on individual pair-counters to automagically generates this helper routine to return the faster combination. This decorator would load the pair-counting function source, and create a nested triple loop over x/y/z-binref, and call the "decorated" function (passing all the parameters through), and finally return the runtimes as requested. Such a decorator could be immediately applied to all remaining clustering statistics within Corrfunc (even the void probability functions, see #97)

Regardless of your choice, thank you again for your contribution!

@manodeep You are welcome! Yeah, I would be interested in helping out making such a decorator function, but may need some guidance.