manodeep / Corrfunc

⚡️⚡️⚡️Blazing fast correlation functions on the CPU.
https://corrfunc.readthedocs.io
MIT License
166 stars 51 forks source link

Python utility to find the fastest combination of bin refine factors #97

Open manodeep opened 8 years ago

manodeep commented 8 years ago

Essentially the same as #91 as fixed in commit 3478f63. That commit simply needs to be copied over to all other extensions. Would be an ideal issue to solve for a newcomer to the repo.

Theory

Mocks

For Advanced Users

kakirastern commented 4 years ago

Hi @manodeep I would like to try out this issue, to get myself familiar with this repo

kakirastern commented 4 years ago

But first of all due to the date of the original post (from 2016), I would like to check if this is still an issue.

Some background: (I have been successful in installing the software to my MacBook with clang8 I installed after purchasing the laptop, and have tested the software on it, so no issue there.) As an aside, I am looking for an issue to prepare me for the n-way matcher idea for GSoC this year. As I am new to this repo, I am thinking if this is still open then I can begin working on it.

manodeep commented 4 years ago

Hi @kakirastern - sure thing - please go ahead. I would recommend creating one for the Corrfunc/mocks/DDtheta_mocks to get yourself familiar with calculations with on-sky angular separations.

manodeep commented 4 years ago

CC @parejkoj @bsipocz

kakirastern commented 4 years ago

Thanks @manodeep! I will begin work on the issue this weekend.

kakirastern commented 4 years ago

@manodeep I would like to follow up on this issue with additional PRs for the remaining extensions and for the implementation of the proposed decorator.

kakirastern commented 4 years ago

Also, would like to know if there is any recommended way to approach each of the remaining tasks in this issue? That is, is there any recommended order for their completion?

manodeep commented 4 years ago

@kakirastern Thanks for your continued interest - your contributions are very welcome.

Let's go for the advanced method - that will have several benefits:

Since this contribution will be a little more advanced, I am happy to guide you through the process. For starters, here is a primer on decorators.

kakirastern commented 4 years ago

Thanks @manodeep! Let me go through the primer on decorators and get back to you over the next few days.

kakirastern commented 4 years ago

Hi @manodeep I should be ready to continue again tomorrow. Been sidetracked by something cropped up suddenly.

manodeep commented 4 years ago

Thank you for letting me know. Just in case this is not clear, you are doing us a favour - so please feel free to work at it on your own timescale (never mind the whole pandemic going on in the world!)

kakirastern commented 4 years ago

Thanks @manodeep! The personal matter has been resolved now so I can get back to this issue. I have read the primer on decorators you have suggested earlier. I really like the idea of using decorators as an advanced method to approach the task outlined above. What would you recommend me doing next?

manodeep commented 4 years ago

@kakirastern Can you come up with a strategy (using decorators) that loops over the relevant angular/spatial dimensions and then finds the fastest runtime? Perhaps create some toy "decorated" functions to get a better idea about the functionality?

Think of this issue as tackling a research problem - solving it will require some exploration but there should be a solution at the end :)

kakirastern commented 4 years ago

Sure @manodeep Give me some time to think about the strategy and I will get back to you soon.

kakirastern commented 4 years ago

Hi @manodeep I have experimented using decorators with modification of the docstrings... So will move on to testing with the code part next.

manodeep commented 4 years ago

Great! Would it be easier if you open up a gist (gist.github.com/) with a Jupyter notebook? That way wemight be able to help (or ask for help as the case may be)?

kakirastern commented 4 years ago

Sure, let me figure out how to use gist and report back shortly.

kakirastern commented 4 years ago

Am thinking about a way things would work in the context of a package though, hence it's taking more time...

manodeep commented 4 years ago

Thanks for the update. Please let me know if I can help with the planning/design phase...

kakirastern commented 4 years ago

Sure! I have been experimenting using decorators in some other Python packages... Turned out it can get quite intricate and more nuanced then the impression I got from reading the reference article alone. It took awhile for me to really get the gist of the idea behind. But I will definitely start some notebook this weekend.

kakirastern commented 4 years ago

So the tentative plan I have come up with is to use a decorator to add some (common) code to each function in the list above, then add some extra code to complete the functionality for each application of the decorator... At least that's the idea I have for the project for now.

manodeep commented 4 years ago

Sounds good. For starters, do you mind setting up a jupyter notebook containing these three functions:

If you set up a gist, then you can use binder to run jupyter notebooks on the cloud

kakirastern commented 4 years ago

Sure, will get to it as soon as possible.

kakirastern commented 4 years ago

Hi @manodeep! After much trial and error I have drafted a version of a decorator that accepts a function, checks whether the function is with the xyz_binrefs arg or is one with the ra_dec_binrefs arg (couldn't use the slash you have used due to some syntax issue), as well as prints out all args defined in the actual function.

The gist for that work is at: https://gist.github.com/kakirastern/a905c021697200115d46e91fe95add0d.

Do let me know what you think. Thanks!

manodeep commented 4 years ago

@kakirastern Great work - that's fantastic.

Can you try with each of DD, DDrppi, DDsmu, wp, xi, vpf and DDrppi_mocks, DDsmu_mocks, DDtheta_mocks, vpf_mocks and probe whether each one of those functions contain xbinref/ybinref/zbinref or ra_binref/dec_binref?

The idea would be to automatically wrap a triple-for-loop (or some form of nd-iter) around the wrapped function and automatically figure out the fastest bin-ref (i.e., replace the functionsfind_fastest_DDtheta_mocks_bin_refs, find_fastest_wp_bin_refs with an automatic decorated function)

kakirastern commented 4 years ago

Sure @manodeep Will try each of DD, DDrppi, DDsmu, wp, xi, vpf and DDrppi_mocks, DDsmu_mocks, DDtheta_mocks, vpf_mocks and see whether each of these contain either xbinref/ybinref/zbinref or ra_binref/dec_binref using the approach you have suggested, then get back to you soon.

manodeep commented 4 years ago

Perfect! And you can test that the output of this decorated function matches the output from the existing find_fastest_wp* and the find_fastest_DDtheta* routines.

Thanks again - this looks awesome!

kakirastern commented 4 years ago

@manodeep Apologies for the long silence, will get back to you within this week

manodeep commented 4 years ago

All good :)

I did mean to ask - how did you setup the gist to connect the google-colab notebook? Looks very nice

kakirastern commented 4 years ago

Oh, they have added this new option "Save a copy as GitHub Gist" under "File" in the navigation bar of Google Colab so if I opt to work there I can make a Gist simply with the click of a button.

Screenshot 2020-07-31 05 19 56
manodeep commented 4 years ago

Thanks!

kakirastern commented 4 years ago

Can you try with each of DD, DDrppi, DDsmu, wp, xi, vpf and DDrppi_mocks, DDsmu_mocks, DDtheta_mocks, vpf_mocks and probe whether each one of those functions contain xbinref/ybinref/zbinref or ra_binref/dec_binref?

Hi @manodeep

So I have tried the same on each of DD, DDrppi, DDsmu, wp, xi, vpf as well as DDrppi_mocks, DDsmu_mocks, DDtheta_mocks, vpf_mocks as suggested, and have discovered that I needed to tweak the code a little to make things work, because in sometimes X, Y, Z are used and sometimes X1, Y1, Z1, depending whether X2, Y2, Z2. The same goes for the RA and Dec bin refs.

For the gist please see https://gist.github.com/kakirastern/2c01541b132e07092c63d9e6535061cc.

manodeep commented 4 years ago

Awesome - that looks great!

I have some suggestions regarding how to make the code a bit more compact, but you have already solved the first part for detecting what kind of a function is being decorated. Great work :)

For the outermost decorator factory function, I would recommend accepting max_xbin_ref, max_ybin_ref, and max_zbin_ref arguments. That will help you set up the loop over the bin_refs later on within the decorator function.

Regarding the naming, when a function only accepts a single dataset, e.g., theory.wp, theory.xi and theory.vpf, then the arguments are called X/Y/Z, whereas when a function can accept two datasets, then the arguments are called X1/Y1/Z1 and X2/Y2/Z2. For the mocks, only vpf_mocks works with a single dataset, and hence the inputs are called RA/DEC/CZ; all other functions under mocks should have RA1/DEC1/CZ1. We can reasonably expect to adhere to such a naming convention for any future functions. If you really want to cover those cases as well, then I would recommend raising an error (e.g., if the function has X2/Y2/Z2 but no X1/Y1/Z1 and so on...).

Do you have a way forward for writing the loops over bin_refine_factors and calling the decorated function?

kakirastern commented 4 years ago

Yeah, sure @manodeep I can definitely do that.

Regarding the last question, I have actually been thinking about it. Do you reckon it would be a good idea / is a good time to start testing some possible implementation in the repo itself?

manodeep commented 4 years ago

@kakirastern Of course! Just fork the repo and start adding the code. The best place to add the new code would probably be the Corrfunc.utils (sub)package

kakirastern commented 4 years ago

Thanks @manodeep Then I will add the code and experiment with it in a new branch on my fork as you have suggested... GSoC will be over for me soon, so I will have more time to work on this issue moving forward.

kakirastern commented 4 years ago

GSoC is finally over for me. I will resume work on this issue / the proposed PR this weekend.

manodeep commented 4 years ago

Hope GSoC was a good experience! Please take your time, no rush here

kakirastern commented 4 years ago

Thanks @manodeep! GSoC has been very fruitful and fun this year for me, I have been fortunate to work with SunPy again as a Student Developer. Also, I have resumed working on this issue, and will have something to show over the next week or so.

kakirastern commented 4 years ago

Just ditched some last-minute commitments so that I can finally finish up what I started here @manodeep Will keep in touch and keep working until the end say in a month's time for the PR. Thanks for your patience!

manodeep commented 4 years ago

All good - please ensure your wellbeing/commitments. This can wait

kakirastern commented 3 years ago

Hey got sidetracked again by a 2-week long fever... Great it is over now. Will resume work next week for certain.

kakirastern commented 3 years ago

This time I will keep my word

kakirastern commented 3 years ago

@kakirastern Of course! Just fork the repo and start adding the code. The best place to add the new code would probably be the Corrfunc.utils (sub)package

Should be straightforward enough once I follow up on this hint...

manodeep commented 3 years ago

Glad you are feeling better.

Think the following steps would be required:

That should be all. Happy to clarify and help along the way

kakirastern commented 3 years ago

Thanks, will start after my NDCube PR has been safely merged...

kakirastern commented 3 years ago

NDCube PR safely merged after one entire week of "wrangling". Will get on with this PR tomorrow, and be careful with the documentation to be added.

kakirastern commented 3 years ago

On it tonight