Open manodeep opened 8 years ago
Hi @manodeep I would like to try out this issue, to get myself familiar with this repo
But first of all due to the date of the original post (from 2016), I would like to check if this is still an issue.
Some background: (I have been successful in installing the software to my MacBook with clang8 I installed after purchasing the laptop, and have tested the software on it, so no issue there.) As an aside, I am looking for an issue to prepare me for the n-way matcher idea for GSoC this year. As I am new to this repo, I am thinking if this is still open then I can begin working on it.
Hi @kakirastern - sure thing - please go ahead. I would recommend creating one for the Corrfunc/mocks/DDtheta_mocks
to get yourself familiar with calculations with on-sky angular separations.
CC @parejkoj @bsipocz
Thanks @manodeep! I will begin work on the issue this weekend.
@manodeep I would like to follow up on this issue with additional PRs for the remaining extensions and for the implementation of the proposed decorator.
Also, would like to know if there is any recommended way to approach each of the remaining tasks in this issue? That is, is there any recommended order for their completion?
@kakirastern Thanks for your continued interest - your contributions are very welcome.
Let's go for the advanced method - that will have several benefits:
Since this contribution will be a little more advanced, I am happy to guide you through the process. For starters, here is a primer on decorators.
Thanks @manodeep! Let me go through the primer on decorators and get back to you over the next few days.
Hi @manodeep I should be ready to continue again tomorrow. Been sidetracked by something cropped up suddenly.
Thank you for letting me know. Just in case this is not clear, you are doing us a favour - so please feel free to work at it on your own timescale (never mind the whole pandemic going on in the world!)
Thanks @manodeep! The personal matter has been resolved now so I can get back to this issue. I have read the primer on decorators you have suggested earlier. I really like the idea of using decorators as an advanced method to approach the task outlined above. What would you recommend me doing next?
@kakirastern Can you come up with a strategy (using decorators) that loops over the relevant angular/spatial dimensions and then finds the fastest runtime? Perhaps create some toy "decorated" functions to get a better idea about the functionality?
Think of this issue as tackling a research problem - solving it will require some exploration but there should be a solution at the end :)
Sure @manodeep Give me some time to think about the strategy and I will get back to you soon.
Hi @manodeep I have experimented using decorators with modification of the docstrings... So will move on to testing with the code part next.
Great! Would it be easier if you open up a gist (gist.github.com/) with a Jupyter notebook? That way wemight be able to help (or ask for help as the case may be)?
Sure, let me figure out how to use gist
and report back shortly.
Am thinking about a way things would work in the context of a package though, hence it's taking more time...
Thanks for the update. Please let me know if I can help with the planning/design phase...
Sure! I have been experimenting using decorators in some other Python packages... Turned out it can get quite intricate and more nuanced then the impression I got from reading the reference article alone. It took awhile for me to really get the gist of the idea behind. But I will definitely start some notebook this weekend.
So the tentative plan I have come up with is to use a decorator to add some (common) code to each function in the list above, then add some extra code to complete the functionality for each application of the decorator... At least that's the idea I have for the project for now.
Sounds good. For starters, do you mind setting up a jupyter notebook containing these three functions:
xyz_binrefs
ra/dec_binrefs
xyz_binrefs
or ra/dec_binrefs
), and then runs a triple nested for-loop or double nested for-loop, while passing through all relevant keywords to the actual functionIf you set up a gist, then you can use binder to run jupyter notebooks on the cloud
Sure, will get to it as soon as possible.
Hi @manodeep! After much trial and error I have drafted a version of a decorator that accepts a function, checks whether the function is with the xyz_binrefs
arg or is one with the ra_dec_binrefs
arg (couldn't use the slash you have used due to some syntax issue), as well as prints out all args defined in the actual function.
The gist for that work is at: https://gist.github.com/kakirastern/a905c021697200115d46e91fe95add0d.
Do let me know what you think. Thanks!
@kakirastern Great work - that's fantastic.
Can you try with each of DD, DDrppi, DDsmu, wp, xi, vpf
and DDrppi_mocks, DDsmu_mocks, DDtheta_mocks, vpf_mocks
and probe whether each one of those functions contain xbinref/ybinref/zbinref
or ra_binref/dec_binref
?
The idea would be to automatically wrap a triple-for-loop (or some form of nd-iter) around the wrapped function and automatically figure out the fastest bin-ref (i.e., replace the functionsfind_fastest_DDtheta_mocks_bin_refs
, find_fastest_wp_bin_refs
with an automatic decorated function)
Sure @manodeep Will try each of DD, DDrppi, DDsmu, wp, xi, vpf
and DDrppi_mocks, DDsmu_mocks, DDtheta_mocks, vpf_mocks
and see whether each of these contain either xbinref/ybinref/zbinref
or ra_binref/dec_binref
using the approach you have suggested, then get back to you soon.
Perfect! And you can test that the output of this decorated function matches the output from the existing find_fastest_wp*
and the find_fastest_DDtheta*
routines.
Thanks again - this looks awesome!
@manodeep Apologies for the long silence, will get back to you within this week
All good :)
I did mean to ask - how did you setup the gist to connect the google-colab notebook? Looks very nice
Oh, they have added this new option "Save a copy as GitHub Gist" under "File" in the navigation bar of Google Colab so if I opt to work there I can make a Gist simply with the click of a button.
Thanks!
Can you try with each of
DD, DDrppi, DDsmu, wp, xi, vpf
andDDrppi_mocks, DDsmu_mocks, DDtheta_mocks, vpf_mocks
and probe whether each one of those functions containxbinref/ybinref/zbinref
orra_binref/dec_binref
?
Hi @manodeep
So I have tried the same on each of DD, DDrppi, DDsmu, wp, xi, vpf
as well as DDrppi_mocks, DDsmu_mocks, DDtheta_mocks, vpf_mocks
as suggested, and have discovered that I needed to tweak the code a little to make things work, because in sometimes X
, Y
, Z
are used and sometimes X1
, Y1
, Z1
, depending whether X2
, Y2
, Z2
. The same goes for the RA
and Dec
bin refs.
For the gist
please see https://gist.github.com/kakirastern/2c01541b132e07092c63d9e6535061cc.
Awesome - that looks great!
I have some suggestions regarding how to make the code a bit more compact, but you have already solved the first part for detecting what kind of a function is being decorated. Great work :)
For the outermost decorator factory function, I would recommend accepting max_xbin_ref
, max_ybin_ref
, and max_zbin_ref
arguments. That will help you set up the loop over the bin_refs later on within the decorator function.
Regarding the naming, when a function only accepts a single dataset, e.g., theory.wp
, theory.xi
and theory.vpf
, then the arguments are called X/Y/Z
, whereas when a function can accept two datasets, then the arguments are called X1/Y1/Z1
and X2/Y2/Z2
. For the mocks
, only vpf_mocks
works with a single dataset, and hence the inputs are called RA/DEC/CZ
; all other functions under mocks
should have RA1/DEC1/CZ1
. We can reasonably expect to adhere to such a naming convention for any future functions. If you really want to cover those cases as well, then I would recommend raising an error (e.g., if the function has X2/Y2/Z2
but no X1/Y1/Z1
and so on...).
Do you have a way forward for writing the loops over bin_refine_factors
and calling the decorated function?
Yeah, sure @manodeep I can definitely do that.
Regarding the last question, I have actually been thinking about it. Do you reckon it would be a good idea / is a good time to start testing some possible implementation in the repo itself?
@kakirastern Of course! Just fork the repo and start adding the code. The best place to add the new code would probably be the Corrfunc.utils
(sub)package
Thanks @manodeep Then I will add the code and experiment with it in a new branch on my fork as you have suggested... GSoC will be over for me soon, so I will have more time to work on this issue moving forward.
GSoC is finally over for me. I will resume work on this issue / the proposed PR this weekend.
Hope GSoC was a good experience! Please take your time, no rush here
Thanks @manodeep! GSoC has been very fruitful and fun this year for me, I have been fortunate to work with SunPy again as a Student Developer. Also, I have resumed working on this issue, and will have something to show over the next week or so.
Just ditched some last-minute commitments so that I can finally finish up what I started here @manodeep Will keep in touch and keep working until the end say in a month's time for the PR. Thanks for your patience!
All good - please ensure your wellbeing/commitments. This can wait
Hey got sidetracked again by a 2-week long fever... Great it is over now. Will resume work next week for certain.
This time I will keep my word
@kakirastern Of course! Just fork the repo and start adding the code. The best place to add the new code would probably be the
Corrfunc.utils
(sub)package
Should be straightforward enough once I follow up on this hint...
Glad you are feeling better.
Think the following steps would be required:
utils.py
file That should be all. Happy to clarify and help along the way
Thanks, will start after my NDCube PR has been safely merged...
NDCube PR safely merged after one entire week of "wrangling". Will get on with this PR tomorrow, and be careful with the documentation to be added.
On it tonight
Essentially the same as #91 as fixed in commit 3478f63. That commit simply needs to be copied over to all other extensions. Would be an ideal issue to solve for a newcomer to the repo.
Theory
DD
DDrppi
DDsmu
wp
xi
vpf
Mocks
DDrppi_mocks
DDtheta_mocks
[#216]DDsmu_mocks
vpf_mocks
For Advanced Users
x/y/z-binref
and runs the underlying function (passing all the parameters through). The only parameters for this decorator would be themaxbinref
,nrepeats
andreturn_runtimes
. This decorator can then be applied to all remaining clustering statistics (i.e., the functions without dedicated helper routines)