sot / chandra_aca

Chandra Aspect Camera Tools
https://sot.github.io/chandra_aca
BSD 2-Clause "Simplified" License
0 stars 0 forks source link

Improve caching performance by adjusting default call pattern #153

Closed taldcroft closed 1 year ago

taldcroft commented 1 year ago

Description

In default usage @jeanconn noted that get_grid_func_model was being called once with get_grid_func_model() and once with get_grid_func_model(None), thus requiring two reads of the model file. This fixes that.

Interface impacts

None

Testing

Unit tests

Independent check of unit tests by [REVIEWER NAME]

Functional tests

In [1]: from chandra_aca import star_probs

In [2]: %time model = star_probs.get_grid_func_model()
CPU times: user 23.5 ms, sys: 24.6 ms, total: 48.2 ms
Wall time: 82.8 ms

In [3]: %time model = star_probs.get_grid_func_model(None)
CPU times: user 27 µs, sys: 5 µs, total: 32 µs
Wall time: 34.8 µs

In [4]: %time model = star_probs.get_grid_func_model("grid-*")
CPU times: user 20.4 ms, sys: 18.4 ms, total: 38.8 ms
Wall time: 68.2 ms
javierggt commented 1 year ago

This solves this specific case, but a way to fix it in general would be to do it in chandra_aca/star_probs.

One could add a private function called get_grid_func_model_ which is cached, and make get_grid_func_model a thin wrapper around it that is not cached. The wrapper function does what you did here and passes the arguments to get_grid_func_model_.

jeanconn commented 1 year ago

Which other cases are you thinking about @javierggt ? Because we're in chandra_aca/star_probs. I was thinking that another way to go would be to update the caching function to also do things like make func(), func(defaultmodel), and func(model=defaultmodel) all equivalent in the cache but figured that this fix was good enough.

I also figured it was basically in the noise for run speed but do figure for the use case of multiprocessing that extra reads in the job become a slightly more significant part of run time (than the FOT current use case of doing the star catalogs sequentially in one process).

jeanconn commented 1 year ago

Though I haven't checked that this fix works. By inspection I'm not sure how it would. I had been thinking the alternate of gfm = get_grid_func_model(info[“default_model”]) for the default model info lookup if that is all it is supposed to do anyway.

javierggt commented 1 year ago

I wasn't thinking of any other cases. My comment was a general one. Basically, we have one function, and the calling code is doing something based on its internals, which in my opinion is not nice.

The change I propose would move the one line to massage the input into the function and would not be much more complex.

Clearly, the next step is to consider other ways in which the input arguments can be redundant, as you say, but I was not proposing that. I was proposing just to put this change in a better place, so it would not be more complex.

It's just a suggestion.

taldcroft commented 1 year ago

I like @javierggt 's idea and it is actually quite simple. Stand by.

taldcroft commented 1 year ago

BTW this is a great candidate for the Squash and merge button. Four meandering commits for about as many code line changes. We should start squashing PR's that have excessive commits.

javierggt commented 1 year ago

I'm not sure this fixes the problem anymore. It seems to me that the original fix was lost in the latest commit (there is not check for redundancy in the parameters). I would have expected a check in get_grid_func_model before calling get_grid_func_model_. Did you test again? I'm going to test now.

taldcroft commented 1 year ago

The description now has functional testing showing that it works.

taldcroft commented 1 year ago

get_grid_func_model() and get_grid_func_model(None) both call _get_grid_func_model(None, None, None).

javierggt commented 1 year ago

right, I missed that. The argument is filled by None by default, of course. What I just mentioned were checks we would put if we wanted something fancier, like what Jean mentioned above.

taldcroft commented 1 year ago

@javierggt - I think this is an excellent pattern you suggested for ensuring that function caching deals with potential redundancy due to keyword arg defaults and/or providing args vs kwargs. The first function call normalizes everything to a fixed list of args.

jeanconn commented 1 year ago

It isn't working for me AFAICT. I had just added print statements in ska_helpers, and I think this test should only read the probability model once (and the aca spec twice, as the limits are fetched again at the end).


In [2]: pytest.main(['-k test_optional_penalty_limit', '-vv', '-s'])
======================================================================== test session starts =========================================================================
platform darwin -- Python 3.10.8, pytest-7.2.1, pluggy-1.0.0 -- /Users/jean/miniconda3/envs/ska3/bin/python3.10
cachedir: .pytest_cache
rootdir: /Users/jean/git/proseco, configfile: pytest.ini
plugins: timeout-2.1.0, anyio-3.6.2
collected 158 items / 157 deselected / 1 selected                                                                                                                    

proseco/tests/test_catalog.py::test_optional_penalty_limit file read from /var/folders/63/ljv0yn5x01x6xf4cymvh814m0000gn/T/tmpvs422gvv/chandra_models/xija/aca/aca_spec.json
file read from /var/folders/63/ljv0yn5x01x6xf4cymvh814m0000gn/T/tmpmtum3nhn/chandra_models/aca_acq_prob
file read from /var/folders/63/ljv0yn5x01x6xf4cymvh814m0000gn/T/tmpsoe054_o/chandra_models/aca_acq_prob
file read from /var/folders/63/ljv0yn5x01x6xf4cymvh814m0000gn/T/tmpfoum667r/chandra_models/xija/aca/aca_spec.json
PASSED

================================================================= 1 passed, 157 deselected in 9.95s ==================================================================
Out[2]: <ExitCode.OK: 0>```
jeanconn commented 1 year ago

OK, moving too fast for me but that's fine.

taldcroft commented 1 year ago

In your pytest output there are three different folders (different CHANDRA_MODELS_REPO_DIR), so those should NOT be cached. I think that is a success.

jeanconn commented 1 year ago

But why would it read the probability model again? get_aca_catalog is only called once.

jeanconn commented 1 year ago

I'll just move to testing on get_aca_catalog.

javierggt commented 1 year ago

@jeanconn: the model argument is always set to None by default within the call to get_grid_func_model, because it is specified in the signature of the function. In the previous implementation, the parameter would be set to None after passing through the cache decorator, which is why the cache didn't see the same list of arguments.

jeanconn commented 1 year ago

I still thought the fundamental issue was that at https://github.com/sot/chandra_aca/blob/bd9bafb9abeafa34657821d6defb5a1b36308a2b/chandra_aca/star_probs.py#L559 - model is not None.

javierggt commented 1 year ago

I believe I just answered that, in case you missed it.

jeanconn commented 1 year ago

I don't understand how your answer applies for the case when model isn't None.

javierggt commented 1 year ago

I guess you are focusing on the third call, with model="grid-*", right? I think this solution does not fix that case, and neither did the previous solution. And is not listed in the description either.

jeanconn commented 1 year ago

My issue is that I'm still seeing for the completely standard use case that the model is read twice

In [1]: from proseco import get_aca_catalog

In [2]: get_aca_catalog(23456)
file read from /Users/jean/ska/data/chandra_models/chandra_models/xija/aca/aca_spec.json
file read from /Users/jean/ska/data/chandra_models/chandra_models/aca_acq_prob
file read from /Users/jean/ska/data/chandra_models/chandra_models/aca_acq_prob
Out[2]: 
<ACATable length=11>
 slot  idx      id    type  sz   p_acq    mag    maxmag   yang     zang    dim   res  halfw
int64 int64   int64   str3 str3 float64 float64 float64 float64  float64  int64 int64 int64
----- ----- --------- ---- ---- ------- ------- ------- -------- -------- ----- ----- -----
    0     1         2  FID  8x8   0.000    7.00    8.00  -773.20 -1742.03     1     1    25
    1     2         4  FID  8x8   0.000    7.00    8.00  2140.23   166.63     1     1    25
    2     3         5  FID  8x8   0.000    7.00    8.00 -1826.28   160.17     1     1    25
    3     4 670829768  BOT  8x8   0.977    7.51    9.01  1689.22  1326.24    28     1   160
    4     5 670317320  BOT  8x8   0.976    7.94    9.44  1988.29   539.09    28     1   160
    5     6 670312032  BOT  8x8   0.973    8.13    9.63 -1066.82 -1513.73    28     1   160
    6     7 670302264  BOT  8x8   0.971    8.21    9.71 -2277.15  1868.08    28     1   160
    7     8 670307416  BOT  8x8   0.939    8.95   10.45 -2138.51  1195.58    28     1   160
    0     9 670317248  ACQ  8x8   0.928    9.08   10.58 -1090.23 -1853.70    28     1   160
    1    10 670302816  ACQ  8x8   0.931    9.19   10.69 -1946.22  1927.59    20     1   120
    2    11 670318128  ACQ  8x8   0.756    9.84   11.09   250.49   632.72    28     1   160

That's what I suggested to fix on slack and thought that this PR was trying to solve.

taldcroft commented 1 year ago

file read from /Users/jean/ska/data/chandra_models/chandra_models/aca_acq_prob file read from /Users/jean/ska/data/chandra_models/chandra_models/aca_acq_prob

Humm. Your pytest result is completely expected, but this one surprises me.

jeanconn commented 1 year ago

I don't disagree (with myself!) that it would also be great to normalize these cache args etc, but I thought https://github.com/sot/chandra_aca/blob/bd9bafb9abeafa34657821d6defb5a1b36308a2b/chandra_aca/star_probs.py#L281 made this awkward, which is why I started by suggesting the fix for the function to get the default info, as it was basically doing this same thing.

jeanconn commented 1 year ago

I also don't see where the pytest result is completely expected since the acq probs were only "asked for" by the get_aca_catalog once. I just figured they cloned again because of the different args (that should not be different) but I'll need to look at that in more detail.

javierggt commented 1 year ago

so... the first time, it is called with model="grid-*" and the second time with model=None.

The question is whether at this level (within chandra_aca.star_probs) one can know that these two calls will be the same, and whether that can change during the run (e.g. by changing an entry in os.environ)

jeanconn commented 1 year ago

My point was that if at https://github.com/sot/chandra_aca/blob/bd9bafb9abeafa34657821d6defb5a1b36308a2b/chandra_aca/star_probs.py#L359 you are already trying to get the default, that the pass thru to get_grid_func_model could just be info["default_model"] . It feels consistent with what the function is trying to do anyway.

jeanconn commented 1 year ago

But maybe there's a smarter way that is still consistent, or just cut the model assignment at https://github.com/sot/chandra_aca/blob/bd9bafb9abeafa34657821d6defb5a1b36308a2b/chandra_aca/star_probs.py#L280

javierggt commented 1 year ago

My first inclination is to move (or copy) the check

if model is None: 
    model = conf.default_model

into the wrapping function

jeanconn commented 1 year ago

Yeah, something like that probably works better to get the labels that I think were the point of the assignment.

javierggt commented 1 year ago

With that change, I get this for the functional tests above:

In [1]: from chandra_aca import star_probs

In [2]: %time model = star_probs.get_grid_func_model()
CPU times: user 17.9 ms, sys: 2.05 ms, total: 19.9 ms
Wall time: 280 ms

In [3]: %time model = star_probs.get_grid_func_model(None)
CPU times: user 59 µs, sys: 0 ns, total: 59 µs
Wall time: 66.3 µs

In [4]: %time model = star_probs.get_grid_func_model("grid-*")
CPU times: user 21 µs, sys: 3 µs, total: 24 µs
Wall time: 29.6 µs