Closed taldcroft closed 1 year ago
This solves this specific case, but a way to fix it in general would be to do it in chandra_aca/star_probs.
One could add a private function called get_grid_func_model_
which is cached, and make get_grid_func_model
a thin wrapper around it that is not cached. The wrapper function does what you did here and passes the arguments to get_grid_func_model_
.
Which other cases are you thinking about @javierggt ? Because we're in chandra_aca/star_probs. I was thinking that another way to go would be to update the caching function to also do things like make func(), func(defaultmodel), and func(model=defaultmodel) all equivalent in the cache but figured that this fix was good enough.
I also figured it was basically in the noise for run speed but do figure for the use case of multiprocessing that extra reads in the job become a slightly more significant part of run time (than the FOT current use case of doing the star catalogs sequentially in one process).
Though I haven't checked that this fix works. By inspection I'm not sure how it would. I had been thinking the alternate of gfm = get_grid_func_model(info[“default_model”])
for the default model info lookup if that is all it is supposed to do anyway.
I wasn't thinking of any other cases. My comment was a general one. Basically, we have one function, and the calling code is doing something based on its internals, which in my opinion is not nice.
The change I propose would move the one line to massage the input into the function and would not be much more complex.
Clearly, the next step is to consider other ways in which the input arguments can be redundant, as you say, but I was not proposing that. I was proposing just to put this change in a better place, so it would not be more complex.
It's just a suggestion.
I like @javierggt 's idea and it is actually quite simple. Stand by.
BTW this is a great candidate for the Squash and merge
button. Four meandering commits for about as many code line changes. We should start squashing PR's that have excessive commits.
I'm not sure this fixes the problem anymore. It seems to me that the original fix was lost in the latest commit (there is not check for redundancy in the parameters). I would have expected a check in get_grid_func_model
before calling get_grid_func_model_
. Did you test again? I'm going to test now.
The description now has functional testing showing that it works.
get_grid_func_model()
and get_grid_func_model(None)
both call _get_grid_func_model(None, None, None)
.
right, I missed that. The argument is filled by None
by default, of course. What I just mentioned were checks we would put if we wanted something fancier, like what Jean mentioned above.
@javierggt - I think this is an excellent pattern you suggested for ensuring that function caching deals with potential redundancy due to keyword arg defaults and/or providing args vs kwargs. The first function call normalizes everything to a fixed list of args.
It isn't working for me AFAICT. I had just added print statements in ska_helpers, and I think this test should only read the probability model once (and the aca spec twice, as the limits are fetched again at the end).
In [2]: pytest.main(['-k test_optional_penalty_limit', '-vv', '-s'])
======================================================================== test session starts =========================================================================
platform darwin -- Python 3.10.8, pytest-7.2.1, pluggy-1.0.0 -- /Users/jean/miniconda3/envs/ska3/bin/python3.10
cachedir: .pytest_cache
rootdir: /Users/jean/git/proseco, configfile: pytest.ini
plugins: timeout-2.1.0, anyio-3.6.2
collected 158 items / 157 deselected / 1 selected
proseco/tests/test_catalog.py::test_optional_penalty_limit file read from /var/folders/63/ljv0yn5x01x6xf4cymvh814m0000gn/T/tmpvs422gvv/chandra_models/xija/aca/aca_spec.json
file read from /var/folders/63/ljv0yn5x01x6xf4cymvh814m0000gn/T/tmpmtum3nhn/chandra_models/aca_acq_prob
file read from /var/folders/63/ljv0yn5x01x6xf4cymvh814m0000gn/T/tmpsoe054_o/chandra_models/aca_acq_prob
file read from /var/folders/63/ljv0yn5x01x6xf4cymvh814m0000gn/T/tmpfoum667r/chandra_models/xija/aca/aca_spec.json
PASSED
================================================================= 1 passed, 157 deselected in 9.95s ==================================================================
Out[2]: <ExitCode.OK: 0>```
OK, moving too fast for me but that's fine.
In your pytest output there are three different folders (different CHANDRA_MODELS_REPO_DIR), so those should NOT be cached. I think that is a success.
But why would it read the probability model again? get_aca_catalog is only called once.
I'll just move to testing on get_aca_catalog.
@jeanconn: the model
argument is always set to None
by default within the call to get_grid_func_model
, because it is specified in the signature of the function. In the previous implementation, the parameter would be set to None
after passing through the cache decorator, which is why the cache didn't see the same list of arguments.
I still thought the fundamental issue was that at https://github.com/sot/chandra_aca/blob/bd9bafb9abeafa34657821d6defb5a1b36308a2b/chandra_aca/star_probs.py#L559 - model is not None.
I believe I just answered that, in case you missed it.
I don't understand how your answer applies for the case when model isn't None.
I guess you are focusing on the third call, with model="grid-*"
, right? I think this solution does not fix that case, and neither did the previous solution. And is not listed in the description either.
My issue is that I'm still seeing for the completely standard use case that the model is read twice
In [1]: from proseco import get_aca_catalog
In [2]: get_aca_catalog(23456)
file read from /Users/jean/ska/data/chandra_models/chandra_models/xija/aca/aca_spec.json
file read from /Users/jean/ska/data/chandra_models/chandra_models/aca_acq_prob
file read from /Users/jean/ska/data/chandra_models/chandra_models/aca_acq_prob
Out[2]:
<ACATable length=11>
slot idx id type sz p_acq mag maxmag yang zang dim res halfw
int64 int64 int64 str3 str3 float64 float64 float64 float64 float64 int64 int64 int64
----- ----- --------- ---- ---- ------- ------- ------- -------- -------- ----- ----- -----
0 1 2 FID 8x8 0.000 7.00 8.00 -773.20 -1742.03 1 1 25
1 2 4 FID 8x8 0.000 7.00 8.00 2140.23 166.63 1 1 25
2 3 5 FID 8x8 0.000 7.00 8.00 -1826.28 160.17 1 1 25
3 4 670829768 BOT 8x8 0.977 7.51 9.01 1689.22 1326.24 28 1 160
4 5 670317320 BOT 8x8 0.976 7.94 9.44 1988.29 539.09 28 1 160
5 6 670312032 BOT 8x8 0.973 8.13 9.63 -1066.82 -1513.73 28 1 160
6 7 670302264 BOT 8x8 0.971 8.21 9.71 -2277.15 1868.08 28 1 160
7 8 670307416 BOT 8x8 0.939 8.95 10.45 -2138.51 1195.58 28 1 160
0 9 670317248 ACQ 8x8 0.928 9.08 10.58 -1090.23 -1853.70 28 1 160
1 10 670302816 ACQ 8x8 0.931 9.19 10.69 -1946.22 1927.59 20 1 120
2 11 670318128 ACQ 8x8 0.756 9.84 11.09 250.49 632.72 28 1 160
That's what I suggested to fix on slack and thought that this PR was trying to solve.
file read from /Users/jean/ska/data/chandra_models/chandra_models/aca_acq_prob file read from /Users/jean/ska/data/chandra_models/chandra_models/aca_acq_prob
Humm. Your pytest result is completely expected, but this one surprises me.
I don't disagree (with myself!) that it would also be great to normalize these cache args etc, but I thought https://github.com/sot/chandra_aca/blob/bd9bafb9abeafa34657821d6defb5a1b36308a2b/chandra_aca/star_probs.py#L281 made this awkward, which is why I started by suggesting the fix for the function to get the default info, as it was basically doing this same thing.
I also don't see where the pytest result is completely expected since the acq probs were only "asked for" by the get_aca_catalog once. I just figured they cloned again because of the different args (that should not be different) but I'll need to look at that in more detail.
so... the first time, it is called with model="grid-*"
and the second time with model=None
.
The question is whether at this level (within chandra_aca.star_probs
) one can know that these two calls will be the same, and whether that can change during the run (e.g. by changing an entry in os.environ
)
My point was that if at https://github.com/sot/chandra_aca/blob/bd9bafb9abeafa34657821d6defb5a1b36308a2b/chandra_aca/star_probs.py#L359 you are already trying to get the default, that the pass thru to get_grid_func_model could just be info["default_model"] . It feels consistent with what the function is trying to do anyway.
But maybe there's a smarter way that is still consistent, or just cut the model assignment at https://github.com/sot/chandra_aca/blob/bd9bafb9abeafa34657821d6defb5a1b36308a2b/chandra_aca/star_probs.py#L280
My first inclination is to move (or copy) the check
if model is None:
model = conf.default_model
into the wrapping function
Yeah, something like that probably works better to get the labels that I think were the point of the assignment.
With that change, I get this for the functional tests above:
In [1]: from chandra_aca import star_probs
In [2]: %time model = star_probs.get_grid_func_model()
CPU times: user 17.9 ms, sys: 2.05 ms, total: 19.9 ms
Wall time: 280 ms
In [3]: %time model = star_probs.get_grid_func_model(None)
CPU times: user 59 µs, sys: 0 ns, total: 59 µs
Wall time: 66.3 µs
In [4]: %time model = star_probs.get_grid_func_model("grid-*")
CPU times: user 21 µs, sys: 3 µs, total: 24 µs
Wall time: 29.6 µs
Description
In default usage @jeanconn noted that
get_grid_func_model
was being called once withget_grid_func_model()
and once withget_grid_func_model(None)
, thus requiring two reads of the model file. This fixes that.Interface impacts
None
Testing
Unit tests
Independent check of unit tests by [REVIEWER NAME]
Functional tests