Open EngieAntoine opened 1 year ago
Thanks for the report. It's odd that you can't push, I just checked and you still have write access (and I didn't change anything).
I finally managed to push to the [test_autotrophic_cache_refresh] branch.
Hi! I guess the point is that the test is failing? If so, the failure should appear in the test itself (with a context manager to catch the error). The cached and non-cached series should also be shown.
Or maybe I am missing something?
No, the point is that this test must not fail, and a fix is needed for that.
OK, but even to highlight a bug, the test must be green at each commit. This is why we write something like this: https://github.com/pythonianfr/tshistory_refinery/blob/b3614e428fe4ec825caf34e7334b144b6e511b03/test/test_api.py#L630
The next commit, which fixes the bug, will also change the test to remove the error-catching mechanism.
Furthermore, the test should ideally show what kind of differences there are between the two series.
Indeed, slice + today creates, even for a formula whose underlying series has only 1 revision, a formula with infinitely many latent revisions. What you are saying is that the cache should make these revisions explicit (according to the revdate rule). I think this can be tackled by detecting the presence of, at least initially, slice + a today expression, and imperatively computing all the versions.
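To illustrate the "latent revisions" point, here is a minimal sketch in plain Python. It does not use tshistory: `BASE`, `sliced_at` and `explicit_versions` are hypothetical names, and the inclusive upper bound and the daily step are assumptions made for the illustration only.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for an underlying series with a single revision:
# value dates mapped to values, some of them "in the future".
BASE = {datetime(2023, 1, d): float(d) for d in range(1, 11)}

def sliced_at(revision_date, base=BASE):
    """Evaluate a slice bounded by "today", reading "today" as revision_date.

    Assumption: the upper bound is inclusive.
    """
    return {vd: v for vd, v in base.items() if vd <= revision_date}

def explicit_versions(first, last, step=timedelta(days=1)):
    """Materialise one version per step between first and last (inclusive)."""
    versions = {}
    current = first
    while current <= last:
        versions[current] = sliced_at(current)
        current += step
    return versions

# One underlying revision, yet each day yields a different sliced result.
versions = explicit_versions(datetime(2023, 1, 3), datetime(2023, 1, 6))
```

Each entry of `versions` is a distinct state of the formula, although `BASE` never changed; a cache that only looks at the revisions of the underlying series would not see them.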
For my curiosity: are you trying to build some pseudo-versioning on a non-versioned series?
Er, no. I'm just trying to understand why some Cronos formulas are not refreshed correctly in the cache system. This is the case for most of them, because they use the slice operator with to_value_date=today.
Out of curiosity, what is the purpose of those slices? Is Adrien tweaking some priorities?
I don't know. I have to check with him: these are existing series.
By the way, what do you mean by "this can be tackled by detecting the presence of, at least initially, slice + a today expression, and imperatively computing all the versions"? How can we do that?
In refresh_series we prune the insertion dates range using reduce_frequency. We probably want to skip this step if there is a slice + today. We have the formula at hand. A quick and dirty check would be a substring check (e.g. "(today)" in formula). A tree walk would be more robust.
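As a rough sketch of the difference between the two checks, the snippet below parses a Lisp-like formula into nested lists and walks the tree. The tiny parser and the helper names (`parse`, `contains_op`, `slice_depends_on_today`) are hypothetical and written for this illustration; they are not the parser or API used by tshistory, and the walk flags `(today)` anywhere in the arguments of a slice.

```python
import re

# Strings, parentheses, or any other run of non-space characters.
TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()"]+')

def parse(formula):
    """Parse a Lisp-like formula into nested lists of string atoms."""
    stack = [[]]
    for token in TOKEN.findall(formula):
        if token == "(":
            stack.append([])
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(token)
    if len(stack) != 1 or len(stack[0]) != 1:
        raise ValueError("expected exactly one balanced expression")
    return stack[0][0]

def contains_op(tree, name):
    """True if tree or any sub-expression is a call to operator `name`."""
    if not isinstance(tree, list) or not tree:
        return False
    if tree[0] == name:
        return True
    return any(contains_op(child, name) for child in tree)

def slice_depends_on_today(formula):
    """True if some (slice ...) call has (today) somewhere in its arguments."""
    def walk(tree):
        if not isinstance(tree, list) or not tree:
            return False
        if tree[0] == "slice" and any(contains_op(a, "today") for a in tree[1:]):
            return True
        return any(walk(child) for child in tree)
    return walk(parse(formula))

with_today = '(slice (series "a") #:todate (shifted (today) #:days -1))'
misleading = '(series "my (today) series")'
```

Here `"(today)" in misleading` is true although no operator is called, whereas `slice_depends_on_today(misleading)` is false; both checks agree on `with_today`.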
I have pushed a proposed fix in your branch.
OK, thanks. I will test it ASAP.
So, it works in most cases... but for an insertion date of today, I get a value date with the cached version that is not there with the non-cached version.
So I confirm that the fix for the slice is not correct: if we have a formula like:
(slice (cronos "77eb0f50-e3ce-4088-b3e0-d18001f6172c") #:todate (shifted (today) #:days -1))
it will not slice the formula, and we will have values for today with the cached version.
By the way, I wonder whether the problem comes from the fact that (today) is not a real today but a now...
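To make the "today versus now" question concrete, here is a small sketch in plain Python. It does not claim which behaviour tshistory implements; `now_like_today`, `midnight_today` and `slice_to` are hypothetical helpers, and the data is made up.

```python
from datetime import datetime, timedelta

def now_like_today(at):
    """A (today) that behaves like "now": keeps the time of day."""
    return at

def midnight_today(at):
    """A (today) truncated to the start of the day."""
    return at.replace(hour=0, minute=0, second=0, microsecond=0)

def slice_to(points, todate):
    """Keep value dates up to and including todate (assumed inclusive)."""
    return [vd for vd in points if vd <= todate]

# Hourly value dates over two days (illustrative data).
start = datetime(2023, 1, 1)
points = [start + timedelta(hours=h) for h in range(48)]

refresh = datetime(2023, 1, 2, 15, 30)  # moment the cache is refreshed
one_day = timedelta(days=1)

# Equivalent of #:todate (shifted (today) #:days -1) under each reading.
with_now = slice_to(points, now_like_today(refresh) - one_day)
with_midnight = slice_to(points, midnight_today(refresh) - one_day)
```

Under the "now" reading the bound moves with every refresh within the same day, so two evaluations a few hours apart keep a different number of points; under the "midnight" reading they agree.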
Ok, interesting example, thanks.
I am rephrasing to check my understanding:
In the last version of the cache you have a value date corresponding to "today", but you wanted the last 24 hours to be missing, am I right?
Could you check with an .history() call that the last values were inserted in the last refresh of the cache?
Has this cache been running for a few days? What are its parameters?
Having read the code of both singularity_formula and refinery.cache, I don't see a glaring error in the way we handle the bounds, and hence I don't have an explanation for this bug.
To go further on my side, I need access to a preprod or a test (difficult to write because of the autotrophic nature of the operator).
Some additional information might help, such as a print of the cron_range in the tshistory_refinery.cache.refresh_series function.
Well, or a test that exhibits the issue (it will have to be written anyway).
Here is the result for a series that contains (slice (cronos "77eb0f50-e3ce-4088-b3e0-d18001f6172c") #:todate (shifted (today) #:days -1)):
I will try to write a unit test that exhibits the issue tomorrow.
Hello, I pushed a new branch "bugfix/cached_slice" containing test_cache_slice2, which shows that there is a bug with the slice and the cache. Can you have a look at it, please?
Hello Amin, thank you for your contribution!
I have a few things to say about your test; however, it does not seem to reproduce the bug described by Antoine.
In your case, you could do something like:
with pytest.raises(Exception) as error:
    pd.testing.assert_series_equal(
        tsa.get(formula_name),
        tsa.get(formula_name, nocache=True)
    )
assert len(tsa.get(formula_name)) == 4
assert len(tsa.get(formula_name, nocache=True)) == 5
assert tsa.get(formula_name, nocache=True).index[0] not in tsa.get(formula_name).index
The context manager catches the error, so the test does not fail but still shows the problem. The following assertions specify the difference between the two series: one is shorter than the other and the FIRST value is missing.
Consider look_before='(shifted now #:days -2)' changed to look_before='(shifted now #:days -4)'.
With this policy, your two series (with and without cache) would be strictly equal.
With the slice operator (which redefines from_value_date and to_value_date), the cache doesn't work correctly. For example, if we have a slice with todate=(today), the cache will cut the value dates after this todate (which becomes the revision_date). Here is the unit test (it seems I lost the right to push here: 403):