Closed steveshaoucsb closed 2 months ago
Further note: if I bring the data one more quarter earlier, say 1994/12/1, the error becomes this:
IndexError: boolean index did not match indexed array along dimension 0; dimension is 29 but corresponding boolean dimension is 27
Sounds like that logic is broken. If you want to open a PR that just adds your test (that doesn't pass), that'd be a good first step for fixing it. Otherwise I'll try my best to get to it by the end of the week
The data that I used to reproduce the error has been uploaded to the latest pull request!
Thanks! I started looking at it over the weekend. I think the problem happens before the C matrix. I'm doing something quite dumb with this C_mask business, so I need to rethink how to check if there are too many/too few observations.
In your case, the time series aren't "aligned" -- there are too many high-frequency observations before the first low frequency observation. So we need some logic to work out how to align them. For yearly data I check that the set of years is equal in the high and low frequency data, but that doesn't work quarterly, since there's only 4 quarters. Maybe check the set of year-quarter?
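To make the year-quarter idea concrete, here is a minimal sketch of such a check using only the standard library. It is not the package's code: the helpers year_quarter and month_starts are hypothetical names, and it assumes each date is mapped to its calendar quarter (so 1995/6/1 counts as 1995 Q2). The dates are the ones from the report.

```python
from datetime import date

def year_quarter(d: date) -> tuple:
    """Map a date to its calendar (year, quarter) pair (assumed convention)."""
    return d.year, (d.month - 1) // 3 + 1

def month_starts(start: date, end: date, step: int = 1) -> list:
    """First-of-month dates from start to end inclusive, every `step` months."""
    out = []
    idx = start.year * 12 + (start.month - 1)
    last = end.year * 12 + (end.month - 1)
    while idx <= last:
        out.append(date(idx // 12, idx % 12 + 1, 1))
        idx += step
    return out

# Index dates from the report: quarterly low-frequency, monthly high-frequency.
low = month_starts(date(1995, 6, 1), date(2001, 12, 1), step=3)
high = month_starts(date(1995, 3, 1), date(2001, 12, 1))

low_q = {year_quarter(d) for d in low}
high_q = {year_quarter(d) for d in high}

# Quarters that have monthly observations but no quarterly observation.
unmatched = sorted(high_q - low_q)
print(len(low), len(high), unmatched)  # 27 82 [(1995, 1)]
```

With these dates the two sets differ by exactly one quarter, 1995 Q1, which is the stretch of high-frequency data that sits before the first low-frequency observation.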
I'm a bit swamped at work this week but I'll do my best to have a look. If you feel inspired, feel free as well. I think the problem function is actually handle_endpoint_differences
On the point you mentioned about observations being available earlier than the starting date of the low-frequency data: when I did the same task in R, it interpolated properly and backcast the low-frequency data to the starting point of the high-frequency data. The Python package that you wrote actually works for annual-monthly cases where the monthly high-freq data starts earlier than the annual data. So I think this problem is more about how to make backcasting of the low-freq data work in the quarter-monthly case when the high-freq data starts at an earlier date.
For annual-quarter cases, I don't think I have encountered issues over that so far but I will have a try. I am working with a huge amount of data that needs to be interpolated with your package this week. I will pay attention to that case as well. If I have time, I will create a separate case to test that and see whether it works!
Tested with annual-to-quarter case today. The backcasting is working for this case.
Nice! I'm glad something is working. Would you be willing to make a PR with the test you ran so it can be included in the testing suite? It will be useful if we start tinkering with the Q->M case, to make sure we don't break what currently works.
The year-to-quarter result and test cases have been added, and they're available in the pull request that I opened a few days ago.
I've been sick during the weekend so didn't take a look, but any updates on how to fix this issue?
Hey, thanks for the poke (and the PR!)
I have been overloaded with stuff the last week or so, hoping to get to this soon
We are aiming to make a public release of some economic data next week (I hope to share it with you once it is published), and it would be highly appreciated if you could publish the fix for both of the issues I raised as early as you can, as your package plays a vital role in interpolating our dataset. I understand that you might have a packed schedule, so take your time fixing these bugs. I am currently overwhelmed by the rest of the work on the dataset, but hopefully I can wrap that up early and work on the PR for the fix!
I'll put a couple hours into this today, hopefully I can make some headway
I cut a new release that I think fixes this and maybe also #9. Can you do pip install --upgrade tsdisagg and open a new issue if you hit more errors?
Due to an issue with the formula on line 22 of ts_disagg.py,
excess = n - i_len * nl
the program always generates an extra period in my test scenario, which causes it to crash. In my test case, the low-frequency quarterly data starts on 1995/6/1 and ends on 2001/12/1; the high-frequency monthly data starts on 1995/3/1 and ends on 2001/12/1. The program raises the following error:
IndexError: boolean index did not match indexed array along dimension 0; dimension is 28 but corresponding boolean dimension is 27
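For reference, the 28 and 27 in the traceback are consistent with a simple count of observations. The sketch below is only that arithmetic, not a reproduction of what ts_disagg.py does internally; n_months is a hypothetical helper, and reading "dimension" as the number of quarters implied by the monthly data is an assumption.

```python
import math
from datetime import date

def n_months(start: date, end: date) -> int:
    """Number of monthly observations from start to end, both inclusive."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1

# Quarterly observations actually present: 1995/6/1 .. 2001/12/1, every 3 months.
n_low = (n_months(date(1995, 6, 1), date(2001, 12, 1)) - 1) // 3 + 1

# Monthly observations: 1995/3/1 .. 2001/12/1.
n_high = n_months(date(1995, 3, 1), date(2001, 12, 1))

# Quarters needed to cover all monthly observations (assumed interpretation).
implied_quarters = math.ceil(n_high / 3)

# Same count with the monthly data starting one quarter earlier, on 1994/12/1.
n_high_earlier = n_months(date(1994, 12, 1), date(2001, 12, 1))
implied_earlier = math.ceil(n_high_earlier / 3)

print(n_low, n_high, implied_quarters)   # 27 82 28
print(n_high_earlier, implied_earlier)   # 85 29
```

Under that reading, the monthly series spans one more quarter than the quarterly series has observations (two more when it starts on 1994/12/1), which matches the 28 vs 27 and 29 vs 27 mismatches reported in the two tracebacks.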