Open steveshaoucsb opened 2 months ago
If you do low_freq_df.resample('MS').first() (this will convert it to month-start frequency and fill the missing values with NaNs) and then pass that in, does it work?
The program will crash due to the following error:
Cell In[17], line 72
     70 hf_data = data2[['S7_Total']][disagg_start_date:disagg_end_date]
     71 lf_data = lf_data.resample('MS').first()
---> 72 disagg_result = disaggregate_series(
     73     lf_data,
     74     hf_data.assign(intercept=1),
     75     method="chow-lin",
     76     agg_func="first",
     77     optimizer_kwargs={"method": "powell"},
     78 ).to_frame(name='K')
     79 data2.loc[disagg_result.index, 'K'] = disagg_result.values.squeeze()

File /opt/anaconda3/lib/python3.12/site-packages/tsdisagg/ts_disagg.py:335, in disaggregate_series(low_freq_df, high_freq_df, target_freq, target_column, agg_func, method, criterion, h, optimizer_kwargs, verbose, return_optimizer_result)
    332 target_column = target_column or low_freq_df.columns[0]
    333 target_idx = np.flatnonzero(low_freq_df.columns == target_column)[0]
--> 335 df, C_mask, time_conversion_factor = prepare_input_dataframes(
    336     low_freq_df, high_freq_df, target_freq, method
    337 )
    339 y = df.iloc[:, target_idx].dropna().values
    340 X = df.drop(columns=df.columns[target_idx]).values

File /opt/anaconda3/lib/python3.12/site-packages/tsdisagg/ts_disagg.py:163, in prepare_input_dataframes(df1, df2, target_freq, method)
    158     raise ValueError(
...
--> 163     raise ValueError("low_freq_df has missing values.")
    165 if df2 is not None:
    166     if not isinstance(df2.index, pd.core.indexes.datetimes.DatetimeIndex):

ValueError: low_freq_df has missing values.
Ok great. Let me look into what the R program is doing in this case. My assumption is that they mask out the missing values, fit the model on the observed data, then fill in the missing values with predictions from the fitted model, but I'll need to check.
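To make the assumed mask / fit / fill idea concrete, here is a minimal sketch. It uses plain OLS via numpy as a stand-in and is not Chow-Lin (no GLS step, no AR(1) residual model), nor is it confirmed to be what the R package does. The helper name fit_and_fill and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def fit_and_fill(y: pd.Series, X: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: fit OLS on observed rows, predict the missing rows."""
    observed = y.notna().to_numpy()  # mask of rows where the target is known
    X_arr = X.to_numpy(dtype=float)
    # Fit only on the rows where y is observed (the "mask out" step).
    beta, *_ = np.linalg.lstsq(X_arr[observed], y.to_numpy()[observed], rcond=None)
    # Predict for every row, then keep the observed values where available.
    fitted = pd.Series(X_arr @ beta, index=y.index)
    return y.where(y.notna(), fitted)


# Toy data (assumption): y = 2 * x + 1, with the first two months unobserved.
idx = pd.date_range("2000-01-01", periods=6, freq="MS")
X = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], "intercept": 1.0}, index=idx)
y = pd.Series([np.nan, np.nan, 7.0, 9.0, 11.0, 13.0], index=idx)

filled = fit_and_fill(y, X)
print(filled)
```

Observed values are left untouched; only the NaN rows are replaced by model predictions.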
This might work in 1.3, but I don't specifically have a test for it. Give it a test and let me know.
Just tested; it didn't get fixed. If I do lf_data = lf_data.resample('MS').first(), the program tells me there are missing values and cannot proceed. If I remove the missing values, this error occurs instead: ValueError: Low frequency dataframe does not have a valid time index with frequency information
Ok thanks for checking. I think I know how to handle it.
Any updates, or do you need any more explanation of the issue?
The case is this: the high-frequency and low-frequency data are on the same frequency, and I want to use chow-lin to backcast to earlier dates where less data is available, leveraging the later data to get a more precise estimate of beta. For example, the high-frequency data is monthly from 1987-12-01 to 2021-12-01; the low-frequency data has monthly observations on 1987-12-01 and from 2001-12-01 to 2021-12-01 (inclusive), but none from 1988-01-01 to 2001-11-01 (inclusive). The package does not work here because inferred_freq returns None. In the "tempdisagg" package in R, this case works!
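A small pandas-only reproduction of the index problem, using the dates from the example above. The values are placeholders (all 1.0) and tsdisagg itself is not called; the sketch only shows why frequency inference fails on the gappy index and what resample('MS').first() does to it.

```python
import pandas as pd

# Low-frequency index from the example: one observation on 1987-12-01,
# then monthly observations from 2001-12-01 to 2021-12-01 (inclusive).
lf_index = pd.DatetimeIndex(["1987-12-01"]).append(
    pd.date_range("2001-12-01", "2021-12-01", freq="MS")
)
lf_data = pd.Series(1.0, index=lf_index, name="K")  # placeholder values

# The gap means no single frequency fits the index, so inference gives None.
print(lf_data.index.inferred_freq)

# Resampling restores a regular month-start index, but the gap becomes NaN.
lf_regular = lf_data.resample("MS").first()
print(lf_regular.index.freqstr)
print(int(lf_regular.isna().sum()))
```

So the two errors in this thread are two sides of the same gap: without resampling the index has no inferable frequency, and with resampling the series contains missing values.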