Conformal Tights is a Python package that exports a scikit-learn meta-estimator and a Darts forecaster. Install it with:
pip install conformal-tights
Conformal Tights exports a meta-estimator called `ConformalCoherentQuantileRegressor` that you can use to equip any scikit-learn regressor with a `predict_quantiles` method that predicts conformally calibrated quantiles. Example usage:
from conformal_tights import ConformalCoherentQuantileRegressor
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
# Fetch the dataset and split it into train and test sets
X, y = fetch_openml("ames_housing", version=1, return_X_y=True, as_frame=True, parser="auto")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
# Create a regressor, equip it with conformal prediction, and fit on the train set
my_regressor = XGBRegressor(objective="reg:absoluteerror")
conformal_predictor = ConformalCoherentQuantileRegressor(estimator=my_regressor)
conformal_predictor.fit(X_train, y_train)
# Predict with the underlying regressor
ŷ_test = conformal_predictor.predict(X_test)
# Predict quantiles with the conformal predictor
ŷ_test_quantiles = conformal_predictor.predict_quantiles(
    X_test, quantiles=(0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975)
)
When the input data is a pandas DataFrame, the output is also a pandas DataFrame. For example, printing the head of `ŷ_test_quantiles` yields:
house_id | 0.025 | 0.05 | 0.1 | 0.5 | 0.9 | 0.95 | 0.975 |
---|---|---|---|---|---|---|---|
1357 | 114743.7 | 120917.9 | 131752.6 | 156708.2 | 175907.8 | 187996.1 | 205443.4 |
2367 | 67382.7 | 80191.7 | 86871.8 | 105807.1 | 118465.3 | 127581.2 | 142419.1 |
2822 | 119068.0 | 131864.8 | 138541.6 | 159447.7 | 179227.2 | 197337.0 | 214134.1 |
2126 | 93885.8 | 100040.7 | 111345.5 | 134292.7 | 150557.1 | 164595.8 | 182524.1 |
1544 | 68959.8 | 81648.8 | 88364.1 | 108298.3 | 122329.6 | 132421.1 | 147225.6 |
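As a rough sanity check of quantile predictions like these, one can compare each nominal quantile level with the fraction of observed targets that fall at or below the predicted quantile, and score each quantile with the pinball loss. The sketch below uses synthetic data and NumPy only; the helper names `empirical_levels` and `pinball_loss` are hypothetical and not part of Conformal Tights.

```python
import numpy as np


def empirical_levels(y_true, y_quantiles):
    """Fraction of targets at or below each predicted quantile (one column per quantile)."""
    y_true = np.asarray(y_true, dtype=float)[:, None]
    return (y_true <= np.asarray(y_quantiles, dtype=float)).mean(axis=0)


def pinball_loss(y_true, y_quantiles, quantiles):
    """Mean pinball (quantile) loss per quantile level; lower is better."""
    y_true = np.asarray(y_true, dtype=float)[:, None]
    q = np.asarray(quantiles, dtype=float)[None, :]
    diff = y_true - np.asarray(y_quantiles, dtype=float)
    return np.maximum(q * diff, (q - 1.0) * diff).mean(axis=0)


# Synthetic stand-in for (y_test, ŷ_test_quantiles): standard normal targets and
# "predicted" quantiles estimated from a large independent sample of the same distribution
rng = np.random.default_rng(42)
quantiles = (0.025, 0.05, 0.1, 0.5, 0.9, 0.95, 0.975)
y_true = rng.normal(size=5000)
reference = np.quantile(rng.normal(size=200_000), quantiles)
y_quantiles = np.tile(reference, (len(y_true), 1))

levels = empirical_levels(y_true, y_quantiles)
losses = pinball_loss(y_true, y_quantiles, quantiles)
for q, level, loss in zip(quantiles, levels, losses):
    print(f"q={q:<5} empirical level={level:.3f} pinball loss={loss:.3f}")
```

With real predictions, one could pass `y_test` and `ŷ_test_quantiles.to_numpy()` in place of the synthetic arrays. Empirical levels close to the nominal ones suggest the quantiles are well calibrated on that data.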
Let's visualize the predicted quantiles on the test set:
In addition to quantile prediction, you can use `predict_interval` to predict conformally calibrated prediction intervals. Compared to quantiles, these prioritize reliable coverage over quantile accuracy. Example usage:
# Predict an interval for each example with the conformal predictor
ŷ_test_interval = conformal_predictor.predict_interval(X_test, coverage=0.95)
# Measure the coverage of the prediction intervals on the test set
coverage = ((ŷ_test_interval.iloc[:, 0] <= y_test) & (y_test <= ŷ_test_interval.iloc[:, 1])).mean()
print(f"{coverage:.1%}")  # 96.6%
When the input data is a pandas DataFrame, the output is also a pandas DataFrame. For example, printing the head of `ŷ_test_interval` yields:
house_id | 0.025 | 0.975 |
---|---|---|
1357 | 107202.8 | 206290.4 |
2367 | 66665.1 | 146004.8 |
2822 | 115591.8 | 220314.8 |
2126 | 85288.1 | 183037.8 |
1544 | 67889.9 | 150646.2 |
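To give some intuition for what conformal calibration of an interval involves, here is a minimal sketch of textbook split conformal prediction with absolute residuals. It is an illustration under stated assumptions, not the algorithm Conformal Tights implements: the function `split_conformal_interval`, the linear base model, and the synthetic data are all hypothetical.

```python
import math

import numpy as np


def split_conformal_interval(y_cal, y_cal_pred, y_new_pred, coverage=0.95):
    """Symmetric interval around y_new_pred, calibrated on held-out residuals."""
    scores = np.sort(np.abs(np.asarray(y_cal) - np.asarray(y_cal_pred)))
    n = len(scores)
    # Rank of the calibration score, with the finite-sample correction (n + 1)
    k = math.ceil((n + 1) * coverage)
    radius = scores[k - 1] if k <= n else np.inf
    y_new_pred = np.asarray(y_new_pred)
    return y_new_pred - radius, y_new_pred + radius


# Synthetic data: a noisy linear relationship, split into train, calibration, and test
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=5000)
y = 3.0 * x + rng.normal(scale=2.0, size=x.size)
x_train, x_cal, x_test = x[:1000], x[1000:3000], x[3000:]
y_train, y_cal, y_test = y[:1000], y[1000:3000], y[3000:]

# Fit a simple base model on the train set only
slope, intercept = np.polyfit(x_train, y_train, deg=1)


def predict(x_new):
    """Point prediction of the fitted line."""
    return slope * x_new + intercept


# Calibrate on the calibration set, then measure coverage on the test set
lower, upper = split_conformal_interval(y_cal, predict(x_cal), predict(x_test), coverage=0.9)
test_coverage = np.mean((lower <= y_test) & (y_test <= upper))
print(f"{test_coverage:.1%}")
```

The interval here has the same width for every example. Methods that adapt the width to each example, as the intervals in the table above do, need a more elaborate nonconformity score than the plain absolute residual.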
Conformal Tights also exports a Darts forecaster called `DartsForecaster` that uses a `ConformalCoherentQuantileRegressor` to make conformally calibrated probabilistic time series forecasts. To demonstrate its usage, let's begin by loading a time series dataset:
from darts.datasets import ElectricityConsumptionZurichDataset
# Load a forecasting dataset
ts = ElectricityConsumptionZurichDataset().load()
ts = ts.resample("h")
# Split the dataset into covariates X and target y
X = ts.drop_columns(["Value_NE5", "Value_NE7"])
y = ts["Value_NE5"] # NE5 = Household energy consumption
# Add categorical covariates to X
X = X.add_holidays(country_code="CH")
X = X.add_datetime_attribute("month")
X = X.add_datetime_attribute("dayofweek")
X = X.add_datetime_attribute("hour")
X_categoricals = ["holidays", "month", "dayofweek", "hour"]
Printing the tail of the covariates time series `X.pd_dataframe()` yields:
Timestamp | Hr [%Hr] | RainDur [min] | StrGlo [W/m2] | T [°C] | WD [°] | WVs [m/s] | WVv [m/s] | p [hPa] | holidays | month | dayofweek | hour |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2022‑08‑30 20h | 70.2 | 0.0 | 0.0 | 19.9 | 290.2 | 1.7 | 1.5 | 968.5 | 0.0 | 7.0 | 1.0 | 20.0 |
2022‑08‑30 21h | 70.1 | 0.0 | 0.0 | 19.5 | 239.2 | 1.0 | 0.7 | 968.1 | 0.0 | 7.0 | 1.0 | 21.0 |
2022‑08‑30 22h | 71.3 | 0.0 | 0.0 | 19.5 | 28.9 | 1.5 | 1.3 | 967.9 | 0.0 | 7.0 | 1.0 | 22.0 |
2022‑08‑30 23h | 80.4 | 0.0 | 0.0 | 18.9 | 24.3 | 1.6 | 1.1 | 967.9 | 0.0 | 7.0 | 1.0 | 23.0 |
2022‑08‑31 00h | 81.6 | 1.0 | 0.0 | 18.7 | 293.5 | 0.9 | 0.3 | 967.8 | 0.0 | 7.0 | 2.0 | 0.0 |
We can now equip a scikit-learn regressor with conformal prediction using `ConformalCoherentQuantileRegressor` as before, and then equip that conformal predictor with probabilistic time series forecasting using `DartsForecaster`:
from conformal_tights import DartsForecaster, ConformalCoherentQuantileRegressor
from pandas import Timestamp
from xgboost import XGBRegressor
# Split the dataset into train and test
test_cutoff = Timestamp("2022-06-01")
y_train, y_test = y.split_after(test_cutoff)
X_train, X_test = X.split_after(test_cutoff)
# Now let's:
# 1. Create an sklearn regressor of our choosing, in this case `XGBRegressor`
# 2. Add conformal quantile prediction to the regressor with `ConformalCoherentQuantileRegressor`
# 3. Add probabilistic forecasting to the conformal predictor with `DartsForecaster`
my_regressor = XGBRegressor()
conformal_predictor = ConformalCoherentQuantileRegressor(estimator=my_regressor)
forecaster = DartsForecaster(
    model=conformal_predictor,
    lags=5 * 24,  # Add the last 5 days of the target to the prediction features
    lags_future_covariates=[0],  # Add the current timestamp's covariates to the prediction features
    categorical_future_covariates=X_categoricals,  # Convert these covariates to pd.Categorical
)
# Fit the forecaster
forecaster.fit(y_train, future_covariates=X_train)
# Make a probabilistic forecast 5 days into the future by predicting a set of conformally calibrated
# quantiles at each time step and drawing 500 samples from them
quantiles = (0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.975)
forecast = forecaster.predict(
    n=5 * 24, future_covariates=X_test, num_samples=500, quantiles=quantiles
)
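Drawing samples from a set of predicted quantiles can be pictured as inverse transform sampling: draw uniform random numbers and map them through a piecewise-linear interpolation of the quantile function. The sketch below illustrates the idea for a single time step. It is an assumption about how such sampling could work, not necessarily what `DartsForecaster` does internally, and `sample_from_quantiles` is a hypothetical helper. The quantile values are copied from the 2022-06-01 01h row of the forecast table in this section.

```python
import numpy as np


def sample_from_quantiles(levels, values, num_samples, rng):
    """Inverse transform sampling through a piecewise-linear quantile function."""
    u = rng.uniform(0.0, 1.0, size=num_samples)
    # np.interp clamps u outside [levels[0], levels[-1]] to the outermost quantile values
    return np.interp(u, levels, values)


levels = np.array([0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95, 0.975])
values = np.array(
    [19165.2, 19268.3, 19435.7, 19663.0, 19861.7, 20062.2, 20237.9, 20337.7, 20453.2]
)

rng = np.random.default_rng(0)
samples = sample_from_quantiles(levels, values, num_samples=500, rng=rng)
print(np.quantile(samples, [0.1, 0.5, 0.9]).round(1))
```

Because the interpolation is clamped at the outermost quantiles, samples from this sketch never fall outside the 0.025 and 0.975 quantile values, so it says nothing about the tails beyond them.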
Printing the head of the forecast quantiles time series `forecast.quantiles_df(quantiles=quantiles)` yields:
Timestamp | Value_NE5_0.025 | Value_NE5_0.05 | Value_NE5_0.1 | Value_NE5_0.25 | Value_NE5_0.5 | Value_NE5_0.75 | Value_NE5_0.9 | Value_NE5_0.95 | Value_NE5_0.975 |
---|---|---|---|---|---|---|---|---|---|
2022‑06‑01 01h | 19165.2 | 19268.3 | 19435.7 | 19663.0 | 19861.7 | 20062.2 | 20237.9 | 20337.7 | 20453.2 |
2022‑06‑01 02h | 19004.0 | 19099.0 | 19226.3 | 19453.7 | 19710.7 | 19966.1 | 20170.1 | 20272.8 | 20366.9 |
2022‑06‑01 03h | 19372.6 | 19493.0 | 19679.4 | 20027.6 | 20324.6 | 20546.3 | 20773.2 | 20910.3 | 21014.1 |
2022‑06‑01 04h | 21936.2 | 22105.6 | 22436.0 | 22917.5 | 23308.6 | 23604.8 | 23871.0 | 24121.7 | 24351.5 |
2022‑06‑01 05h | 25040.5 | 25330.5 | 25531.1 | 25910.4 | 26439.4 | 26903.2 | 27287.4 | 27493.9 | 27633.9 |
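A property worth checking on any set of predicted quantiles is that they do not cross, meaning that at each time step the values are non-decreasing in the quantile level. The sketch below checks this for the rows printed above and derives the width of the central 95% interval from the outermost quantiles. It uses NumPy only, and `quantiles_are_monotone` is a hypothetical helper rather than part of Conformal Tights.

```python
import numpy as np


def quantiles_are_monotone(values):
    """True for each row whose quantile values are non-decreasing from left to right."""
    return np.all(np.diff(np.asarray(values, dtype=float), axis=1) >= 0, axis=1)


# Forecast quantiles copied from the table (columns 0.025 ... 0.975)
forecast_quantiles = np.array([
    [19165.2, 19268.3, 19435.7, 19663.0, 19861.7, 20062.2, 20237.9, 20337.7, 20453.2],
    [19004.0, 19099.0, 19226.3, 19453.7, 19710.7, 19966.1, 20170.1, 20272.8, 20366.9],
    [19372.6, 19493.0, 19679.4, 20027.6, 20324.6, 20546.3, 20773.2, 20910.3, 21014.1],
    [21936.2, 22105.6, 22436.0, 22917.5, 23308.6, 23604.8, 23871.0, 24121.7, 24351.5],
    [25040.5, 25330.5, 25531.1, 25910.4, 26439.4, 26903.2, 27287.4, 27493.9, 27633.9],
])

monotone = quantiles_are_monotone(forecast_quantiles)
print(monotone)

# Width of the central 95% interval: 0.975 quantile minus 0.025 quantile
interval_width = forecast_quantiles[:, -1] - forecast_quantiles[:, 0]
print(interval_width.round(1))
```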
Let's visualize the forecast and its prediction interval on the test set: