check that weights don't depend on `output_type_id` in settings where that would not lead to a valid ensemble

elray1 commented 10 months ago

Mainly, I think this is a check we need when the output type is pmf or cdf. In that case, allowing the weights to depend on the `output_type_id` could lead to an invalid predictive distribution: for example, the ensemble pmf probabilities might not sum to 1, or the ensemble cdf might not be non-decreasing. We could implement this check in hubEnsembles::simple_ensemble, and I think that would be good enough since hubEnsembles::linear_pool calls simple_ensemble.
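A small numeric sketch of the pmf problem, written in Python purely for illustration (hubEnsembles is an R package, and none of the names below come from it). The model names, categories, and the `weighted_pmf` helper are all made up. In both weighting schemes the weights sum to 1 across models within each category, but only the scheme that is constant across `output_type_id` gives an ensemble pmf that sums to 1.

```python
# Hypothetical pmf forecasts from two models over three categories.
# Each model's probabilities sum to 1.
CATEGORIES = ("low", "mid", "high")
pmf = {
    "model_a": {"low": 0.2, "mid": 0.5, "high": 0.3},
    "model_b": {"low": 0.6, "mid": 0.3, "high": 0.1},
}


def weighted_pmf(pmf, weights):
    """Weighted sum per category; weights[model][category] is the weight
    applied to that model's probability for that category."""
    return {
        c: sum(weights[m][c] * pmf[m][c] for m in pmf)
        for c in CATEGORIES
    }


# Weights that do NOT depend on output_type_id (0.5 per model everywhere).
const_w = {m: {c: 0.5 for c in CATEGORIES} for m in pmf}

# Weights that DO depend on output_type_id. Within each category they
# still sum to 1 across the two models.
vary_w = {
    "model_a": {"low": 0.9, "mid": 0.5, "high": 0.1},
    "model_b": {"low": 0.1, "mid": 0.5, "high": 0.9},
}

const_total = sum(weighted_pmf(pmf, const_w).values())
vary_total = sum(weighted_pmf(pmf, vary_w).values())

print(f"constant weights: total probability = {const_total:.2f}")
print(f"varying weights:  total probability = {vary_total:.2f}")
```

With these made-up numbers, the constant-weight ensemble has total probability 1.00, while the varying-weight ensemble totals 0.76, so it is not a valid pmf without renormalization.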

I initially thought that we should also not allow weights to depend on the sample index if the output type is sample, but I don't think there's necessarily anything wrong with a per-sample weighting, e.g. if the hub or modeler has some extra information about how the different samples are generated and wants to weight them based on that factor.

elray1 commented 10 months ago

We might want to allow users to manually override whether this check is done. For example, if you're careful about your weighting scheme this could be OK. And we might want to be able to do this for a trimmed linear pool.
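Roughly, a check with a manual override could look like the sketch below. It is in Python for illustration only and is not the hubEnsembles implementation: the function name `check_weights_constant`, the `skip_check` argument, and the row layout (`model_id`, `output_type_id`, `weight`) are all assumptions. It also assumes the rows belong to a single modeling task, so it would need grouping by task id variables if weights are allowed to vary across tasks.

```python
from collections import defaultdict


def check_weights_constant(weights, output_type, skip_check=False):
    """Raise ValueError if, for a pmf or cdf output type, any model's
    weight varies across output_type_id values.

    weights: list of dicts with keys "model_id", "output_type_id", "weight"
             (hypothetical layout, assumed to cover a single task).
    skip_check: manual override; when True the check is not performed.
    """
    if skip_check or output_type not in ("pmf", "cdf"):
        return

    # Collect the distinct weight values seen for each model.
    seen = defaultdict(set)
    for row in weights:
        seen[row["model_id"]].add(row["weight"])

    bad = sorted(m for m, values in seen.items() if len(values) > 1)
    if bad:
        raise ValueError(
            "weights depend on output_type_id for output type "
            f"'{output_type}' in model(s): {', '.join(bad)}"
        )


# Example rows where model_a's weight changes with output_type_id.
varying = [
    {"model_id": "model_a", "output_type_id": "low", "weight": 0.9},
    {"model_id": "model_a", "output_type_id": "high", "weight": 0.1},
    {"model_id": "model_b", "output_type_id": "low", "weight": 0.5},
    {"model_id": "model_b", "output_type_id": "high", "weight": 0.5},
]

try:
    check_weights_constant(varying, "pmf")
except ValueError as err:
    print("rejected:", err)

# The override lets a careful user proceed anyway.
check_weights_constant(varying, "pmf", skip_check=True)
```

Defaulting `skip_check` to `False` keeps the safe behavior for most users while leaving room for cases like a trimmed linear pool.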

lshandross commented 2 months ago

I ended up deciding to implement the check in the hubEnsembles::validate_ensemble_inputs function since both simple_ensemble and linear_pool call that function before performing any ensemble calculations.

hubverse-org / hubEnsembles

check that weights don't depend on `output_type_id` in settings where that would not lead to a valid ensemble #35