zfit / zfit

Model manipulation and fitting library based on TensorFlow and optimised for simple and direct manipulation of probability density functions. Its main focus is on scalability, parallelisation and user friendly experience.
http://zfit.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

KDE shapes over MultiSpaces are crashing #430

Open schmitse opened 1 year ago

schmitse commented 1 year ago

Creating a kernel density estimation from a dataset using a MultiSpace is currently not working.

Current Behaviour

When creating a KDE over a MultiSpace, the constructor throws an error because MultiSpace does not implement the limits attribute, which is required for the padding of the data:

        data, size, weights = self._convert_init_data_weights_size(
            data, weights, padding=padding, limits=obs.limits
        )

I attached a minimal (non-)working example of the issue:

import zfit
import numpy as np
obs1 = zfit.Space('obs', limits=(0., 0.25))
obs2 = zfit.Space('obs', limits=(0.75, 1.))
obs = obs1 + obs2

gen = np.random.default_rng(seed=1337)
data = gen.exponential(1, size=(1000,))
data_zfit = zfit.Data.from_numpy(obs=obs, array=data)

kde = zfit.pdf.KDE1DimExact(data=data_zfit, obs=obs, name='test')

When running this you get the following error:

Traceback (most recent call last):
  File "/afs/cern.ch/work/s/schmitse/ewp-rkstz/analysis/python/schmitse/mva_plots/python/MWE.py", line 14, in <module>
    kde = zfit.pdf.KDE1DimExact(data=data_zfit, obs=obs, name='test')
  File "/afs/cern.ch/work/s/schmitse/ewp-rkstz/analysis/python/schmitse/zfit/zfit/lib/python3.9/site-packages/zfit/models/kde.py", line 911, in __init__
    data, weights, padding=padding, limits=obs.limits
  File "/afs/cern.ch/work/s/schmitse/ewp-rkstz/analysis/python/schmitse/zfit/zfit/lib/python3.9/site-packages/zfit/core/space.py", line 133, in wrapped_func
    return func(*args, **kwargs)
  File "/afs/cern.ch/work/s/schmitse/ewp-rkstz/analysis/python/schmitse/zfit/zfit/lib/python3.9/site-packages/zfit/core/space.py", line 2727, in limits
    self._raise_limits_not_implemented()
  File "/afs/cern.ch/work/s/schmitse/ewp-rkstz/analysis/python/schmitse/zfit/zfit/lib/python3.9/site-packages/zfit/core/space.py", line 3069, in _raise_limits_not_implemented
    raise MultipleLimitsNotImplemented(
zfit.util.exception.MultipleLimitsNotImplemented: Limits/lower/upper not implemented for MultiSpace. This error is either caught automatically as part of the codes logic or the MultiLimit case should be considered. To do that, simply iterate through the MultiSpace, which returns a simple space. Iterating through a Spaces also worksfor simple spaces.

Context (Environment)

Possible Solution/Implementation

KDEs over MultiSpaces should either raise a NotImplementedError, or the padding should mirror the input dataset around the edges of each subspace of the MultiSpace.
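To make the second option concrete, here is a minimal sketch of per-subspace mirror padding. It deliberately avoids zfit so that it runs standalone; the function name mirror_pad, the plain (lower, upper) tuples, and the meaning of padding as a fraction of each subspace width are assumptions for illustration, not zfit's actual implementation.

```python
def mirror_pad(data, subspaces, padding=0.1):
    """Reflect points near each subspace edge across that edge.

    data: iterable of floats
    subspaces: list of (lower, upper) tuples (hypothetical stand-in for a MultiSpace)
    padding: fraction of each subspace width that gets mirrored (assumed meaning)
    """
    padded = []
    for lower, upper in subspaces:
        width = upper - lower
        # Keep only the points that fall inside this subspace.
        inside = [x for x in data if lower <= x <= upper]
        # Points close to the lower edge are reflected to the left of it.
        left = [2 * lower - x for x in inside if x < lower + padding * width]
        # Points close to the upper edge are reflected to the right of it.
        right = [2 * upper - x for x in inside if x > upper - padding * width]
        padded.extend(left + inside + right)
    return padded


subspaces = [(0.0, 0.25), (0.75, 1.0)]
data = [0.01, 0.10, 0.24, 0.50, 0.76, 0.99]
padded = mirror_pad(data, subspaces, padding=0.1)
```

In this sketch the point at 0.50 is dropped because it lies in neither subspace, while each point close to an edge gains a reflected copy just outside that edge.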

A quick fix to make the KDEs work again would be to pass the observable space limits only if the Space has the limits attribute:

        # A MultiSpace has the attribute 'spaces'; in that case, don't pass limits.
        _limits = None if hasattr(obs, 'spaces') else obs.limits
        data, size, weights = self._convert_init_data_weights_size(
            data, weights, padding=padding, limits=_limits
        )

jonas-eschle commented 5 months ago

Hey @schmitse, coming back to this: the newest developments try to get rid of the (overcomplicated) MultiSpace in zfit; a "TruncatedPDF" can be used instead. I think the above case is best solved like this:

import zfit
import numpy as np

obs1 = zfit.Space('obs', limits=(0., 0.25))
obs2 = zfit.Space('obs', limits=(0.75, 1.))
obs = obs1 + obs2
obsall = zfit.Space('obs', 0, 1)
gen = np.random.default_rng(seed=1337)
data = gen.exponential(1, size=(1000,))
data_zfit = zfit.Data.from_numpy(obs=obsall, array=data)

kde = zfit.pdf.KDE1DimExact(data=data_zfit, obs=obsall, name='test')
kdetrunc = kde.to_truncated([obs1, obs2])

# plot
import matplotlib.pyplot as plt
x = np.linspace(0, 1, num=273)
plt.plot(x, kdetrunc.pdf(x), label='trunc')
plt.plot(x, kde.pdf(x), "--", label='kde')
plt.legend()
plt.show()

What do you think? (Yes, the normalization changes, but this should not matter for a pdf, as we don't care about the absolute norm. For an extended pdf, it will by default be adjusted accordingly.)
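
The remark about the normalization can be illustrated without zfit. The sketch below uses an analytic exponential density on [0, 1] as a stand-in for the KDE and truncates it to the two regions from the example. The helper names (exp_pdf, integral, truncated_pdf) are hypothetical. Inside the kept regions the truncated density is the original one divided by the probability mass of those regions, so the shape is unchanged and only a constant factor differs.

```python
import math


def exp_pdf(x, lam=1.0, lower=0.0, upper=1.0):
    """Exponential density normalised on [lower, upper] (stand-in for the KDE)."""
    norm = math.exp(-lam * lower) - math.exp(-lam * upper)
    return lam * math.exp(-lam * x) / norm


def integral(lo, hi, lam=1.0, lower=0.0, upper=1.0):
    """Probability mass of exp_pdf between lo and hi."""
    norm = math.exp(-lam * lower) - math.exp(-lam * upper)
    return (math.exp(-lam * lo) - math.exp(-lam * hi)) / norm


def truncated_pdf(x, regions):
    """exp_pdf restricted to the given regions and renormalised to 1 over them."""
    if not any(lo <= x <= hi for lo, hi in regions):
        return 0.0
    kept = sum(integral(lo, hi) for lo, hi in regions)
    return exp_pdf(x) / kept


regions = [(0.0, 0.25), (0.75, 1.0)]
# The ratio to the untruncated density is the same constant in both regions.
ratio_low = truncated_pdf(0.1, regions) / exp_pdf(0.1)
ratio_high = truncated_pdf(0.9, regions) / exp_pdf(0.9)
```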