rte-france / relife

ReLife is an open source Python library for asset management based on reliability theory and lifetime data analysis.
Apache License 2.0

Issue with _jac_sf parameters #16

Closed Nathan-Herzhaft closed 3 weeks ago

Nathan-Herzhaft commented 1 month ago

Hi, I think there's a problem with the definition of some methods when using the ProportionalHazards class. The arguments of _jac_sf are supposed to be: self, params: np.ndarray, t: np.ndarray, *args: np.ndarray. However, this method relies on the _jac_chf method, whose arguments are: self, params: np.ndarray, t: np.ndarray, covar: np.ndarray, *args: np.ndarray. This creates an error when calling the function because the parameters are not interpreted correctly:

`from relife import *
import numpy as np

n = 100
n_features = 5

Covar = np.random.rand(n, n_features)

time = np.random.randint(1, 50, n)
event = np.random.binomial(1, 0.2, n)
entry = np.random.randint(0, time)

estimator = ProportionalHazards(baseline=Weibull())
estimator.fit(time, event, entry, Covar)

t = np.arange(1, 15)

print(estimator._jac_sf(estimator.params, t, (Covar,)))`

This returns the error: ValueError: operands could not be broadcast together with shapes (1,100,5) (14,), due to misinterpretation of the arguments.

21ch216 commented 4 weeks ago

Thank you for your interest in ReLife. Before I address your query, let me provide some context about the ReLife project. The version of ReLife you’re currently using is the initial working version. We are aware that some functionalities, explanations, and code choices are not well-documented, which can make the user experience less intuitive and limit contribution possibilities.

I am currently working on a major refactoring of the code. We plan to release a new version at the end of November, which will include cleaner documentation and better tutorials. The code will also adopt a more structured OOP style while adhering to typing theory as much as possible. If you plan to use ReLife extensively, we would greatly appreciate your feedback on user experience and functionality requests.

End of context. My answer:

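First, let me provide further details regarding your error in the _jac_chf method. Judging from the error message, with your call self._jac_g(beta, covar) has shape (1, 100, 5), whereas self.baseline._chf(params0, t, *args) has shape (14,).

Therefore, numpy cannot execute self._jac_g(beta, covar) * self.baseline._chf(params0, t, *args).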
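The broadcasting failure can be reproduced with plain numpy, independently of ReLife. This is only a sketch: jac_g and chf below are placeholder arrays of ones that mimic the two shapes quoted in the error message, not actual ReLife outputs.

```python
import numpy as np

# Placeholder arrays that only mimic the shapes from the error message.
jac_g = np.ones((1, 100, 5))  # shape obtained when covar arrives wrapped in a tuple
chf = np.ones(14)             # one baseline value per time point, shape (14,)

try:
    jac_g * chf               # trailing dimensions 5 and 14 are incompatible
    broadcast_ok = True
    message = ""
except ValueError as err:
    broadcast_ok = False
    message = str(err)

print(broadcast_ok)
print(message)
```

numpy aligns shapes from the last dimension backwards, so 5 is compared with 14 and the product is rejected.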

First error: The _jac_sf signature is params: np.ndarray, t: np.ndarray, *args: np.ndarray. It seems you misunderstood how *args works. If you write estimator._jac_sf(estimator.params, t, (Covar,)), then args will be ((Covar,),), a tuple containing a tuple. *args already collects the extra positional arguments into a tuple, so you don’t need to wrap Covar in a tuple yourself. Simply write estimator._jac_sf(estimator.params, t, Covar). That’s sufficient.

With that call, _jac_g(beta, covar) has a shape of (100, 5), which is better, as we have derivative values of g evaluated on 100 points (the second dimension is 5 because there are 5 covariates, hence 5 coefficients). If you prefer writing (Covar,), you must unpack your tuple first, like this: estimator._jac_sf(estimator.params, t, *(Covar,)). Personally, I’ve never seen this kind of usage, so I wouldn’t recommend adopting this “code style.”
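To make the packing behaviour concrete, here is a minimal sketch in plain Python. show_args is a hypothetical stand-in with the same (params, t, *args) layout, not a ReLife function, and covar is a small placeholder list.

```python
def show_args(params, t, *args):
    """Hypothetical stand-in: return whatever *args collected."""
    return args

covar = [[0.1, 0.2]]  # placeholder for a covariate array

wrapped = show_args(None, None, (covar,))    # args is ((covar,),): a tuple inside a tuple
direct = show_args(None, None, covar)        # args is (covar,)
unpacked = show_args(None, None, *(covar,))  # same as the direct call

print(wrapped == ((covar,),))
print(direct == (covar,))
print(unpacked == direct)
```

All three comparisons print True: only the wrapped call adds the extra level of nesting.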

Second error: t has a shape of (14,). Here, a shape of (14,) would mean you want to evaluate _jac_g on 14 measurement points (14 observations). However, your Covar has a shape of (100, 5), meaning you defined covariate values for 100 observations. There is an inconsistency in how you construct t regarding Covar. It must contain the same number of lifetime observations as the 100 covariate vectors you defined. The following code works:

>>> t = np.arange(1, 101).reshape(-1, 1)  # now t has shape (100, 1), meaning one time value for 100 observations
>>> jac_sf = estimator._jac_sf(estimator.params, t, Covar)  # jac_sf has shape (100, 7), 100 gradient vectors

This information may not be clearly stated in the documentation. I’ll address this in the future version of ReLife.

One more detail: you may have noticed that _jac_g is a “hidden” method. It was implemented solely for optimizing the likelihood, so a user is not expected to request jac_g on a model object. In your case, you need it, and I’ll address that too. Arrays in “jac-like” operations generally have 2 dimensions, and their implementation depends on a data reshape in the fit process. That’s why I reshaped t as a 2-dimensional array above (see: ReLife data.py for more details).

Does that code snippet solve your problem?

Otherwise, if you aim to evaluate jac_sf on 14 points with one set of covariate values, you should write:

>>> t = np.arange(1, 15).reshape(-1, 1)
>>> new_covar = np.random.rand(1, n_features)
>>> jac_sf = estimator._jac_sf(estimator.params, t, new_covar)  # jac_sf has shape (14, 7), 14 gradient vectors
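The shape rules behind both working calls can be checked with plain numpy. This is a sketch with placeholder arrays of ones, not ReLife outputs. It only mimics the covariate part of the gradient (5 columns), whereas the real jac_sf has 7 columns, as noted above.

```python
import numpy as np

n_features = 5

# Case 1: one time value per observation, 100 covariate vectors.
t_all = np.arange(1, 101).reshape(-1, 1)  # shape (100, 1)
covar_all = np.ones((100, n_features))    # shape (100, 5)
per_observation = covar_all * t_all       # row i is scaled by t_all[i]

# Case 2: 14 time values, a single covariate vector.
t_few = np.arange(1, 15).reshape(-1, 1)   # shape (14, 1)
covar_one = np.ones((1, n_features))      # shape (1, 5)
per_time = covar_one * t_few              # the single row is repeated for each time

print(per_observation.shape)
print(per_time.shape)
```

In both cases the column vector t broadcasts against the covariate rows, which is why t is reshaped to 2 dimensions before the call.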

Let me know if you need further explanations! 😊

Nathan-Herzhaft commented 3 weeks ago

Thank you for your help, it solves the problem!