tjefferies / pymetalog

Public repo for the pymetalog project
MIT License

Error when attempting to fit distributions with >100 points #19

Open bbergerud opened 2 years ago

bbergerud commented 2 years ago

When attempting to fit a distribution with more than 100 points, an error is raised:

import numpy as np
import pymetalog as pm

# Works Fine
x = np.random.randn(100)
m = pm.metalog(x)

# Error
x = np.random.randn(101)
m = pm.metalog(x)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In [1], line 10
      8 # Error
      9 x = np.random.randn(101)
---> 10 m = pm.metalog(x)

File ~/miniconda3/envs/dates/lib/python3.9/site-packages/pymetalog/metalog.py:255, in metalog.__init__(self, x, bounds, boundedness, term_limit, term_lower_bound, step_len, probs, fit_method, penalty, alpha)
    251             Y[yn] = Y['y2'] * Y[zn]
    253 output_dict['Y'] = Y
--> 255 self.output_dict = a_vector_OLS_and_LP(
    256     output_dict,
    257     bounds = self.bounds,
    258     boundedness = self.boundedness,
    259     term_limit = self.term_limit,
    260     term_lower_bound = self.term_lower_bound,
    261     fit_method = self.fit_method,
    262     alpha = self.alpha,
    263     diff_error = .001,
    264     diff_step = 0.001)

File ~/miniconda3/envs/dates/lib/python3.9/site-packages/pymetalog/a_vector.py:215, in a_vector_OLS_and_LP(m_dict, bounds, boundedness, term_limit, term_lower_bound, fit_method, alpha, diff_error, diff_step)
    213 Est = np.dot(m_dict['Y'], A)
    214 ncols = A.shape[1]
--> 215 Z = np.column_stack((np.array(m_dict['dataValues']['z']),np.repeat(m_dict['dataValues']['z'].values,ncols-1).reshape(len(m_dict['dataValues']['z']),ncols-1)))
    217 m_dict['square_residual_error'] = ((Z-Est)**2).sum(axis=1)
    219 return m_dict

AttributeError: 'numpy.ndarray' object has no attribute 'values'

Changing m_dict['dataValues']['z'].values to m_dict['dataValues']['z'] fixes the problem when there are more than 100 samples but breaks the case with 100 or fewer, so an if-else statement would probably suffice as a fix. Casting x to a pandas DataFrame or Series before fitting more than 100 samples results in the same error.
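One possible alternative to branching on the sample size is to normalize the input with np.asarray, which returns an ndarray unchanged and also accepts a pandas Series. The sketch below is untested against pymetalog itself. It assumes that m_dict['dataValues']['z'] is a pandas Series in the case that currently works and a plain ndarray otherwise, and build_Z is a hypothetical helper name, not part of the library.

```python
import numpy as np

def build_Z(z, ncols):
    """Hypothetical helper: copy the 1-D vector z into each of ncols columns."""
    # np.asarray returns ndarrays as-is and converts array-likes
    # (lists, pandas Series) to an ndarray, so no .values access is needed.
    z = np.asarray(z)
    # Repeat every element ncols times, then reshape so row i holds z[i] in every column.
    return np.repeat(z, ncols).reshape(len(z), ncols)

z = np.array([1.0, 2.0, 3.0])
Z = build_Z(z, ncols=4)

# Same array as the column_stack expression from the traceback (ndarray form).
Z_original = np.column_stack((z, np.repeat(z, 4 - 1).reshape(len(z), 4 - 1)))
print(np.array_equal(Z, Z_original))
```

If this assumption about the two input types holds, the line in a_vector.py could call np.asarray once on m_dict['dataValues']['z'] instead of checking the length.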

peterhurford commented 1 year ago

+1

aleewen commented 1 year ago

I changed this line to an if-else statement like you mentioned:

if len(z) <= 100:
    Z = np.column_stack((np.array(m_dict['dataValues']['z']),np.repeat(m_dict['dataValues']['z'].values,ncols-1).reshape(len(m_dict['dataValues']['z']),ncols-1)))
else:
    Z = np.column_stack((np.array(m_dict['dataValues']['z']),np.repeat(m_dict['dataValues']['z'],ncols-1).reshape(len(m_dict['dataValues']['z']),ncols-1)))

However, the plot looks incorrect with a sample of 99 points but correct with a sample of 101.