xKDR / CRRao.jl

MIT License
35 stars 22 forks source link

`@fitmodel` errors on sanction dataset with seed 123. #3

Open codetalker7 opened 2 years ago

codetalker7 commented 2 years ago

Consider the following code.

using RDatasets, CRRao

sanction = dataset("Zelig", "sanction")
CRRao_seed(123)

model = @fitmodel((Num ~ Target + Coop + NCost), sanction, NegBinomRegression(), Prior_Cauchy())

The above code returns an error (the error message is too long to print here).

Running the same code with a different seed (42) runs successfully, but with a warning (the image of the warning is attached). warning

sourish-cmi commented 2 years ago

This warning is not due to seed choice. This warning is due to the MCMC chain exploring some possible values and getting some illegal values at the initial stage. This warning is coming from Turing. Can't tell why it stops due to seed 123. Can you please share the error message from seed 123?

ayushpatnaikgit commented 2 years ago
NegativeBinomial: the condition zero(p) < p <= one(p) is not satisfied.

This is the error when we use seed = 123

sourish-cmi commented 2 years ago

This is very odd. This issue is supposed to be mathematically taken care of in the modeling. What is the probability that 1 is turning out as 1.000000002 ! Do you think it is possible?

ayushpatnaikgit commented 2 years ago

We should file this issue in Turing.jl

sourish-cmi commented 2 years ago

Let me prepare a note with code easy to understand for the Turing developer.

ayushpatnaikgit commented 2 years ago

This is the underlying code:

using Turing, Random, RDatasets, StatsModels
Random.seed!(123)
data = dataset("Zelig", "sanction")
formula = @formula(Num ~ Target + Coop + NCost)
formula = apply_schema(formula, schema(formula, data));
y, X = modelcols(formula, data);
@model NegBinomReg(X, y) = begin
    p = size(X, 2);
    n = size(X, 1);
    #priors
    λ~InverseGamma(1.0,1.0)
    ν~InverseGamma(1.0,1.0)
    α ~ TDist(ν)*λ
    β ~ filldist(TDist(ν)*λ, p)  

    ## link
    z = α .+ X * β
    mu = exp.(z)

    #likelihood
    for i = 1:n
        y[i] ~ NegativeBinomial(λ, (1 / (1 + mu[i]/ λ)))
    end
end
NegBinomReg_model=NegBinomReg(X,y);
chain = sample(NegBinomReg_model, NUTS(), 10000)
summaries, quantiles = describe(chain)

@sourish-cmi it is running perfectly. We aren't able to isolate the bug to report it on Turing.

ayushpatnaikgit commented 2 years ago

@sourish-cmi can you make a small code reproducible code for this for the Turing folks.

sourish-cmi commented 2 years ago

@ayushpatnaikgit @codetalker7

Ran the code successfully on my end. I believe the bug is fixed now. Shall we close this issue?

sourish-cmi commented 2 years ago

@ayushpatnaikgit @codetalker7

https://github.com/xKDR/CRRao.jl/issues/3#issuecomment-1239568864

Shall we close this issue? You thought?

codetalker7 commented 2 years ago

Hello @sourish-cmi. I will try to run this code again, by generating a few hundred random seeds (this was how I discovered this error in the first place). I believe this error still exists, and I'll let you know once I run the tests.

In fact, I think the error mentioned in https://github.com/xKDR/CRRao.jl/pull/13 is also similar, and can be discovered like this. I'll try this out, and comment here.

sourish-cmi commented 2 years ago

@codetalker7 okay great idea.

codetalker7 commented 2 years ago

Hi @sourish-cmi. I ran the following test for different priors for linear regression.

using CRRao, Test, StableRNGs, Logging, RDatasets, Random
Logging.disable_logging(Logging.Warn)
CRRao.setprogress!(false)

mtcars = dataset("datasets", "mtcars")

priors = [ 
    Prior_Ridge(),
    Prior_Laplace(),
    Prior_Cauchy(),
    Prior_TDist(),
    Prior_Uniform(),
]

for prior in priors
    for i in 1:1000
        seed = rand(1:1000)

        try 
            CRRao.set_rng(StableRNG(seed))
            model = @fitmodel(MPG ~ HP + WT + Gear, mtcars, LinearRegression(), prior)
        catch e
            println(e)
            println(seed)
            break
        end 
    end 
end

For Prior_TDist, I get that the random seed 707 throws an error, saying that condition \sigma > zero(\sigma) is undefined. In particular, the following code throws an error.

using CRRao, StableRNGs, RDatasets

mtcars = dataset("datasets", "mtcars")
CRRao.set_rng(StableRNG(707))
model = @fitmodel(MPG ~ HP + WT + Gear, mtcars, LinearRegression(), Prior_TDist())

I'm sure that for other regression models as well, we can recover similar erroneous cases. The only problem is that these tests take a long time to run.

codetalker7 commented 2 years ago

I tried running the same test for NegBinomRegression as well, and I got errors on all the priors. The corresponding random seeds I got are the following.

  1. Prior_Ridge: 682
  2. Prior_Laplace: 456
  3. Prior_Cauchy: 839
  4. Prior_TDist: 764
  5. Prior_Uniform: 516

I think either this is an issue in Turing.jl, or some problem with the model definitions (or could be something else too).

sourish-cmi commented 2 years ago

Okay, nice work. If it is a problem with Turing, I have to reproduce the error in Turing. Let me do this, and then we have to file an issue with them.

sourish-cmi commented 2 years ago

On a different thought - it might be the issue with the parameter h - we may have to choose a meaningful choice of h. Let me test on my side - with these seeds - with default and a different choice of h