mggg / ecological-inference

Ecological inference, in Python
MIT License

The Goodman Regression worked but not the EI #28

Closed bblpo closed 3 years ago

bblpo commented 3 years ago

I was able to download your zipped folder and install pyei into my local site-packages.

I then started to experiment with your example data and code, as well as my own simpler data. I was able to generate output when the Goodman regression function was called (which suggests that my installation and imports were successful). But when I used the EI functions (both King's original and your parameter setup with lambda), they ran forever and never generated output. Eventually, my application just died and showed an error message because the response time was too long.

Any idea why that happened? (BTW, I was able to install all the required packages and import them without any problem.)

karink520 commented 3 years ago

Hi! Sorry I missed this message before. So glad you're trying out PyEI! We're actively developing it right now so things are a bit rough.

That's an interesting problem you're running into. Getting no output like that on the EI setups that use MCMC sampling (which Goodman's regression does not) suggests to me that the issue might be with the pymc library that PyEI uses for that sampling, rather than with PyEI itself.

To get a sense of whether that's true, you could try running a very short piece of code using pymc, e.g.:

import pymc3 as pm
import numpy as np

with pm.Model() as model:
    mu = pm.Normal("mu", mu=0, sigma=1)
    obs = pm.Normal("obs", mu=mu, sigma=1, observed=np.random.randn(100))

    idata = pm.sample(2000, tune=1500, return_inferencedata=True)

print(idata.posterior["mu"].shape)

and see if that works. If you see the same sort of problem with just that code snippet that you saw with pyei, that would suggest trying to dig deeper to find out what's going wrong with pymc.
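
If the check itself hangs, one way to keep it from taking down the whole session is to run it in a separate process with a time limit. Here is a minimal sketch using only the standard library; `run_with_timeout` and the time limits are illustrative choices, not part of PyEI or pymc.

```python
import subprocess
import sys


def run_with_timeout(code: str, seconds: float) -> str:
    """Run a Python snippet in a child process.

    Returns "ok" if it exits cleanly, "error" if it exits with a
    non-zero status, and "timeout" if it exceeds the time limit.
    """
    try:
        # Output is not captured, so anything the snippet prints
        # shows up directly in the current terminal.
        result = subprocess.run([sys.executable, "-c", code], timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this.
        return "timeout"
    return "ok" if result.returncode == 0 else "error"


# Tiny stand-in snippets; in practice `code` would hold the pymc check.
quick_status = run_with_timeout("pass", 30)
slow_status = run_with_timeout("import time; time.sleep(30)", 1)
```

A "timeout" on the pymc-only snippet would point at the sampling setup rather than at PyEI.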

bblpo commented 3 years ago

Hi,

Thanks for your response.

I am glad to report some progress in using your PyEI.

I was able to run the “king99_pareto_modification” model with pareto_scale set to 8 and pareto_shape set to 2. However, the results were very inconsistent across different runs on the same data.

See attached for the data. The first result was as follows:

Model: king99_pareto_modification
Computed from the raw b_i samples by multiplying by population and then getting the proportion of the total pop (total pop=summed across all districts):
The posterior mean for the district-level voting preference of the minority for Obama_black is 0.430
The posterior mean for the district-level voting preference of non-the minority for Obama_black is 0.495
95% Bayesian credible interval for district-level voting preference of the minority for Obama_black is [0.09978816 0.80687688]
95% Bayesian credible interval for district-level voting preference of non-the minority for Obama_black is [0.18062089 0.91671262]

The second result was:

Model: king99_pareto_modification
Computed from the raw b_i samples by multiplying by population and then getting the proportion of the total pop (total pop=summed across all districts):
The posterior mean for the district-level voting preference of the minority for Obama_black is 0.472
The posterior mean for the district-level voting preference of non-the minority for Obama_black is 0.705
95% Bayesian credible interval for district-level voting preference of the minority for Obama_black is [0.06649013 0.90340257]
95% Bayesian credible interval for district-level voting preference of non-the minority for Obama_black is [0.24385585 0.98742981]

Both of these results were way off from the EI result we get when we run the same data in R.
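
For reference, the "Computed from the raw b_i samples" line in the output describes a population-weighted aggregation of precinct-level samples. The arithmetic can be sketched as follows. This is only an illustration with made-up numbers and an assumed array layout (draws by precincts), not PyEI's actual code, and `district_level_samples` is a hypothetical name.

```python
import numpy as np


def district_level_samples(b_samples, group_pop):
    """Aggregate precinct-level support rates to the district level.

    b_samples: array of shape (n_draws, n_precincts), each entry the
        sampled share of the group in that precinct voting for the candidate.
    group_pop: array of shape (n_precincts,), the group's population
        in each precinct.
    """
    b_samples = np.asarray(b_samples, dtype=float)
    group_pop = np.asarray(group_pop, dtype=float)
    # Rate times population gives votes per precinct and draw; summing over
    # precincts and dividing by total population gives one rate per draw.
    return (b_samples * group_pop).sum(axis=1) / group_pop.sum()


# Made-up example: 2 posterior draws, 2 precincts.
b = np.array([[0.2, 0.8], [0.4, 0.6]])
pop = np.array([100, 300])

samples = district_level_samples(b, pop)         # one value per draw
posterior_mean = samples.mean()
credible_interval = np.percentile(samples, [2.5, 97.5])
```

The posterior mean and the 95% credible interval are then summaries of those per-draw district-level values.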

Any insight into the inconsistency will be highly appreciated.
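
Run-to-run disagreement combined with very wide intervals can be a sign that the MCMC chains have not converged, though that is only one possible explanation. A common check is the Gelman-Rubin R-hat statistic, which compares between-chain and within-chain variance. Here is a minimal numpy sketch on synthetic chains; the data and the function name `gelman_rubin_rhat` are made up for illustration. ArviZ provides `az.rhat` for real sampler output.

```python
import numpy as np


def gelman_rubin_rhat(chains):
    """Classic Gelman-Rubin R-hat for an array of shape (n_chains, n_draws)."""
    chains = np.asarray(chains, dtype=float)
    n_chains, n_draws = chains.shape
    # Average of the per-chain sample variances.
    within = chains.var(axis=1, ddof=1).mean()
    # Variance of the chain means, scaled by the number of draws.
    between = n_draws * chains.mean(axis=1).var(ddof=1)
    # Pooled estimate of the posterior variance.
    pooled = (n_draws - 1) / n_draws * within + between / n_draws
    return float(np.sqrt(pooled / within))


rng = np.random.default_rng(0)
# Two synthetic chains drawn from the same distribution.
agreeing = rng.normal(0.0, 1.0, size=(2, 1000))
# The same chains, with the second one shifted so the chains disagree.
disagreeing = agreeing + np.array([[0.0], [3.0]])

rhat_agreeing = gelman_rubin_rhat(agreeing)
rhat_disagreeing = gelman_rubin_rhat(disagreeing)
```

Values close to 1 suggest the chains agree; values well above 1 suggest they are exploring different regions, in which case summaries from separate runs can differ a lot.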

BTW, your “king99” and “wakefield_beta” models did not generate results at all when run on the attached dataset.

Baodong

bblpo commented 3 years ago

Thanks again. I have found the solution to the inconsistency problem. However, the pareto-modification model is still very slow to run.
