kevintcaron closed this issue 4 months ago
Hi @kevintcaron
With recent versions of samplics, you must use an enum to specify the population parameter. For a proportion, that is PopParam.prop; please see my edits to your code below. There were also issues in samplics caused by missing values in the domain variable.
Please update to the latest version and retry. Let me know if you are still having issues.
import numpy as np
import pandas as pd
from samplics.estimation import TaylorEstimator
from samplics.utils import PopParam, SinglePSUEst
df_2019 = pd.read_sas(r"./issues/number56/LLCP2019.XPT")
smoker_var = "_RFSMOK3"
weight = "_LLCPWT"
domain = "_RACE"
df = df_2019.copy()
# RECODE smoker variable
# Replace 1 with 0
df.loc[df[smoker_var] == 1, smoker_var] = 0
# Replace 2 with 1
df.loc[df[smoker_var] == 2, smoker_var] = 1
# Replace 9 with nan
df[smoker_var] = df[smoker_var].replace(9, np.nan)
stratified_proportion_smoking = TaylorEstimator(PopParam.prop)
stratified_proportion_smoking.estimate(
    y=df[smoker_var],
    samp_weight=df[weight],
    stratum=df["_STSTR"],
    psu=df["_PSU"],
    domain=df[domain],
    single_psu=SinglePSUEst.skip,
    remove_nan=True,
)
df_smoking = stratified_proportion_smoking.to_dataframe()
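As a rough cross-check on the point estimates, the weighted proportion per domain can be computed by hand as sum(w * y) / sum(w) within each domain. The sketch below is not part of samplics: the helper name `weighted_proportion_by_domain` and the toy values are made up for illustration, and it assumes that rows with a missing outcome, weight, or domain are simply dropped. It gives no variance estimates, so it only helps to sanity-check the estimated proportions.

```python
import math
from collections import defaultdict


def _is_missing(value):
    """Return True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def weighted_proportion_by_domain(y, weights, domains):
    """Hypothetical helper: weighted proportion of y == 1 within each domain.

    Rows with a missing outcome, weight, or domain are skipped (an assumption
    made for this sketch).
    """
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for yi, wi, di in zip(y, weights, domains):
        if _is_missing(yi) or _is_missing(wi) or _is_missing(di):
            continue
        numerator[di] += wi * yi
        denominator[di] += wi
    return {d: numerator[d] / denominator[d] for d in denominator if denominator[d] > 0}


# Toy data (made up): 0/1 outcome, sampling weight, and domain code.
y = [1, 0, 1, float("nan"), 0, 1]
weights = [10.0, 30.0, 20.0, 5.0, 20.0, 15.0]
domains = [1.0, 1.0, 2.0, 2.0, 2.0, float("nan")]

estimates = weighted_proportion_by_domain(y, weights, domains)
print(estimates)
```

If the hand-computed proportions differ noticeably from the estimates in df_smoking, the handling of missing values is a reasonable first place to look.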
Thank you for the quick response! This solved my issue. Please let me know if there is anything I can do to help support continued development!
Hi @MamadouSDiallo,
I am attempting to calculate smoking prevalence ('_RFSMOK3') by race ('_RACE') with BRFSS data. This seems to work fine for the 2018 BRFSS data, but for the 2019 data I am getting the error below. I thought it might be a divide-by-zero error, but there appear to be plenty of respondents in each of the racial groups. I have also provided my code below the error; running it requires downloading the 2019 BRFSS data (or the 2018 data, for which it works). Do you know what is causing this issue?
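One way to check group sizes and missing values in the domain variable is sketched here on toy data. The column names and the 1/2/9 outcome codes are the ones used in this thread, but the values are made up and this is not the actual BRFSS file.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the BRFSS data frame (values are invented).
toy = pd.DataFrame(
    {
        "_RACE": [1.0, 1.0, 2.0, np.nan, 2.0, np.nan],
        "_RFSMOK3": [1.0, 2.0, 9.0, 1.0, 2.0, 2.0],
    }
)

# Respondents per racial group, with missing domain values counted as well.
group_sizes = toy["_RACE"].value_counts(dropna=False)

# Number of rows with a missing domain value.
n_missing_domain = int(toy["_RACE"].isna().sum())

# Rows usable for the estimate: domain present and outcome coded 1 or 2.
usable = toy.dropna(subset=["_RACE"])
usable = usable[usable["_RFSMOK3"].isin([1, 2])]
usable_per_group = usable.groupby("_RACE").size()

print(group_sizes)
print("missing domain values:", n_missing_domain)
print(usable_per_group)
```

A nonzero count of missing domain values does not by itself explain an error, but it makes clear which rows an estimator would need to drop.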
ERROR:
CODE: