sdfordham / pysyncon

A Python module for the synthetic control method
MIT License

 seed #66

Closed MJ-Hon closed 1 week ago

MJ-Hon commented 1 month ago

Hello

I am using your package for analysis and it seems to be very good. Thank you. However, I have a question. There are differences in the analysis results each time I rerun the code in Visual Studio Code (VSC). Is there a way to specify a seed?

Much appreciated!

sdfordham commented 1 month ago

Can you post your code? This should not be possible as the optimization methods available do not have a random aspect to them.
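
To illustrate the point about determinism, here is a minimal sketch that calls SciPy's minimize with BFGS twice on the same toy problem and checks that the two runs agree exactly. The Rosenbrock function and the start point are assumptions chosen purely for illustration; this is not pysyncon's actual loss function.

```python
import numpy as np
from scipy.optimize import minimize, rosen

# Toy objective standing in for the real loss (assumption: illustration only).
x0 = np.array([-1.2, 1.0])


def run_bfgs():
    # Same start point, same method, same options on every call.
    return minimize(rosen, x0, method="BFGS", options={"gtol": 1e-6})


first = run_bfgs()
second = run_bfgs()

# With identical inputs BFGS takes identical steps, so the two results
# should match exactly, not just approximately.
print(np.array_equal(first.x, second.x))
print(first.nit == second.nit)
```

If repeated runs differ in practice, the inputs handed to the optimizer are the first place to look, since the optimizer itself has no random component.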

MJ-Hon commented 1 month ago

I thought so too. Here is the code.

Thank you for your time.

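from pysyncon import Dataprep, Synth

# SCM is the poster's pandas DataFrame holding the panel data (not shown in the thread).
def create_dataprep_and_run_analysis(treatmentplace, donorgroup, target, treatment_identifier, condition):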
    ev = SCM[(SCM['treatmentplace'] == treatmentplace) | (SCM['donorgroup'] == donorgroup)]
    dataprep = Dataprep(
        foo=ev,
        predictors=[
            target, "C1", "C2", "C3", "C4", "C5",
            "C6", "C7", "C8", "C9", "C10"
        ],
        predictors_op="mean",
        time_predictors_prior=range(2008, 2012),
        dependent=target,
        unit_variable="region",
        time_variable="PRD_DE",
        treatment_identifier=treatment_identifier,
        controls_identifier=list(set(SCM.loc[(SCM['donorgroup'] == donorgroup) & (SCM['region'].str.endswith(condition)), 'region'].tolist())),
        time_optimize_ssr=range(2008, 2012),
    )

    print(dataprep)

    synth = Synth()
    synth.fit(dataprep=dataprep, optim_method="BFGS", optim_initial="equal")
    weight = synth.weights(threshold=0.000001)
    print(weight)
    result = synth.summary()
    print(result)

    synth.path_plot(time_period=range(2008, 2022), treatment_time=2013)
    synth.gaps_plot(time_period=range(2008, 2022), treatment_time=2013)

    mape_value = synth.mape()
    print(f"MAPE: {mape_value}")
    mspe_value = synth.mspe()
    print(f"MSPE: {mspe_value}")
    att = synth.att(time_period=range(2013, 2022))
    print(f"ATT: {att}")

create_dataprep_and_run_analysis(1, 1, "T0", "gangwonwonju", "si")

sdfordham commented 1 month ago

Your code looks fine. This is surprising, but my guess is that it is a machine-precision issue related to inverting the data matrix. Are the numbers in your data very large or very small? It may help to fix some of the parameters in the BFGS case: you can pass extra arguments to the minimize function through optim_options. Try playing around with the values, e.g.:

synth.fit(
    dataprep=dataprep,
    optim_method="BFGS",
    optim_initial="equal",
    optim_options={"gtol": 1e-3, "xrtol": 1e-5, "c1": 0.1, "c2": 0.2}
)

See https://docs.scipy.org/doc/scipy/reference/optimize.minimize-bfgs.html for the options available in the BFGS case.
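
One quick way to answer the question about very large or very small numbers is to look at the condition number of the predictor matrix. The sketch below uses a made-up 3x2 matrix; the values and the column-rescaling step are assumptions for illustration and do not reflect pysyncon internals.

```python
import numpy as np

# Hypothetical predictor matrix: one column in the millions, one in the thousandths.
X = np.array([
    [1.0e6, 1.0e-3],
    [2.0e6, 3.0e-3],
    [3.0e6, 2.0e-3],
])

# A large condition number means small rounding errors can be amplified
# when the matrix is inverted or used in a linear solve.
cond_raw = np.linalg.cond(X)

# Rescale each column to unit standard deviation and compare.
X_scaled = X / X.std(axis=0)
cond_scaled = np.linalg.cond(X_scaled)

print(f"raw: {cond_raw:.3g}, rescaled: {cond_scaled:.3g}")
```

If the raw condition number is many orders of magnitude larger than the rescaled one, putting the predictors on comparable scales before fitting may be worth trying.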

MJ-Hon commented 1 month ago

The number of control groups is 24, the analysis period is 2008 to 2021, and the intervention year is 2013. The same behavior still appears, but thank you sincerely for your advice. I will review the documentation you linked.
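
One thing that may be worth checking in the code posted above (a guess, not something confirmed in this thread): controls_identifier is built with list(set(...)). Python randomizes string hashes on each interpreter run unless PYTHONHASHSEED is set, so the order of a set of strings can change between runs, and floating-point sums depend on the order of their terms. Whether this explains the differences seen here is an assumption. The sketch below, with hypothetical region names, only shows the two effects and a sorted alternative.

```python
# Hypothetical region names; the real ones come from the poster's data.
regions = ["gangwonwonju_si", "region_b_si", "region_a_si", "region_b_si"]

# list(set(...)) removes duplicates, but for strings its order may differ
# from one interpreter run to the next because of hash randomization.
unordered = list(set(regions))

# sorted(set(...)) removes duplicates and gives the same order on every run.
stable = sorted(set(regions))
print(stable)

# Floating-point addition is not associative, so the order of terms matters.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
print(left_to_right == right_to_left)
```

Passing sorted(set(...)) as controls_identifier, or launching Python with a fixed PYTHONHASHSEED, would rule this out as a source of run-to-run variation.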