Problem with including log-normally distributed random coefficients on price

JanSiriwardana commented 6 months ago

Hi Jeff,

I have estimated an RC model and then attempted to run a simulation using the results by changing some values in a dummy variable from 0 to 1. I encountered a similar problem to issue #119 where the simulation fails to converge because of a divide by zero error. As you suggest in that issue, I have tried specifying log-normal random coefficients on price by following the instructions in the docs but cannot seem to get it to work.

In the docs it says "Implementing this typically involves having an I(-prices) in the formulation for X2 and instead of including prices in X1, including a 1 in the agent_formulation." However, I do not have any demographic data so there is no agent_data and specifying agent_formulation = pyblp.Formulation('1') returns an error.

I've included my formulation and problem construction below:

import numpy as np
import pyblp

# df is the product data (not shown here)
X1 = pyblp.Formulation('0 + carb + sweetener', absorb='C(brand) + C(market_ids)')
X2 = pyblp.Formulation('1 + I(-prices) + carb + sweetener')

product_formulations = (X1, X2)
mc_integration = pyblp.Integration('halton', size=500, specification_options={'seed': 0})
mc_problem = pyblp.Problem(product_formulations, df, integration=mc_integration)
l_bfgs_b = pyblp.Optimization('l-bfgs-b', {'gtol': 1e-12})
result = mc_problem.solve(sigma=np.eye(4), optimization=l_bfgs_b)

but this does not give a mean value for price.

I am also unclear about specifying rc_types and the instruction "The list should have as many strings as there are columns in X2. Each string determines the type of the random coefficient on the corresponding product characteristic in X2." If I only want the random coefficient on price to be log-normal, with linear random coefficients on the other characteristics, how do I specify this?

Thanks.

jeffgortmaker commented 6 months ago

What's the error that you're getting? Something about needing agent data? I generally recommend just building and specifying agent data yourself (including nodes0, ..., nodes3) instead of using integration these days.

For your case you'd want rc_types=['linear', 'log', 'linear', 'linear']. What would you edit to make the docs on rc_types more clear?
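To make the ordering concrete, here is a minimal sketch of how the pieces might fit together for the formulation above. The names X2_TERMS, RC_TYPES, PI0 and build_lognormal_problem are hypothetical and only for illustration, and the starting values in PI0 are an assumption, not a recommendation. The sketch assumes that agent_data has already been built by hand, and that zeros in the initial pi are held fixed at zero so only the price row is estimated.

```python
# Terms of X2 in order; rc_types must line up with these one-to-one.
X2_TERMS = ['1', 'I(-prices)', 'carb', 'sweetener']
RC_TYPES = ['linear', 'log', 'linear', 'linear']

# Assumed starting values for pi (K2 x 1, one column for the constant in
# agent_formulation). Only the I(-prices) row is nonzero, so it plays the
# role of the mean parameter for the log-normal price coefficient.
PI0 = [[0.0], [1.0], [0.0], [0.0]]


def build_lognormal_problem(product_data, agent_data):
    """Hypothetical helper: build a Problem with a log-normal price coefficient."""
    import pyblp  # imported here so the mapping above can be inspected without pyblp

    X1 = pyblp.Formulation('0 + carb + sweetener', absorb='C(brand) + C(market_ids)')
    X2 = pyblp.Formulation(' + '.join(X2_TERMS))
    agent_formulation = pyblp.Formulation('1')
    return pyblp.Problem(
        product_formulations=(X1, X2),
        product_data=product_data,
        agent_formulation=agent_formulation,
        agent_data=agent_data,
        rc_types=RC_TYPES,
    )
```

The resulting problem would then be solved with something like problem.solve(sigma=np.eye(4), pi=PI0, optimization=l_bfgs_b), where the estimated pi entry on I(-prices) takes the place of the missing mean value for price.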

JanSiriwardana commented 6 months ago

The error is

ValueError: Since agent_formulation is specified, agent_data must be specified as well.

Just so I'm clear: using Halton draws, the columns of agent_data would be market_ids plus one nodes column per dimension of X2 (labelled 0 to K2 - 1). Do I need to include a column of weights too? The number of rows in agent_data is equal to the number of markets times the number of draws per market. Each cell in a nodes column is a single Halton draw, using a different prime for each node dimension.

Would you recommend adapting the code in integration.py as the best way to do this, for example to discard and scramble sequences? Is there a reason why you recommend building agent_data ourselves rather than using integration?

As for misunderstanding rc_types, I think that was a me problem! Perhaps a small example could make it clearer but I did manage to figure it out so probably not necessary.

Thanks

jeffgortmaker commented 6 months ago

That's right re agent_data. And yes, you need a column of weights.

I recommend using pyblp.build_integration or scipy.stats.qmc.Halton, not adapting the code yourself. And I recommend building agent_data because I've found that it (1) makes it clearer exactly what you're doing, so there's less room for error, and (2) allows you to more easily add demographics later if you want them. The integration method is more a remnant of the past, and I now wish I hadn't included it in the API (but it's going to stay for backwards-compatibility).
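For anyone landing here later, a rough sketch of building agent_data by hand with scipy.stats.qmc.Halton might look like the following. The function name build_agent_data, the placeholder market IDs and the equal weights are assumptions for illustration. The uniform Halton points are mapped to standard normal nodes with the inverse normal CDF, since the nodes are meant to be standard normal draws.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm, qmc


def build_agent_data(market_ids, draws_per_market=500, dimensions=4, seed=0):
    """Hypothetical helper: one block of scrambled Halton draws per market."""
    sampler = qmc.Halton(d=dimensions, scramble=True, seed=seed)
    frames = []
    for market_id in market_ids:
        # Uniform points in [0, 1), clipped so the normal quantiles stay finite.
        uniforms = np.clip(sampler.random(draws_per_market), 1e-12, 1 - 1e-12)
        nodes = norm.ppf(uniforms)  # transform to standard normal nodes
        frame = pd.DataFrame(nodes, columns=[f'nodes{k}' for k in range(dimensions)])
        # Equal integration weights that sum to one within each market.
        frame.insert(0, 'weights', 1.0 / draws_per_market)
        frame.insert(0, 'market_ids', market_id)
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


# Placeholder market IDs; in practice use the unique market_ids in the product data.
agent_data = build_agent_data(market_ids=['m1', 'm2', 'm3'], draws_per_market=500)
```

The result has one row per market and draw, with columns market_ids, weights, and nodes0 through nodes3, and can be passed as agent_data to pyblp.Problem in place of the integration argument.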

And great re rc_types. At some point we'll have a tutorial with a lognormal random coefficient, but posts on the issue tracker like this are also ways to give this info to users, so thanks for posting!

jeffgortmaker commented 6 months ago

I'm going to close this for now since it seems like your problem is solved. But feel free to keep commenting if there's more!

jeffgortmaker / pyblp

Problem with including log-normally distributed random coefficients on price #157