jeffgortmaker / pyblp

BLP Demand Estimation with Python
https://pyblp.readthedocs.io
MIT License

divide by zero error in "ProblemResults.compute_consumer_surpluses(eliminate_product_ids)" #155

Closed adrukker closed 7 months ago

adrukker commented 8 months ago

Hi Jeff,

I'm trying to do a simple counterfactual where I compute consumer surplus after some products are eliminated. I'm trying to use "ProblemResults.compute_consumer_surpluses(eliminate_product_ids)" but I am getting divide by zero errors. I'm wondering if this is a bug or if I am doing something wrong.

Here is a sample of the data I'm working with:

market_ids  product_ids  to_eliminate  delta      nesting_ids
M1          P1           1             -6.718102  1
M1          P2           0             -7.286785  1
M2          P1           1             -6.610115  1
M2          P3           0             -6.819999  1

The parameter estimates of the nested logit model after Problem.solve() are $\rho=0.4823625$ and $\alpha=-0.00669866$.

When I run ProblemResults.compute_consumer_surpluses(market_id='M2', eliminate_product_ids='P1') I get 0.16359049, which is close to what I get when I compute the value in Excel using the formulas for consumer surplus: 0.162887282.

But when I run ProblemResults.compute_consumer_surpluses(market_id='M1', eliminate_product_ids='P1') I get "Errors encountered: divide by zero." When I compute the value in Excel using the formulas for consumer surplus, I get: 0.102153429.

Do you know why I am getting this divide by zero error? The only divisions in the formulas for consumer surplus are $1/(1-\rho)$ and $-1/\alpha$, which are constant and positive.
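
For reference, the kind of formula I mean is the standard nested logit consumer surplus expression (a sketch of my Excel calculation, with $\delta_j$ the mean utilities from the table above and the outside option normalized to zero):

$$CS = -\frac{1}{\alpha} \log\left(1 + \sum_{g} \left(\sum_{j \in g} \exp\left(\frac{\delta_j}{1-\rho}\right)\right)^{1-\rho}\right)$$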

Thanks for any help you can provide.

-Austin

jeffgortmaker commented 8 months ago

Thanks for the report Austin! This looks like it might be a bug since you don't seem to be doing anything weird.

Did you get a traceback with the error? And if not, does setting pyblp.options.verbose_tracebacks = True before you run the code provide one?
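
Concretely, something like this (a sketch, with results standing in for your ProblemResults object) should attach full tracebacks to the formatted error messages:

import pyblp

pyblp.options.verbose_tracebacks = True  # include full tracebacks in error messages
cs = results.compute_consumer_surpluses(market_id='M1', eliminate_product_ids='P1')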

The problem is probably going to be somewhere in this function: https://github.com/jeffgortmaker/pyblp/blob/80d09650af341c0410681ee497c47b2deefb8ce9/pyblp/markets/economy_results_market.py#L355

... but it'll help to pinpoint it if possible. Otherwise I might need a minimum working example that I can run on my end.

adrukker commented 8 months ago

The traceback points to line 394, in safely_compute_consumer_surplus:

exp_utilities = np.exp(np.log(self.groups.sum(exp_utilities)) * (1 - self.group_rho))

Attached are two small datasets, where the only difference is the labels for the product_ids.

cs-error-example1.csv contains my original labels for product_ids. cs-error-example2.csv exchanges the original labels for product_ids with generic 'P1', 'P2', 'P3' labels.

When you run the attached script on the data with the original labels for product_ids, it produces the divide by zero error. But when you run it on the data with the modified labels for product_ids it does not produce the error.

Are you able to reproduce the errors on your end?

cs-error-example1.csv cs-error-example2.csv cs-error-example.txt

jeffgortmaker commented 7 months ago

Sorry about the delay, it's been a busy week.

Thanks for the fantastic MWE. The issue is that you probably want to provide a list or tuple to eliminate_product_ids, not a string. PyBLP loops through all the product_ids in the market and checks whether each product_id is in eliminate_product_ids. Since both 'MSPANCDL' and 'ABRMSPANCDLDL' are substrings of the string you provided, both products get eliminated, which empties the nest and produces the divide by zero error.

Instead, you probably want to do eliminate_product_ids=['ABRMSPANCDLDL']. Then product_id in eliminate_product_ids will do as you'd expect and eliminate just one product. Let me know if that works.
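
To see the difference (just the two product IDs from your data):

eliminate = 'ABRMSPANCDLDL'
'MSPANCDL' in eliminate    # True: "in" on a string is a substring check, so both products match
'MSPANCDL' in [eliminate]  # False: "in" on a list compares whole IDs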

I think a warning about providing a string is probably warranted, so I'll add that now.

adrukker commented 7 months ago

Thanks Jeff. No rush on this! As mentioned above, I'm able to export the data and compute CS in another program using the formulas, but it would be nice to check those calculations against what PyBLP's compute_consumer_surpluses produces.

Providing the products as a list worked for the specific market I provided in the previous example, and I understand why it was throwing an error before.

I'm now trying to pass the full list of 11,724 products (out of 38,973) to eliminate and am hitting the same error.

I've tried to isolate the products that might be causing the problem, and the issue appears around product 2,022.

When I restrict the list to a length of 2,021 there is no error:

product_subset1 = all_products[:2021]
cs_subset1 = ProblemResults.compute_consumer_surpluses(eliminate_product_ids=product_subset1)  # no error

But when I include product 2,022 the divide by zero error occurs:

product_subset2 = all_products[:2022]
cs_subset2 = ProblemResults.compute_consumer_surpluses(eliminate_product_ids=product_subset2)  # divide by zero error

However, when I eliminate only the supposedly problematic product 2,022, there is no error:

product_subset3 = all_products[2021:2022]
cs_subset3 = ProblemResults.compute_consumer_surpluses(eliminate_product_ids=product_subset3)  # no error

Any thoughts about what might be going on?

jeffgortmaker commented 7 months ago

Could it be that when you include product 2,022 you end up totally eliminating a nest?
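
One quick way to check (a rough sketch, assuming your product data is in a pandas DataFrame with the market_ids, nesting_ids, and product_ids columns from your sample):

import pandas as pd

df = pd.read_csv('cs-error-example1.csv')  # or your full product data
eliminated = set(product_subset2)

# True for each market/nest pair whose products are all in the elimination list.
empty_nests = (
    df.assign(gone=df['product_ids'].isin(eliminated))
      .groupby(['market_ids', 'nesting_ids'])['gone']
      .all()
)
print(empty_nests[empty_nests])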

When a nest is eliminated, the calculation is something like exp(log(0) * (1 - rho)) = exp(-inf) = 0, i.e. the inclusive value of the eliminated nest is zero as we'd want, but we do get the divide by zero error that's tripping you up.
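
You can see the same arithmetic in isolation (just NumPy, not PyBLP's actual code path):

import numpy as np

rho = 0.4823625  # your estimate from above
nest_term = np.exp(np.log(0.0) * (1 - rho))  # np.log(0.0) warns "divide by zero" and returns -inf
print(nest_term)  # 0.0: the eliminated nest contributes nothing to the logsum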

I just pushed another fix that eliminates this warning because it's probably not needed. Let me know if that helps?

adrukker commented 7 months ago

Thanks, this makes perfect sense. It turns out the market had only two products (including product 2,022) and the other product was also eliminated, which eliminated the whole nest. The resulting CS calculation for that market was 0, as it should be. The PyBLP result differs from the one I computed with the formulas by less than 0.0001%.

Thanks again for your help!

jeffgortmaker commented 7 months ago

Great! Thanks for opening the issue and helping me fix it.