jeffgortmaker / pyblp

BLP Demand Estimation with Python
https://pyblp.readthedocs.io
MIT License
228 stars 82 forks source link

Pyblp.solve does not solve for sigma's, just returns the initial value #122

Closed jakob-ditfurth closed 1 year ago

jakob-ditfurth commented 1 year ago

Hi,

I am running into a weird problem with pyBLP on R and am not able to debug it.

I am simulating quite standard data with 1 true random coefficient on prices:

sim_characteristics <- function(){
  J <- 25
  T <- 100
  N <- J*T
  x <- rnorm(n = N, 0, 1)
  w <- rnorm(n = N, 0, 1)
  xi <- rnorm(n = N, 0, .3)
  zeta <- rnorm(n = N, 0, .1)
  p <- 0.5*x + w + xi + zeta
  delta <- -10 + x + xi

  indx <- rep(1:T, each = J)
  char_dt <- data.table(indx = indx, x = x, p = p, 
                        xi = xi, w = w, delta = delta)
  char_dt
}

sim_shares <- function(char_dt){
  R <- 10000
  indx <- char_dt[, indx]
  x <- char_dt[, x]
  p <- as.matrix(char_dt[, p])
  xi <- char_dt[, xi]
  nu <- rnorm(n = R, -2, 0.5)

  delta <- -10 + x + xi + p %*% t(nu)
  num <- exp(delta)
  denom <- rowsum(num, group = indx) + 1
  denom <- denom[indx,]
  share <- num/denom
  share <- rowMeans(share)
  share
}

sim_data <- function(){
  char_dt <- sim_characteristics()
  shares <- sim_shares(char_dt)
  dt <- cbind(shares, char_dt)
  dt[, shares_outside := 1-sum(shares), by = indx]
  dt
}

Now, I apply the BLP solver to this data

rm(list = ls())

#Load packages
library(data.table)
library('reticulate')

#Start pyblp
use_python("/share/pkg.7/python3/3.8.10/install/bin/python3", required = TRUE)
pyblp <- import("pyblp", convert=FALSE)

#Load code
source(".../data-sim.R")
dt <- sim_data()

dt<- dt[,demand_instruments0:=w]
dt<- dt[,prices:=p]
dt<- dt[,product_ids := rep(1:25,times = 100)]
dt<- dt[,market_ids := indx]

# specify the product characteristics
lin_form <- pyblp$Formulation('1 + prices + x')
rand_form <- pyblp$Formulation('0 + prices')
blp_form <- tuple(lin_form, rand_form)

#specify the integration method
#int_method <- pyblp$Integration('halton', size=50L)
pr_integration = pyblp$Integration('product', size=5L)

problem <- pyblp$Problem(
  product_formulations = blp_form, 
  product_data = est_dt,
  integration = pr_integration
)

results <- problem$solve(
  sigma = 1L,
  optimization=pyblp$Optimization('bfgs')
)

Then, I get an output from results as follows:

Computed results after 00:00:01.

Problem Results Summary:
===============================================================================================
GMM     Objective      Gradient                    Clipped  Weighting Matrix  Covariance Matrix
Step      Value          Norm          Hessian     Shares   Condition Number  Condition Number 
----  -------------  -------------  -------------  -------  ----------------  -----------------
 2    +9.973464E-26  +5.397427E-11  -4.700072E-03     0      +3.346758E+00      +1.079494E+17  
===============================================================================================

Cumulative Statistics:
===========================================================================
Computation  Optimizer  Optimization   Objective   Fixed Point  Contraction
   Time      Converged   Iterations   Evaluations  Iterations   Evaluations
-----------  ---------  ------------  -----------  -----------  -----------
 00:00:02       Yes          0             4           349         1226    
===========================================================================

Nonlinear Coefficient Estimates (Robust SEs in Parentheses):
=======================
Sigma:      prices     
------  ---------------
prices   +1.000000E+00 
        (+6.519502E-03)
=======================

Beta Estimates (Robust SEs in Parentheses):
=================================================
       1             prices              x       
---------------  ---------------  ---------------
 -1.046108E+01    -2.081106E+00    +9.992789E-01 
(+6.391539E-03)  (+2.082035E-02)  (+1.411636E-02)
=================================================

Please note that the nonlinear coefficient estimate is equal to the inital value (sigma = 1). I have no idea why that is the case, I cannot seem to fix it. Do you have an idea?

To make sure that this is not specific to R, I have redone the same thing in python and got the same result.

I am attaching the code that I used in python - it is basically the same as R.

Many thanks, Jakob no_sigma_est_python_code.txt

jeffgortmaker commented 1 year ago

The reason is that you're underidentified. There are more parameters (4) than moments (2 or 3 depending on the code).

In your Python code, the reason you only have "MD" = 2 moments (corresponding to "included" instruments 1 and x) is that you specified "demand_instrument0", not "demand_instruments0", so pyblp isn't identifying your excluded instrument.

But even after fixing that (and I think it's fixed in your R code?), you still only have MD = 3 moments for 4 parameters. Regardless of the value for sigma, we can find linear parameters (beta) that set the objective equal to zero, and that's what pyblp does. This is why it just returns after a single optimization iteration -- it found the minimum objective value.

You need an extra instrument ("demand_instruments1") for sigma -- see e.g. Gandhi and Houde's differentiation IVs, various forms of which pyblp can construct.

jakob-ditfurth commented 1 year ago

Thanks a ton, I did not realize that my problem was underidentified. Thanks for the reference and thanks for pointing this out. I managed to fix this now.

Thanks!