Closed AvyG closed 1 year ago
Great, thanks for doing this!
For the Nevo problem, you should be able to get the same numbers as the original paper if you use the same starting values initial_sigma
and initial_pi
that you used for the first step, before computing optimal instruments (right now you're using the estimates from the first stage). That said, I actually prefer your numbers, since they achieve the essentially zero objective that we'd expect under a just-identified model with demand-only optimal instruments. We probably should have noticed this in the original paper.
For the BLP problem, our best practices column also includes using 10,000 scrambled Halton draws in each market, while in your code you're using the same importance sampling draws that allow you to get the same results as the replication column. We didn't include these Halton draws in the package because they'd take up a lot of space, but you should be able to construct them with something like this:
market_ids = []
weights = []
nodes = []
income = []
for index, (_, row) in enumerate(means.iterrows()):
integration = pyblp.Integration('halton', 10_000, {
'discard': 1000 + index * 10_000,
'seed': index
})
untransformed_agents = pyblp.build_integration(integration, 6)
market_ids.append(np.repeat(1900 + int(row['year']), untransformed_agents.weights.size))
weights.append(untransformed_agents.weights)
nodes.append(untransformed_agents.nodes[:, :-1])
income.append(np.exp(row['mean'] + float(sd) * untransformed_agents.nodes[:, -1]))
blp_agent_data = {
'market_ids': np.concatenate(market_ids),
'weights': np.concatenate(weights),
'nodes': np.vstack(nodes),
'income': np.concatenate(income),
}
Here means
is a dataframe with income means by year:
year,mean
71,2.011560000000
72,2.065260000000
73,2.078430000000
74,2.057750000000
75,2.029150000000
76,2.053460000000
77,2.067450000000
78,2.098050000000
79,2.104040000000
80,2.072080000000
81,2.060190000000
82,2.065610000000
83,2.076720000000
84,2.104370000000
85,2.126080000000
86,2.164260000000
87,2.180710000000
88,2.188560000000
89,2.212500000000
90,2.183770000000
and sd
is a common standard deviation value of 1.72
.
The other thing is for that final column we computed optimal instruments after only one step, not two. So you'll need to compute optimal instruments from the first step of the replication column, not the second, in order to get numbers that exactly match.
Some general advice that I've built up over the years, which I've been meaning to put into a tutorial, is:
Hope this helps.
Hi,
Thank for your quick and detailed response. With your advice, I could replicate the results of the fake cereal data. I am still working with the automobile data.
Thank you so much!
Great! I'm going to close this issue for now, but please feel free to re-open / keep commenting if you have trouble with the automobile data, or anything else.
Hi,
I want to start using the package so I started by trying to replicate the results from your paper, but I am having some troubles obtaining the same results, specifically for the best practices part.
I applied the methods mentioned in the paper and the tutorial and I can obtain the results of the replication column for the fake cereal and automobile data, but not for the best practices column.
Since I am new to the method I am not sure if I am skipping some steps, or doing something wrong. Here is a repo (https://github.com/AvyG/BLP-replicationproject) with my notebook. I am not sure what I am doing wrong.
Thanks in advance!