Retrieving the firm and worker identifiers (Pytwoway 0.1.14.)

Adam's answer:

Thank you for reaching out!

I wrote up some example code to illustrate how this can be done. It uses the include_id_reference_dict option in BipartitePandas, which saves the original worker and firm ids. Unfortunately, this means the data cleaning must be done manually, but it only takes a few extra lines of code.
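The idea behind keeping the original ids can be sketched in plain pandas. This is not how BipartitePandas implements include_id_reference_dict. It is a minimal illustration that assumes cleaning relabels ids as contiguous integers. The toy data and the column names original_i and original_j are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: worker ids (i) and firm ids (j) are arbitrary labels.
df = pd.DataFrame({
    'i': ['w_17', 'w_17', 'w_42', 'w_42', 'w_99'],
    'j': ['firmA', 'firmB', 'firmB', 'firmC', 'firmA'],
    'y': [1.0, 1.2, 0.8, 0.9, 1.1],
    't': [1, 2, 1, 2, 1],
})

# Relabel ids as contiguous integers 0..n-1 (in order of first appearance),
# keeping the original label that each new integer id stands for.
df['i'], worker_labels = pd.factorize(df['i'])
df['j'], firm_labels = pd.factorize(df['j'])

# Reference tables mapping each new id back to its original id
# (column names are hypothetical).
worker_ref = pd.DataFrame({'i': range(len(worker_labels)), 'original_i': worker_labels})
firm_ref = pd.DataFrame({'j': range(len(firm_labels)), 'original_j': firm_labels})

# Merge the original ids back onto the relabeled data; a left merge keeps row order.
restored = df.merge(worker_ref, on='i', how='left').merge(firm_ref, on='j', how='left')
print(restored)
```

Any estimation can run on the compact integer ids, and the reference tables recover the original labels afterwards.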

To run this on your own data, replace sim_data with your data and delete the line that keeps only the workers with i < 100.

Also note that I used some options I added after this issue was raised on GitHub. They make the estimator compute only the fixed effects, without estimating the variances and covariances.
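To give some intuition for what computing only the fixed effects means, here is a dense least-squares sketch of a two-way fixed effects (AKM-style) model, y = alpha_i + psi_j + noise, with the first firm's effect normalized to zero. This is an illustration under stated assumptions, not PyTwoWay's solver. The toy panel is made up and noise-free so the fit is exact, and no variances or covariances (such as cov(alpha, psi)) are computed.

```python
import numpy as np

# Hypothetical toy panel: row k observes worker workers[k] at firm firms[k].
# Firms must be connected through movers for the firm effects to be identified.
workers = np.array([0, 0, 1, 1, 2, 2])
firms = np.array([0, 1, 1, 2, 2, 0])
true_alpha = np.array([1.0, 2.0, 0.5])   # worker effects
true_psi = np.array([0.0, 0.3, -0.2])    # firm effects, firm 0 normalized to 0
y = true_alpha[workers] + true_psi[firms]  # noise-free, so the fit is exact

n_workers = workers.max() + 1
n_firms = firms.max() + 1
n_obs = len(y)

# Design matrix: one dummy per worker, plus one dummy per firm except firm 0,
# which is dropped so that psi_0 = 0 pins down the level of the effects.
X = np.zeros((n_obs, n_workers + n_firms - 1))
X[np.arange(n_obs), workers] = 1.0
has_dummy = firms > 0
X[np.arange(n_obs)[has_dummy], n_workers + firms[has_dummy] - 1] = 1.0

# Ordinary least squares for the fixed effects only.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_hat = coef[:n_workers]
psi_hat = np.concatenate([[0.0], coef[n_workers:]])
print(alpha_hat, psi_hat)
```

With real data the design matrix is large and sparse, so a dense solve like this would not scale. It only shows the kind of quantities (alpha and psi) that are estimated when the variance decomposition is skipped.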

Best, Adam

```python
import bipartitepandas as bpd
import pytwoway as tw
import pandas as pd

#### Simulate data
sim_data = bpd.SimBipartite({'nk': 50, 'num_time': 2, 'num_ind': 1000}).sim_network()
#### Manually clean data
bdf = bpd.BipartiteLong(sim_data, include_id_reference_dict=True) # Set include_id_reference_dict=True to save original ids
#### Take a subset of the data so that the largest connected set is a strict subset of all firms
bdf = bdf[bdf['i'] < 100]
bdf = bdf.clean_data()
bdf.gen_m()

#### Create TwoWay object
tw_net = tw.TwoWay(bdf.original_ids()) # bdf.original_ids() creates a dataframe with columns that give the original ids
#### Skip data cleaning step in TwoWay object, but mark data as clean
tw_net.clean = True

fe_params = {
    'ncore': 1, # Number of cores to use
    'batch': 1, # Batch size to send in parallel
    'ndraw_pii': 50, # Number of draws to use in approximation for leverages
    'levfile': '', # File to load precomputed leverages
    'ndraw_tr': 5, # Number of draws to use in approximation for traces
    'he': False, # If True, compute heteroskedastic correction
    'out': 'res_fe.json', # Output file where results are saved
    'statsonly': False, # If True, return only basic statistics
    'feonly': True, # If True, compute only fixed effects and not variances
    'Q': 'cov(alpha, psi)' # Which Q matrix to consider. Options include 'cov(alpha, psi)' and 'cov(psi_t, psi_{t+1})'
}

#### Since we set 'feonly': True, we just run the estimator normally and it only estimates the fixed effects to save time
tw_net.fit_fe(fe_params)

#### Now look at the data
new_data = tw_net.data
```
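The comment on the subsetting line refers to the largest connected set: the firms linked to each other through workers who move between them. A minimal union-find sketch of that idea follows. It assumes connectedness is defined by worker moves between firms, it is illustrative rather than the BipartitePandas implementation, and the function name largest_connected_set is hypothetical.

```python
from collections import defaultdict


def largest_connected_set(rows):
    """Return the set of firms in the largest connected set.

    rows: iterable of (worker, firm) pairs. Two firms are connected if some
    worker is observed at both, directly or through a chain of such moves.
    """
    parent = {}  # union-find forest over firms only

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Link every firm a worker appears at to the first firm seen for that worker.
    first_firm = {}
    for worker, firm in rows:
        find(firm)  # register the firm even if it is never linked to another
        if worker in first_firm:
            union(first_firm[worker], firm)
        else:
            first_firm[worker] = firm

    # Group firms by their root and keep the biggest group.
    groups = defaultdict(set)
    for firm in list(parent):
        groups[find(firm)].add(firm)
    return max(groups.values(), key=len)


# Hypothetical example: firms A, B, C are linked by movers 1 and 2; D and E by mover 4.
rows = [(1, 'A'), (1, 'B'), (2, 'B'), (2, 'C'), (3, 'D'), (4, 'D'), (4, 'E')]
print(largest_connected_set(rows))
```

Dropping workers, as the i < 100 subset does, can break such chains, which is why the largest connected set may then cover only some of the firms.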

I would also recommend setting the following for better performance:

```python
bdf = bdf.clean_data({'data_validity': False})
```

Be careful, though: this isn't designed to work if you manipulate or reformat the data after cleaning (for instance, converting from long to event study format). Verify that it works properly in your case before committing to it.
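One simple way to check the ids in your own case is to confirm that the cleaned ids and the original ids still correspond one-to-one. The sketch below assumes the output has both a cleaned id column and an original id column. The names i and original_i used here are hypothetical, so inspect the actual columns of your data first. Passing this check is necessary but not sufficient for the mapping to be correct.

```python
import pandas as pd


def check_id_mapping(df, new_col, orig_col):
    """Return True if new ids and original ids correspond one-to-one in df."""
    # Unique (new id, original id) pairs actually present in the data.
    pairs = df[[new_col, orig_col]].drop_duplicates()
    # One-to-one means no new id appears with two originals, and vice versa.
    return (not pairs[new_col].duplicated().any()
            and not pairs[orig_col].duplicated().any())


# Hypothetical examples with made-up ids.
good = pd.DataFrame({'i': [0, 0, 1], 'original_i': ['w_17', 'w_17', 'w_42']})
bad = pd.DataFrame({'i': [0, 0, 1], 'original_i': ['w_17', 'w_42', 'w_42']})
print(check_id_mapping(good, 'i', 'original_i'), check_id_mapping(bad, 'i', 'original_i'))
```

The same check can be run separately on the firm id columns.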

tlamadon/pytwoway, issue #5