Questions about the user/non-user cohort generation step and the bootstrapping step

Hi Ruoqi -

I have a couple of questions about the user/non-user cohort generation step and the bootstrapping step. I would appreciate it if you could help me understand your paper better.

In the user/non-user cohort generation step, you start from 1,353 unique drugs and look at the ingredients of each one. Say a drug has ingredients A, B, and C. If other drugs contain at least one of these ingredients, does the drug become a potential drug for the user cohort, with all those other drugs as the corresponding alternative drugs for the non-user cohort? And if no other drug contains at least one of A, B, and C, is the drug excluded as a potential user-cohort drug? Basically, I'm wondering whether this logic is one of the criteria you used to narrow the search space from 1,353 drugs to 55 drugs. I know you have other criteria: the CAD initialization date is earlier than the first prescription date (the index date); there is at least one more prescription in the follow-up period after the drug's index date; the two prescriptions are at least 30 days apart; patients have at least 1 year of history before the index date and 2 years of history after it; and both cohort sizes must be larger than 500. I hope I have all of these right.
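
To make my reading concrete, here is a rough Python sketch of the logic I described above. It is only my interpretation, not code from the paper or the repo: the function names, the toy drug table, and the way I turn the criteria into checks (for example, treating "two prescriptions at least 30 days apart" as the last prescription falling at least 30 days after the index date) are all my own assumptions.

```python
from datetime import date

def alternative_drugs(drug_ingredients):
    """Map each drug to the other drugs sharing at least one ingredient.

    drug_ingredients is a hypothetical dict: drug name -> set of ingredients.
    """
    alternatives = {}
    for drug, ingredients in drug_ingredients.items():
        alternatives[drug] = {
            other
            for other, other_ingredients in drug_ingredients.items()
            if other != drug and ingredients & other_ingredients
        }
    return alternatives

def candidate_drugs(drug_ingredients):
    """Keep only drugs that have at least one alternative drug (my assumption)."""
    alternatives = alternative_drugs(drug_ingredients)
    return {drug for drug, others in alternatives.items() if others}

def patient_eligible(cad_date, prescriptions, first_record, last_record):
    """Patient-level criteria as I understand them; all arguments are dates."""
    rx = sorted(prescriptions)
    if len(rx) < 2:  # need the index prescription plus at least one more
        return False
    index_date = rx[0]
    return (
        cad_date < index_date                        # CAD before first prescription
        and (rx[-1] - index_date).days >= 30         # prescriptions >= 30 days apart
        and (index_date - first_record).days >= 365  # >= 1 year of prior history
        and (last_record - index_date).days >= 730   # >= 2 years of follow-up
    )

# Toy data, invented for illustration only.
toy = {
    "drug_1": {"A", "B", "C"},
    "drug_2": {"A"},
    "drug_3": {"D"},
}
candidates = candidate_drugs(toy)
```

In this toy table, drug_1 and drug_2 share ingredient A and so would both stay as candidates, while drug_3 would be dropped because nothing else contains D. The cohort-size threshold of 500 would then be applied on top of this, after the patient-level filter. Is that the idea?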

In the bootstrapping step, the paper says "For each candidate ingredient, we repeatedly generate multiple different control drugs via random sampling with replacement, and the analysis is repeated in each bootstrap sample." I'm confused here. Could you please help me understand what the bootstrapping does? Again, assume the user-cohort drug has 3 ingredients, A, B, and C, and that 3 other drugs contain ingredient A, 5 other drugs contain ingredient B, and 10 other drugs contain ingredient C. Are all patients who took one of these 3+5+10=18 drugs after the CAD initialization date placed in the corresponding non-user cohorts? And in each bootstrap iteration, do you randomly draw x, y, and z drugs with replacement from the 3, 5, and 10 drugs, pool their associated patients into a new non-user cohort sample, and calculate the ATE? If so, how do you decide x, y, and z? I'm not sure how exactly you obtained each bootstrap sample or what its size is.
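
Here is a small Python sketch of the bootstrap procedure as I currently imagine it, so you can point at the line where I go wrong. Everything in it is my guess: the function name, the per-ingredient draw counts (my x, y, z), and especially the plain difference in mean outcomes, which is only a stand-in for whatever weighted ATE estimator the paper actually uses.

```python
import random

def bootstrap_ate(user_outcomes, controls_by_ingredient, outcomes_by_drug,
                  draws, n_boot, seed=0):
    """Resample control drugs with replacement and recompute a naive ATE.

    user_outcomes          : list of 0/1 outcomes for the user cohort
    controls_by_ingredient : ingredient -> list of control drug names
    outcomes_by_drug       : control drug name -> list of 0/1 patient outcomes
    draws                  : ingredient -> number of drugs to draw (my x, y, z)
    """
    rng = random.Random(seed)
    user_mean = sum(user_outcomes) / len(user_outcomes)
    estimates = []
    for _ in range(n_boot):
        pooled = []
        for ingredient, drugs in controls_by_ingredient.items():
            # Sampling WITH replacement: the same control drug may be drawn twice.
            for drug in rng.choices(drugs, k=draws[ingredient]):
                pooled.extend(outcomes_by_drug[drug])
        control_mean = sum(pooled) / len(pooled)
        # Placeholder estimator: unweighted difference in mean outcomes.
        estimates.append(user_mean - control_mean)
    return estimates

# Toy inputs, invented for illustration only.
controls = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}
outcomes = {"a1": [0, 1], "a2": [1, 1], "a3": [0, 0], "b1": [1, 0, 0], "b2": [0]}
estimates = bootstrap_ate([1, 0, 0, 0], controls, outcomes,
                          draws={"A": 2, "B": 1}, n_boot=100, seed=42)
```

In this sketch the size of each bootstrap sample is not fixed: it depends on which control drugs happen to be drawn and how many patients each one has. That is exactly the part I am unsure about.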

Sorry for all the questions, and thank you in advance!

Best, Hao

ruoqi-liu / DeepIPW
