single-cell-genetics / limix_qtl

Apache License 2.0
18 stars 13 forks source link

interaction test #4

Closed NanaMatoba closed 2 years ago

NanaMatoba commented 2 years ago

Hi! I'm trying to run an interaction test for paired samples. I have three questions.

  1. the run_interaction_QTL_analysis doesn't work so I had to modify the code. Please correct me if I'm doing something wrong. so run_interaction_QTL_analysis.py uses a function (run_QTL_analysis_load_intersect_phenotype_covariates_kinship_sample_mapping) that is defined in the qtl_utilities.py as below

[phenotype_df, kinship_df, covariate_df, sample2individual_df,complete_annotation_df, annotation_df, snp_filter_df, snp_feature_filter_df, geneticaly_unique_individuals, minimum_test_samples, feature_list, bim, fam, bed, bgen, chromosome, selectionStart, selectionEnd, feature_variant_covariate_df]=\ utils.run_QTL_analysis_load_intersect_phenotype_covariates_kinship_sample_mapping(pheno_filename=pheno_filename, anno_filename=anno_filename, geno_prefix=geno_prefix, plinkGenotype=plinkGenotype, cis_mode=cis_mode, skipAutosomeFiltering = skipAutosomeFiltering, minimum_test_samples= minimum_test_samples, relatedness_score=relatedness_score, snps_filename=snps_filename, feature_filename=feature_filename, snp_feature_filename=snp_feature_filename, selection=genetic_range, covariates_filename=covariates_filename, kinship_filename=kinship_filename, sample_mapping_filename=sample_mapping_filename, extended_anno_filename=extended_anno_filename, feature_variant_covariate_filename=feature_variant_covariate_filename)

However, I noticed that the key defined in the function is not the same as one in qtl_utilities.py, so run_interaction_QTL_analysis.py fails. I modified the code as below (added randomeff_df, changed kinship_filename to randomeff_filename), and it seems to be working, but I'm not sure if this is the right thing to do or you intended to have a specific function in util.py for interaction test but somehow missing.

[phenotype_df, kinship_df, randomeff_df, covariate_df, sample2individual_df,complete_annotation_df, annotation_df, snp_filter_df, snp_feature_filter_df, geneticaly_unique_individuals, minimum_test_samples, feature_list, bim, fam, bed, bgen, chromosome, selectionStart, selectionEnd, feature_variant_covariate_df]=\ utils.run_QTL_analysis_load_intersect_phenotype_covariates_kinship_sample_mapping(pheno_filename=pheno_filename, anno_filename=anno_filename, geno_prefix=geno_prefix, plinkGenotype=plinkGenotype, cis_mode=cis_mode, skipAutosomeFiltering = skipAutosomeFiltering, minimum_test_samples= minimum_test_samples, relatedness_score=relatedness_score, snps_filename=snps_filename, feature_filename=feature_filename, snp_feature_filename=snp_feature_filename, selection=genetic_range, covariates_filename=covariates_filename, randomeff_filename=kinship_filename, sample_mapping_filename=sample_mapping_filename, extended_anno_filename=extended_anno_filename, feature_variant_covariate_filename=feature_variant_covariate_filename)

  1. Also, output from the interaction test is also same as normal QTL. can you confirm beta in the output file from the interaction test is beta for the condition:SNP? expression ~ SNP + condition:SNP + covariates

  2. should I include donor ID as an additional random effect or it's not necessary because I already include a kinship matrix?

Thanks in advance!

Nana

Bonder-MJ commented 2 years ago

Dear Nana,

Thanks for your message.

  1. you are right it seems I forgot to update the run_interaction_QTL_analysis method. It was using the older functionality and can't deal with 2 random effect terms because of that. I quickly checked and your changes make sense. I will try to update the code soon to have this fix included.

  2. The interaction test method indeed tests "expression ~ SNP + condition*SNP + covariates + kinship". Condition will also be part of the covariates. And yes the output is in the same format but now reflects the different test, this is to keep the majority of the code base the same.

  3. If you include a kinship matrix you are "exactly" doing that.

Thanks, Marc

NanaMatoba commented 2 years ago

Hi Marc,

Thank you for your quick response! I have a further question in response to the above comment

If you include a kinship matrix you are "exactly" doing that.

In the code, you somehow distinguish identical twins vs replicates (given we are linking the same genotype ID in the mapping file for replicates?)

for example, this is from your wiki page :

name_genotype_sample1 namepehnotype_sample1 name_genotype_sample2 namepehnotype_sample2.replica1 name_genotype_sample2 namepehnotype_sample2.replica2

Nana

Bonder-MJ commented 2 years ago

Hi Nana,

We don't distinguish between identical twins vs replicates perse. If you have replicates linked to one genotype than the kinship matrix gets "copied" for that genotype to reflect the number of replicates you have of that genotype and we put in the exact information you had in the kinship before. Meaning if you feed in just an identity matrix as you kinship you will have zeros and 2 1's if you have two copies of a sample. If you feed in an IBD/IBS matrix we will copy that matrix capturing the real genetic distance between all samples and have 1's for the copies. I hope that makes sense. This is what I meant with "that is "exactly" what we do".

In short if you link one phenotype to multiple genotypes the code copies the genotypes + kinship accordingly to reflect the duplicated sample. So that should be all good to do without an additional random effect.

Best, Marc

NanaMatoba commented 2 years ago

Thanks for the clarification!

I think all question I have at this point is clear now. Thanks again for your help and a great tool! Nana