lgragert / srtr-impute-pubsaf2306

Extract HLA typing from SRTR and format data for 9-locus high resolution HLA imputation

AgMM and AAMM Summary Tables Update #21

Open · alyspayn opened this issue 2 months ago

alyspayn commented 2 months ago

Update the old scripts so that they work with the new FIBERS 2.0 data: https://github.com/lgragert/srtr-impute-pubsaf2306/blob/main/FIBERS_AgMM_summary_table.py https://github.com/lgragert/srtr-impute-pubsaf2306/blob/main/FIBERS_AAMM_summary_table.py

  1. The old scripts need to read from the SRTR_AA_MM_9locmatrix*.txt.gz files (change lines 137-147) and use the new population groups (change line 130).
  2. Create a config file defining the high- and low-risk groups found by FIBERS so that they are not hard-coded into the scripts.
    • The low-risk group has 0 AA-MMs and the high-risk group has >=1 AA-MM, counting only the AA-MMs in the FIBERS bins. (Lines 160-169 are where the high- and low-risk FIBERS groups are hard-coded.)
  3. The same config file (or a separate one) can also be used by the AA-MM script to select the top bins created by FIBERS (lines 255-313). That script also needs its population groups (line 236) and its input file name (line 218) changed to use the matrix files.
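
A minimal sketch of steps 1 and 2, assuming the matrix files are gzipped, tab-delimited tables with one header row, that each FIBERS bin is stored in the config as a list of amino-acid position column names, and that those columns hold per-position AA-MM counts. The config keys (`fibers_bins`, `high_risk_min_mm`), the example position names, and all function names are hypothetical; the real amino_acids.json and matrix layouts may differ.

```python
import csv
import glob
import gzip
import json
from typing import Dict, Iterator, List

# HYPOTHETICAL config layout for amino_acids.json (not confirmed by the repo):
# {"fibers_bins": {"bin1": ["DRB1_13", "DRB1_37"]}, "high_risk_min_mm": 1}


def load_config(path: str) -> dict:
    """Read the FIBERS bin definitions from a JSON config file."""
    with open(path) as fh:
        return json.load(fh)


def read_matrix_rows(pattern: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per row from every gzipped, tab-delimited file matching pattern.

    Assumes each file has a single header row; column names are not verified here.
    """
    for path in sorted(glob.glob(pattern)):
        with gzip.open(path, "rt", newline="") as fh:
            yield from csv.DictReader(fh, delimiter="\t")


def bin_mm_count(row: Dict[str, str], positions: List[str]) -> int:
    """Sum the AA-MM values in the bin's position columns (absent or empty -> 0)."""
    return sum(int(float(row.get(pos) or 0)) for pos in positions)


def risk_group(mm_count: int, high_risk_min_mm: int = 1) -> str:
    """0 AA-MM in the FIBERS bin -> "low"; >= high_risk_min_mm -> "high"."""
    return "high" if mm_count >= high_risk_min_mm else "low"
```

For example, `risk_group(bin_mm_count(row, config["fibers_bins"]["bin1"]))` labels one row of `read_matrix_rows("SRTR_AA_MM_9locmatrix*.txt.gz")` (assumed to be one donor-recipient pair). Keeping the threshold in the config, rather than in the script, means a change to the FIBERS cut-off does not require editing lines 160-169.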
alyspayn commented 1 month ago

The hard-coding has been removed from the scripts, which now read the config file amino_acids.json. However, some issues remain. The matrix files are missing the AgMM for DQ, so the scripts cannot run to completion without it. Also, the population group labels in the files differ from the names we want and need to be mapped to them.
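
A minimal sketch of the two remaining fixes, assuming the population labels can be translated with a simple lookup table and that the scripts can check a file's header before processing it. The raw labels, the target names, the example column names (such as `DQ_AgMM`), and the function names are all hypothetical placeholders; the actual labels and column names are not given in this thread.

```python
from typing import Dict, Iterable, List

# HYPOTHETICAL mapping from labels found in the matrix files to the
# population names wanted in the summary tables; fill in the real labels.
POP_NAME_MAP: Dict[str, str] = {
    "RAW_LABEL_A": "Desired Name A",
    "RAW_LABEL_B": "Desired Name B",
}


def map_population(label: str, mapping: Dict[str, str] = POP_NAME_MAP) -> str:
    """Return the desired population name, failing loudly on unmapped labels."""
    try:
        return mapping[label]
    except KeyError:
        raise KeyError(f"No population mapping for label {label!r}") from None


def missing_columns(header: Iterable[str], required: Iterable[str]) -> List[str]:
    """List required columns (e.g. a DQ AgMM column) absent from a file header."""
    present = set(header)
    return [col for col in required if col not in present]
```

Raising on an unmapped label, instead of passing it through, makes a new or misspelled population group show up immediately. Likewise, checking `missing_columns(...)` up front lets a script report the absent DQ AgMM column by name rather than failing partway through a run.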

JK4800 commented 3 weeks ago