nmdp-bioinformatics / py-ard

HLA ARD Reduction in Python
https://py-ard.org/
GNU Lesser General Public License v3.0

add new redux mode T for transcript #268

Open mmaiers-nmdp opened 1 year ago

mmaiers-nmdp commented 1 year ago

The 4th field of HLA nomenclature includes:

  1. intron variants
  2. UTR variants

Due to complexities in the typing of UTRs, and especially the dynamic nature of their definition in the IMGT/HLA database, a new redux method is needed to collapse ambiguity in the UTR regions.

This redux mode will allow analysis of 4-field data where the intent is to treat alleles as equivalent if they only differ in either the 5'UTR or 3'UTR.

The proposed name for this new redux mode is "T", which refers to the gene transcript: all exons and introns.

This redux method should work exactly like "G" in that it depends on a mapping file, hla_nom_t.txt. It probably needs to follow an expansion to "W" to make sure the data is at that resolution first.
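
A minimal sketch of the lookup described above, assuming the mapping has already been loaded into a dictionary and that input alleles are already at 4-field ("W") resolution. The names `T_GROUPS` and `reduce_to_t` are hypothetical and not part of py-ard; the entries are copied from the sample file in this issue. Returning an unlisted allele unchanged is also an assumption about the intended behavior.

```python
# Hypothetical in-memory mapping: 4-field allele -> T group name.
# Entries are taken from the sample hla_nom_t.txt in this issue.
T_GROUPS = {
    "HLA-A*01:01:01:01": "HLA-A*01:01:01:01T",
    "HLA-A*01:01:01:22": "HLA-A*01:01:01:01T",
    "HLA-B*39:01:01:02L": "HLA-B*39:01:01:02LT",
    "HLA-B*39:01:01:03": "HLA-B*39:01:01:02LT",
}


def reduce_to_t(allele: str, t_groups: dict[str, str] = T_GROUPS) -> str:
    """Return the T group for a 4-field allele.

    Assumption: an allele that belongs to no T group is returned unchanged.
    """
    return t_groups.get(allele, allele)


print(reduce_to_t("HLA-A*01:01:01:22"))
print(reduce_to_t("HLA-B*39:01:01:03"))
```

With this shape, two alleles can be treated as equivalent when `reduce_to_t` returns the same string for both.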

This file can be produced by a 41-line Perl script that post-processes a Cypher query from GFEDB. I would include it here but there is not sufficient space in the margins.

# file: hla_nom_t.txt
# date: 2023-08-28
# version: 3.35.0
# author: Martin Maiers (mmaiers@nmdp.org)
HLA-A;HLA-A*01:01:01:01/HLA-A*01:01:01:22;HLA-A*01:01:01:01T
HLA-A;HLA-A*02:01:01:01/HLA-A*02:01:01:02L/HLA-A*02:01:01:16/HLA-A*02:01:01:31/HLA-A*02:01:01:50;HLA-A*02:01:01:01T
HLA-A;HLA-A*02:09:01:01/HLA-A*02:09:01:02;HLA-A*02:09:01:01T
HLA-A;HLA-A*11:01:01:01/HLA-A*11:01:01:13;HLA-A*11:01:01:01T
HLA-A;HLA-A*24:02:01:01/HLA-A*24:02:01:15/HLA-A*24:02:01:16;HLA-A*24:02:01:01T
HLA-A;HLA-A*30:02:01:02/HLA-A*30:02:01:04;HLA-A*30:02:01:02T
HLA-A;HLA-A*32:01:01:01/HLA-A*32:01:01:07/HLA-A*32:01:01:10;HLA-A*32:01:01:01T
HLA-A;HLA-A*33:03:01:01/HLA-A*33:03:01:02/HLA-A*33:03:01:03/HLA-A*33:03:01:05;HLA-A*33:03:01:01T
HLA-A;HLA-A*68:02:01:01/HLA-A*68:02:01:02;HLA-A*68:02:01:01T
HLA-B;HLA-B*07:02:01:01/HLA-B*07:02:01:03/HLA-B*07:02:01:10/HLA-B*07:02:01:12/HLA-B*07:02:01:13/HLA-B*07:02:01:14;HLA-B*07:02:01:01T
HLA-B;HLA-B*07:05:01:01/HLA-B*07:05:01:03;HLA-B*07:05:01:01T
HLA-B;HLA-B*07:75:01:01/HLA-B*07:75:01:02;HLA-B*07:75:01:01T
HLA-B;HLA-B*08:01:01:01/HLA-B*08:01:01:02/HLA-B*08:01:01:07/HLA-B*08:01:01:08/HLA-B*08:01:01:09/HLA-B*08:01:01:10;HLA-B*08:01:01:01T
HLA-B;HLA-B*13:01:01:01/HLA-B*13:01:01:02;HLA-B*13:01:01:01T
HLA-B;HLA-B*13:02:01:01/HLA-B*13:02:01:05;HLA-B*13:02:01:01T
HLA-B;HLA-B*15:01:01:01/HLA-B*15:01:01:16;HLA-B*15:01:01:01T
HLA-B;HLA-B*15:18:01:01/HLA-B*15:18:01:02/HLA-B*15:18:01:05;HLA-B*15:18:01:01T
HLA-B;HLA-B*15:21:01:01/HLA-B*15:21:01:02;HLA-B*15:21:01:01T
HLA-B;HLA-B*18:01:01:01/HLA-B*18:01:01:06/HLA-B*18:01:01:18/HLA-B*18:01:01:19;HLA-B*18:01:01:01T
HLA-B;HLA-B*18:01:01:02/HLA-B*18:01:01:04/HLA-B*18:01:01:05/HLA-B*18:01:01:16/HLA-B*18:01:01:17;HLA-B*18:01:01:02T
HLA-B;HLA-B*27:05:02:01/HLA-B*27:05:02:05/HLA-B*27:05:02:08/HLA-B*27:05:02:09/HLA-B*27:05:02:10;HLA-B*27:05:02:01T
HLA-B;HLA-B*35:01:01:01/HLA-B*35:01:01:02/HLA-B*35:01:01:04/HLA-B*35:01:01:05/HLA-B*35:01:01:06/HLA-B*35:01:01:13/HLA-B*35:01:01:14/HLA-B*35:01:01:16/HLA-B*35:01:01:18;HLA-B*35:01:01:01T
HLA-B;HLA-B*35:02:01:01/HLA-B*35:02:01:02/HLA-B*35:02:01:03;HLA-B*35:02:01:01T
HLA-B;HLA-B*35:03:01:01/HLA-B*35:03:01:03/HLA-B*35:03:01:09;HLA-B*35:03:01:01T
HLA-B;HLA-B*39:01:01:02L/HLA-B*39:01:01:03/HLA-B*39:01:01:05/HLA-B*39:01:01:07/HLA-B*39:01:01:09;HLA-B*39:01:01:02LT
HLA-B;HLA-B*39:01:03:01/HLA-B*39:01:03:02;HLA-B*39:01:03:01T
HLA-B;HLA-B*39:02:02:01/HLA-B*39:02:02:03;HLA-B*39:02:02:01T
HLA-B;HLA-B*39:03:01:01/HLA-B*39:03:01:02;HLA-B*39:03:01:01T
HLA-B;HLA-B*39:05:01:01/HLA-B*39:05:01:02;HLA-B*39:05:01:01T
HLA-B;HLA-B*39:09:01:01/HLA-B*39:09:01:02;HLA-B*39:09:01:01T
HLA-B;HLA-B*39:12:01:01/HLA-B*39:12:01:02;HLA-B*39:12:01:01T
HLA-B;HLA-B*40:01:02:01/HLA-B*40:01:02:02/HLA-B*40:01:02:04/HLA-B*40:01:02:06/HLA-B*40:01:02:09;HLA-B*40:01:02:01T
HLA-B;HLA-B*40:02:01:01/HLA-B*40:02:01:02/HLA-B*40:02:01:03/HLA-B*40:02:01:05/HLA-B*40:02:01:07/HLA-B*40:02:01:08;HLA-B*40:02:01:01T
HLA-B;HLA-B*40:06:04:01/HLA-B*40:06:04:02;HLA-B*40:06:04:01T
HLA-B;HLA-B*40:10:01:01/HLA-B*40:10:01:02;HLA-B*40:10:01:01T
HLA-B;HLA-B*41:02:01:01/HLA-B*41:02:01:03;HLA-B*41:02:01:01T
HLA-B;HLA-B*44:02:01:01/HLA-B*44:02:01:07/HLA-B*44:02:01:08/HLA-B*44:02:01:11/HLA-B*44:02:01:14/HLA-B*44:02:01:15/HLA-B*44:02:01:17/HLA-B*44:02:01:18/HLA-B*44:02:01:20;HLA-B*44:02:01:01T
HLA-B;HLA-B*44:03:01:01/HLA-B*44:03:01:04/HLA-B*44:03:01:10/HLA-B*44:03:01:12/HLA-B*44:03:01:13/HLA-B*44:03:01:15/HLA-B*44:03:01:16;HLA-B*44:03:01:01T
HLA-B;HLA-B*48:01:01:01/HLA-B*48:01:01:02;HLA-B*48:01:01:01T
HLA-B;HLA-B*49:01:01:01/HLA-B*49:01:01:02/HLA-B*49:01:01:04;HLA-B*49:01:01:01T
HLA-B;HLA-B*51:01:01:01/HLA-B*51:01:01:03/HLA-B*51:01:01:04/HLA-B*51:01:01:05/HLA-B*51:01:01:06/HLA-B*51:01:01:07/HLA-B*51:01:01:08/HLA-B*51:01:01:09/HLA-B*51:01:01:10/HLA-B*51:01:01:11/HLA-B*51:01:01:12/HLA-B*51:01:01:13;HLA-B*51:01:01:01T
HLA-B;HLA-B*51:02:01:01/HLA-B*51:02:01:02;HLA-B*51:02:01:01T
HLA-B;HLA-B*52:01:01:01/HLA-B*52:01:01:02/HLA-B*52:01:01:03;HLA-B*52:01:01:01T
HLA-B;HLA-B*52:01:02:01/HLA-B*52:01:02:02/HLA-B*52:01:02:03;HLA-B*52:01:02:01T
HLA-B;HLA-B*55:02:01:01/HLA-B*55:02:01:02/HLA-B*55:02:01:03;HLA-B*55:02:01:01T
HLA-B;HLA-B*56:01:01:02/HLA-B*56:01:01:03/HLA-B*56:01:01:04;HLA-B*56:01:01:02T
HLA-B;HLA-B*57:01:01:01/HLA-B*57:01:01:03/HLA-B*57:01:01:04;HLA-B*57:01:01:01T
HLA-B;HLA-B*58:01:01:01/HLA-B*58:01:01:03;HLA-B*58:01:01:01T
HLA-B;HLA-B*59:01:01:01/HLA-B*59:01:01:02;HLA-B*59:01:01:01T
HLA-C;HLA-C*01:02:01:01/HLA-C*01:02:01:05/HLA-C*01:02:01:06/HLA-C*01:02:01:08;HLA-C*01:02:01:01T
HLA-C;HLA-C*02:02:02:01/HLA-C*02:02:02:03/HLA-C*02:02:02:05/HLA-C*02:02:02:20;HLA-C*02:02:02:01T
HLA-C;HLA-C*03:02:02:01/HLA-C*03:02:02:03/HLA-C*03:02:02:05;HLA-C*03:02:02:01T
HLA-C;HLA-C*03:03:01:01/HLA-C*03:03:01:09;HLA-C*03:03:01:01T
HLA-C;HLA-C*03:04:01:01/HLA-C*03:04:01:02/HLA-C*03:04:01:10/HLA-C*03:04:01:12/HLA-C*03:04:01:13;HLA-C*03:04:01:01T
HLA-C;HLA-C*03:07:01:01/HLA-C*03:07:01:02;HLA-C*03:07:01:01T
HLA-C;HLA-C*04:01:01:01/HLA-C*04:01:01:11/HLA-C*04:01:01:14/HLA-C*04:01:01:23/HLA-C*04:01:01:31/HLA-C*04:01:01:32;HLA-C*04:01:01:01T
HLA-C;HLA-C*05:01:01:01/HLA-C*05:01:01:16;HLA-C*05:01:01:01T
HLA-C;HLA-C*05:01:01:02/HLA-C*05:01:01:03/HLA-C*05:01:01:10/HLA-C*05:01:01:11/HLA-C*05:01:01:12;HLA-C*05:01:01:02T
HLA-C;HLA-C*06:02:01:01/HLA-C*06:02:01:10/HLA-C*06:02:01:11/HLA-C*06:02:01:13;HLA-C*06:02:01:01T
HLA-C;HLA-C*06:02:01:03/HLA-C*06:02:01:17;HLA-C*06:02:01:03T
HLA-C;HLA-C*07:01:01:01/HLA-C*07:01:01:08/HLA-C*07:01:01:16;HLA-C*07:01:01:01T
HLA-C;HLA-C*07:02:01:01/HLA-C*07:02:01:15/HLA-C*07:02:01:26/HLA-C*07:02:01:34;HLA-C*07:02:01:01T
HLA-C;HLA-C*07:02:01:03/HLA-C*07:02:01:09/HLA-C*07:02:01:10/HLA-C*07:02:01:11/HLA-C*07:02:01:23/HLA-C*07:02:01:28/HLA-C*07:02:01:32/HLA-C*07:02:01:35/HLA-C*07:02:01:36;HLA-C*07:02:01:03T
HLA-C;HLA-C*07:04:01:01/HLA-C*07:04:01:03;HLA-C*07:04:01:01T
HLA-C;HLA-C*12:02:02:01/HLA-C*12:02:02:02;HLA-C*12:02:02:01T
HLA-C;HLA-C*12:03:01:01/HLA-C*12:03:01:06/HLA-C*12:03:01:10;HLA-C*12:03:01:01T
HLA-C;HLA-C*12:13:01:01/HLA-C*12:13:01:02;HLA-C*12:13:01:01T
HLA-C;HLA-C*14:02:01:01/HLA-C*14:02:01:04/HLA-C*14:02:01:07;HLA-C*14:02:01:01T
HLA-C;HLA-C*15:02:01:01/HLA-C*15:02:01:03;HLA-C*15:02:01:01T
HLA-C;HLA-C*16:01:01:01/HLA-C*16:01:01:02;HLA-C*16:01:01:01T
HLA-C;HLA-C*17:01:01:02/HLA-C*17:01:01:04;HLA-C*17:01:01:02T
HLA-C;HLA-C*17:03:01:01/HLA-C*17:03:01:02/HLA-C*17:03:01:03;HLA-C*17:03:01:01T
HLA-DPA1;HLA-DPA1*01:03:01:01/HLA-DPA1*01:03:01:11;HLA-DPA1*01:03:01:01T
HLA-DPA1;HLA-DPA1*01:03:01:05/HLA-DPA1*01:03:01:15;HLA-DPA1*01:03:01:05T
HLA-DPA1;HLA-DPA1*02:02:02:01/HLA-DPA1*02:02:02:02/HLA-DPA1*02:02:02:03;HLA-DPA1*02:02:02:01T
HLA-DPB1;HLA-DPB1*01:01:01:01/HLA-DPB1*01:01:01:03;HLA-DPB1*01:01:01:01T
HLA-DPB1;HLA-DPB1*03:01:01:08/HLA-DPB1*03:01:01:09;HLA-DPB1*03:01:01:08T
HLA-DPB1;HLA-DPB1*05:01:01:04/HLA-DPB1*05:01:01:10;HLA-DPB1*05:01:01:04T
HLA-DPB1;HLA-DPB1*13:01:01:05/HLA-DPB1*13:01:01:06;HLA-DPB1*13:01:01:05T
HLA-DPB1;HLA-DPB1*20:01:01:01/HLA-DPB1*20:01:01:02;HLA-DPB1*20:01:01:01T
HLA-DQA1;HLA-DQA1*05:06:01:01/HLA-DQA1*05:06:01:02;HLA-DQA1*05:06:01:01T
HLA-DQB1;HLA-DQB1*03:01:01:01/HLA-DQB1*03:01:01:10/HLA-DQB1*03:01:01:20;HLA-DQB1*03:01:01:01T
HLA-DQB1;HLA-DQB1*03:03:02:02/HLA-DQB1*03:03:02:03;HLA-DQB1*03:03:02:02T
HLA-DQB1;HLA-DQB1*05:03:01:01/HLA-DQB1*05:03:01:02;HLA-DQB1*05:03:01:01T
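
The file format above (comment lines starting with `#`, then `locus;allele/allele/...;group` records) could be read into an allele-to-group dictionary along these lines. This is only a sketch: the function name `parse_hla_nom_t` is hypothetical, it is not how py-ard loads its mapping tables, and it assumes every data line has exactly three semicolon-separated fields.

```python
def parse_hla_nom_t(lines):
    """Build an allele -> T group dict from hla_nom_t.txt-style lines."""
    allele_to_group = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '#' header comments.
        if not line or line.startswith("#"):
            continue
        # Assumed record layout: locus;allele1/allele2/...;group
        _locus, alleles, group = line.split(";")
        for allele in alleles.split("/"):
            allele_to_group[allele] = group
    return allele_to_group


# Two records copied from the sample file, plus one header line.
sample = [
    "# file: hla_nom_t.txt",
    "HLA-A;HLA-A*02:09:01:01/HLA-A*02:09:01:02;HLA-A*02:09:01:01T",
    "HLA-DQA1;HLA-DQA1*05:06:01:01/HLA-DQA1*05:06:01:02;HLA-DQA1*05:06:01:01T",
]
mapping = parse_hla_nom_t(sample)
print(mapping["HLA-A*02:09:01:02"])
```

To read a real file, the same function can be passed an open file handle, since it only iterates over lines.
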
mmaiers-nmdp commented 1 year ago

I've got this coded on a branch in my fork, with a different format for the table.