Open Zepeng-Mu opened 2 years ago
So this file https://github.com/rwdavies/QUILT/blob/master/hla_ancillary_files/quilt_hla_supplementary_info.txt contains three columns. Let's take a look at the help entry in QUILT_HLA_prepare_reference.R
Path to file with supplementary information about the genes, necessary for proper conversion. File is tab separated with header, with 3 columns. First (allele) is the allele that matches the reference genome. Second (genome_pos) is the position of this allele in the reference genome. Third (strand) is the strand (options 1 or -1).
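To make the format in that help entry concrete, here is a minimal Python sketch that reads and checks such a table. It assumes the header names are exactly allele, genome_pos and strand, as the help text suggests. The function name `read_supplementary_info` and the two example rows are hypothetical placeholders, not real alleles or coordinates, and this is not how QUILT itself parses the file.

```python
import csv
import io

# Column names as described in the help text (assumed to be the literal header).
EXPECTED_COLUMNS = ["allele", "genome_pos", "strand"]


def read_supplementary_info(handle):
    """Parse a tab-separated table with header: allele, genome_pos, strand."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        strand = int(row["strand"])
        if strand not in (1, -1):
            raise ValueError(f"line {line_no}: strand must be 1 or -1, got {strand}")
        rows.append(
            {
                "allele": row["allele"],
                "genome_pos": int(row["genome_pos"]),
                "strand": strand,
            }
        )
    return rows


# Placeholder content only: made-up allele names and positions.
example = "allele\tgenome_pos\tstrand\nX*01:01\t1000\t1\nY*02:01\t2000\t-1\n"
rows = read_supplementary_info(io.StringIO(example))
print(rows)
```

A check like this is mostly useful when hand-building a new file, since a wrong header or a strand value other than 1 or -1 is caught before the file is handed to the reference preparation step.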
I'll check with Simon Myers, who I'm pretty sure made the file, about any tips for making a new one, and get back to you. I can imagine how to make a new one, it is just a lot of checking (especially the first column, which could be automated, but is a hassle). Sorry otherwise for my slow reply, my daughter is home from nursery sick so I am slower at replying to emails / answering github issues. Thanks, Robbie
@Zepeng-Mu I am making a supplementary table now for a couple of other genes I needed, and it looks like the information needed for the rest of the genes is listed in the IMGT/HLA database here: https://www.ebi.ac.uk/ipd/imgt/hla/help/genomics.html
Specifically:
Gene: the gene name
GRCh38 Location: the strand orientation
Start of ATG (Initiator Met): the start position of the allele
GRCh38 reference allele: the allele that matches the reference genome
These are all in hg38, though, and I am not 100% sure that the alleles matching the reference are the same between reference versions.
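As a rough illustration of that mapping, the Python sketch below turns one record carrying the IMGT/HLA fields named above into a row with the three supplementary-file columns. Several things here are assumptions: the thread does not show how the orientation is written on the IMGT/HLA page, so the sketch takes it as a separate "forward" or "reverse" word supplied by the caller, and it assumes forward maps to 1 and reverse to -1. The function name `to_supplementary_row` and the example values are hypothetical placeholders, not real data.

```python
def to_supplementary_row(entry):
    """Map one IMGT/HLA-style record to the allele / genome_pos / strand columns.

    `entry` is a dict keyed by the field names quoted in the thread, plus an
    'orientation' key ('forward' or 'reverse') that the caller extracts by hand.
    """
    orientation = entry["orientation"].strip().lower()
    if orientation not in ("forward", "reverse"):
        raise ValueError(f"unknown orientation: {entry['orientation']!r}")
    return {
        "gene": entry["Gene"],
        "allele": entry["GRCh38 reference allele"],
        "genome_pos": int(entry["Start of ATG (Initiator Met)"]),
        # Assumption: forward strand is encoded as 1, reverse strand as -1.
        "strand": 1 if orientation == "forward" else -1,
    }


# Placeholder record: the allele name and position are made up.
record = {
    "Gene": "HLA-X",
    "orientation": "reverse",
    "Start of ATG (Initiator Met)": "2000",
    "GRCh38 reference allele": "X*01:01",
}
row = to_supplementary_row(record)
print(row)
```

The coordinates obtained this way are GRCh38 coordinates, so the caveat above about reference versions still applies to any row built like this.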
Just one more update on this, I ended up not being able to make reference files for other HLA genes because they are not available in the 1000G reference file 20181129_HLA_types_full_1000_Genomes_Project_panel.txt, which only includes HLA typing for HLA-A, B, C, DRB1, and DQB1.
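To spell out what this means for the gene list asked about in this issue, here is a small Python sketch comparing the requested genes with the genes typed in that panel. Both lists are taken from this thread. The sketch does not read the panel file itself, since its layout is not described here.

```python
# Genes requested in this issue (from the R vector in the original question).
requested = ["A", "B", "C", "DPA1", "DPB1", "DQA1", "DQB1", "DRB1"]

# Genes with HLA typing in the 1000G panel file, as stated above.
typed_in_panel = {"A", "B", "C", "DRB1", "DQB1"}

# Keep the requested order so the output is easy to compare with the question.
buildable = [gene for gene in requested if gene in typed_in_panel]
missing = [gene for gene in requested if gene not in typed_in_panel]

print("typed in panel:", buildable)  # ['A', 'B', 'C', 'DQB1', 'DRB1']
print("not typed:", missing)         # ['DPA1', 'DPB1', 'DQA1']
```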
Hi, so I changed the code to completely remove the dependencies for these files, and to start from more obvious dependencies. The code is written and runs to completion. I had meant to re-run the pipeline I used in the paper with a few different versions of the reference package to make sure I hadn't broken any functionality before pushing. Let me get back to you on this.
Sounds great! I would like to try the newer version with fewer dependencies when it's available.
Hello, I'm trying to use the new 1.0.3 version to prepare an HLA reference. I found that many GRCh38 files from 1000G have no counterpart in GRCh37, or the counterpart is very hard to find. I'm wondering whether it's possible to build a reference file using GRCh37? Thanks!
Hi both,
I've had a busy start to term so got really behind on things including this. I'm getting back up on my non-teaching activities now.
I have now properly pushed the new version of the code to the repository. I used it to build a new reference package, which is on the main QUILT HLA page https://github.com/rwdavies/QUILT/blob/master/README_QUILT-HLA.md#paragraph-reference-packages
I tested it versus the old version, and it worked, though performance in some alleles is a bit down for non-Europeans. I think this is because I'm using 1000 Genomes rather than HRC for the imputation, but I need to check. I think I thought that wouldn't make much of a difference, but I want to go back and properly benchmark that now.
Best, Robbie
Hi, I'm wondering how I can make a --quilt_hla_supplementary_info_file for hg19. I guess I can liftOver the file provided in the GitHub repo, but I hope to add more genes, like c('A','B','C','DPA1','DPB1','DQA1','DQB1','DRB1'). Thanks so much!
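For the liftOver idea, the positions first have to be written as BED intervals, since the UCSC liftOver tool takes BED input together with a chain file for the relevant build pair. The Python sketch below does that conversion under some assumptions: that genome_pos is a 1-based coordinate, that all genes are on chromosome 6 (where the HLA genes lie), and that the chain file names that chromosome "chr6". The function name `to_bed_lines` and the example rows are hypothetical placeholders.

```python
def to_bed_lines(rows, chrom="chr6"):
    """Turn supplementary-info rows into 6-column BED lines for liftOver.

    Each row needs 'allele', 'genome_pos' and 'strand' (1 or -1).
    """
    lines = []
    for row in rows:
        pos = int(row["genome_pos"])
        strand = "+" if int(row["strand"]) == 1 else "-"
        # BED intervals are 0-based and half-open, so a single 1-based
        # position p becomes the interval [p - 1, p).
        fields = [chrom, str(pos - 1), str(pos), row["allele"], "0", strand]
        lines.append("\t".join(fields))
    return lines


# Placeholder rows: made-up allele names and positions.
rows = [
    {"allele": "X*01:01", "genome_pos": 1000, "strand": 1},
    {"allele": "Y*02:01", "genome_pos": 2000, "strand": -1},
]
bed_lines = to_bed_lines(rows)
print("\n".join(bed_lines))
```

The resulting file can then be passed to liftOver in its usual form, `liftOver input.bed chain_file output.bed unmapped.bed`. The lifted end coordinate is the new 1-based position. This only moves coordinates: it does not check whether the allele matching the reference is the same in both builds, and it does not help with genes that have no typing in the reference panel.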