Closed dmiller15 closed 5 months ago
Thank you very much for finding this problem. I'll check whether there are annotations containing HLA-H/F/E, or create our own list of gene coordinates and use that as the input for AddGeneCoord.pl.
Thank you for finding this issue. I just added an option "--gtf-gene-name-mapping" to the AddGeneCoord.pl script. Its default value is "HFE:HLA-HFE", and a comma-separated string can be used to specify additional gene name mappings. Internally, this maps the gene name in the GTF to the name specified by the user. I hope this helps resolve the issue.
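To make the mapping format concrete, here is a minimal Python sketch of how a string such as "HFE:HLA-HFE" could be parsed and applied. It is only an illustration under stated assumptions: AddGeneCoord.pl is a Perl script and its actual implementation is not shown in this thread, and the function names `parse_gene_name_mapping` and `map_gene_name` are hypothetical.

```python
def parse_gene_name_mapping(spec):
    """Parse 'GTF_NAME:TARGET_NAME,...' into a dict (hypothetical helper)."""
    mapping = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty entries such as a trailing comma
        gtf_name, sep, target_name = pair.partition(":")
        if not sep or not gtf_name or not target_name:
            raise ValueError(f"malformed mapping entry: {pair!r}")
        mapping[gtf_name] = target_name
    return mapping


def map_gene_name(gtf_name, mapping):
    """Return the user-specified name for a GTF gene name, or the name unchanged."""
    return mapping.get(gtf_name, gtf_name)


# The default value mentioned above maps the GTF name HFE to HLA-HFE.
mapping = parse_gene_name_mapping("HFE:HLA-HFE")
print(map_gene_name("HFE", mapping))
print(map_gene_name("HLA-A", mapping))
```

Names that are not listed in the mapping pass through unchanged, so only the genes whose GTF name differs from the reference name need an entry.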
Thanks for the quick response. I am now seeing coordinates for HLA-HFE.
I've been testing out the software, and I noticed a discrepancy in HLA-HFE between BAM and FASTQ input: the FASTQ input reports high abundance and quality, while the BAM input reports nothing. I looked through all the read assignments from the FASTQ results, and each one maps to the HFE gene on chr6: https://useast.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000010704;r=6:26087226-26098343.
In your manuscript HLA files (https://github.com/mourisl/T1K_manuscript_evaluation/blob/master/hlaidx_3_44_0.tar.gz) as well as ones I made following the documentation directions, HLA-HFE receives no mapping in the coordinate files:
The reason for this lack of mapping is that the GENCODE and Ensembl GTFs just refer to this gene as `HFE`. AddGeneCoord.pl looks for exactly `HLA-HFE`, comes up with no matches, and leaves the unmapped default.
When I manually alter the coordinates file so that the `HLA-HFE` mapping matches the Ensembl `HFE` coordinates, the BAM and FASTQ runs agree on the abundance/quality.
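The exact-match failure described above can be reproduced with a small Python sketch. This is not the AddGeneCoord.pl code; `find_gene_coords` and the `aliases` argument are hypothetical, and the GTF record is a simplified illustrative line whose coordinates are taken from the Ensembl link above (the source and strand columns are placeholders).

```python
import re

# Simplified illustrative GTF record; real GENCODE/Ensembl lines carry more attributes.
GTF_LINES = [
    'chr6\texample\tgene\t26087226\t26098343\t.\t+\t.\t'
    'gene_id "ENSG00000010704"; gene_name "HFE";',
]


def find_gene_coords(gtf_lines, gene_name, aliases=None):
    """Return (chrom, start, end) of the first gene record matching gene_name.

    aliases maps a GTF gene name to the name used in the reference
    (for example {"HFE": "HLA-HFE"}); it is a hypothetical parameter.
    """
    aliases = aliases or {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only consider gene-level records
        match = re.search(r'gene_name "([^"]+)"', fields[8])
        if match is None:
            continue
        name = aliases.get(match.group(1), match.group(1))
        if name == gene_name:
            return fields[0], int(fields[3]), int(fields[4])
    return None


# Exact match on "HLA-HFE" finds nothing, because the GTF calls the gene "HFE".
print(find_gene_coords(GTF_LINES, "HLA-HFE"))
# With an alias, the HFE record supplies the coordinates for HLA-HFE.
print(find_gene_coords(GTF_LINES, "HLA-HFE", {"HFE": "HLA-HFE"}))
```

Without the alias the lookup returns nothing, which corresponds to the unmapped default in the coordinate file; with the alias it returns the HFE region on chr6.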
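To summarize:
- With FASTQ input, reads from the `HFE` gene are assigned to `HLA-HFE`.
- The coordinate file has no mapping for `HLA-HFE`, because AddGeneCoord.pl does not match `HLA-HFE` to the `HFE` gene region.
- As a result, reads in the `HFE` gene region are not pulled during BAM extraction, and nothing is reported for `HLA-HFE`.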
I don't think this is an issue for any other HLA contig. The only others that remain unmapped are HLA-DRB3, HLA-DRB4, and HLA-Y. As far as I can tell none of these has a corresponding mapping in the genome.