hartwigmedical / hmftools

Various algorithms for analysing genomics data
GNU General Public License v3.0

Some questions about PON files #511

Closed by xuxingyubio 4 months ago

xuxingyubio commented 5 months ago

I have a question for the Lilac software team:

I am not entirely sure, but could the issue I am seeing be caused by the relatively small number of variants in the MHC region of the downloaded GermlineHetPon.38.vcf.gz file? This seems to place the entire MHC region in a single segment during copy number calculation, so the calculated copy numbers for the A, B and C genes are all identical. Is there a more comprehensive PON file available specifically for the MHC region?

p-priestley commented 5 months ago

Hello.

I am not too sure about your setup, but the copy number is estimated by PURPLE, which combines and segments SV and copy number information genome-wide, including the HLA region. LILAC counts the support for each allele and assigns it to the most likely fitted copy number. This is described here: https://github.com/hartwigmedical/hmftools/tree/master/lilac#tumor-allele-specific-copy-number

I don't know how strong the resolution of our BAF points is in the MHC region, but I also think PURPLE is unlikely to be very sensitive to it, since we also search for copy number changes and SV events in the region.

The 3 MHC class I genes are all close together, so it is not unexpected that they all have the same copy number. We find that they have different copy numbers in only a small number of samples. The only way they can have distinct copy numbers is if there is a structural variant in the region between the genes.

Peter
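
To illustrate the point about neighbouring genes sharing a copy number, here is a minimal Python sketch. It is not PURPLE or LILAC code: the `Segment` class, the helper functions, the segment boundaries and the copy number values are all hypothetical, and the gene positions are rounded, approximate GRCh38 chr6 coordinates used for illustration only. It shows that genes falling inside one segment necessarily report the same copy number, and that they can only differ when a breakpoint lies between them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A hypothetical copy number segment on chr6 (inclusive coordinates)."""
    start: int
    end: int
    copy_number: float


# Rounded, approximate GRCh38 chr6 positions; illustrative only.
GENE_POSITIONS = {
    "HLA-A": 29_940_000,
    "HLA-C": 31_270_000,
    "HLA-B": 31_350_000,
}


def copy_number_at(segments, position):
    """Return the copy number of the segment containing `position`."""
    for seg in segments:
        if seg.start <= position <= seg.end:
            return seg.copy_number
    raise ValueError(f"no segment covers position {position}")


def gene_copy_numbers(segments):
    """Look up the segment copy number for each gene position."""
    return {gene: copy_number_at(segments, pos)
            for gene, pos in GENE_POSITIONS.items()}


# Case 1: no breakpoint in the region, so one segment spans all three genes.
one_segment = [Segment(28_000_000, 34_000_000, 3.0)]

# Case 2: a hypothetical breakpoint between HLA-A and HLA-C splits the region.
split_segments = [
    Segment(28_000_000, 30_500_000, 3.0),
    Segment(30_500_001, 34_000_000, 1.0),
]

print(gene_copy_numbers(one_segment))
print(gene_copy_numbers(split_segments))
```

In the first case all three genes get the same value because they sit in the same segment. In the second case HLA-A differs from HLA-B and HLA-C only because a breakpoint was placed between them, which is the situation a structural variant in the region would create.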