toddajohnson closed this issue 1 year ago.
Hi Todd - sorry for the really delayed response. To quickly answer your questions:
The mappability annotation is for information purposes only.
To construct the mappability BED file, we assessed mappability allowing 2 errors in a 150-base region centred on the location.
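To make "2 errors in a 150-base region centred on the location" more concrete, here is a toy sketch. It assumes a common definition of mappability that is not confirmed as HMF's exact method: the score is 1 divided by the number of genome windows that match the window centred on the position with at most 2 mismatches. Under that assumption, a score such as 0.0145 would mean the window has roughly 69 near-identical copies. The function name `toy_mappability` is hypothetical. The sketch scans only the forward strand by brute force, so it is suitable only for tiny sequences; real tools also consider the reverse complement and use an index.

```python
def toy_mappability(genome: str, pos: int, window: int = 150, max_mismatches: int = 2) -> float:
    """Hypothetical toy score: 1 / number of windows within max_mismatches of the window centred on pos.

    pos is a 0-based offset into genome. Forward strand only; brute force.
    """
    half = window // 2
    start = pos - half
    if start < 0 or start + window > len(genome):
        raise ValueError("window centred on pos runs off the sequence")
    query = genome[start:start + window]

    hits = 0
    for s in range(len(genome) - window + 1):
        candidate = genome[s:s + window]
        mismatches = 0
        for a, b in zip(query, candidate):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many errors; this window is not a hit
        if mismatches <= max_mismatches:
            hits += 1

    # hits >= 1 because the query window always matches itself
    return 1.0 / hits


if __name__ == "__main__":
    seq = "AAAACCCCAAAA"
    # "AAAA" occurs exactly twice, so an exact-match score is 0.5
    print(toy_mappability(seq, pos=2, window=4, max_mismatches=0))
```

Allowing more mismatches can only add hits, so under this definition the score never increases as the error budget grows.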
Since I started using HMF's programs, I have been annotating variants with the mappability score, but, stupid me, I just realized that I never actually used it anywhere for downstream filtering. I have searched through the SAGE/PAVE/PURPLE README files, but I do not see any suggestion about downstream filtering using the MAPPABILITY score in the VCF files.

Also, what is the source of the mappability scores, and is the file restricted to genic regions? I ask because, in checking the areas around some germline variants that I just had called and annotated but that lacked a MAPPABILITY annotation, it seems that some variants are not in regions present in the BED file, even though they appear as highly mappable in the UCSC Genome Browser. For instance, chr2:150,220,696-150,220,696 has a UMAP K100 probability of 1 in UCSC, but there are no annotation entries within about 5 kb below and 20 kb above that position in the BED file. Also, spot-checking some nearby entries, the data seem to differ: chr2:150240156 has score = 0.0145 in the BED file but a probability of 1 in the k100 track.
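When spot-checking VCF positions against a BED file, note that BED intervals are 0-based and half-open, while VCF `POS` is 1-based, so an off-by-one comparison can make a covered position look uncovered (or vice versa). The sketch below looks up a VCF position in BED-style intervals. It assumes that the intervals on each chromosome do not overlap and that the score sits in the 4th column (`score_col=3`); that column choice is an assumption, not a documented property of the HMF file. The names `load_bed` and `lookup` are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# chrom -> (sorted interval starts, matching (start, end, score) records)
Index = Dict[str, Tuple[List[int], List[Tuple[int, int, float]]]]


def load_bed(lines: Iterable[str], score_col: int = 3) -> Index:
    """Parse BED lines (0-based, half-open) into a per-chromosome sorted index.

    score_col is an assumed column index for the score; adjust it to the real file.
    """
    records = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header/track lines
        fields = line.rstrip("\n").split("\t")
        records[fields[0]].append((int(fields[1]), int(fields[2]), float(fields[score_col])))

    index: Index = {}
    for chrom, recs in records.items():
        recs.sort()
        index[chrom] = ([r[0] for r in recs], recs)
    return index


def lookup(index: Index, chrom: str, vcf_pos: int) -> Optional[float]:
    """Return the score covering a 1-based VCF position, or None if no interval covers it."""
    if chrom not in index:
        return None
    starts, recs = index[chrom]
    zero_based = vcf_pos - 1  # convert VCF 1-based POS to BED 0-based coordinate
    i = bisect.bisect_right(starts, zero_based) - 1  # last interval starting at or before the position
    if i >= 0:
        start, end, score = recs[i]
        if start <= zero_based < end:
            return score
    return None


if __name__ == "__main__":
    # Toy, made-up interval: covers VCF positions 101..200
    idx = load_bed(["chr2\t100\t200\t0.5"])
    print(lookup(idx, "chr2", 101), lookup(idx, "chr2", 201))
```

With the toy interval above, VCF position 100 is not covered and position 200 is, which is exactly the kind of boundary case that can make a manual spot check disagree with an annotation tool.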