It is important to consider genome accessibility when computing rates from genomic data.
scikit-allel has options to pass an "accessibility mask", a boolean array indicating whether each base is accessible, which it uses to properly normalize quantities.
I found mentions of implementing this in #341.
I am happy to help make this happen, but since I am new to the codebase I'd need some hand-holding. Ideally we would need a way of reading BED files that can be attached to the genotype dataset. Then, when computing per-base statistics, we would need to intersect the accessible intervals with the window intervals to get the right denominator, i.e. the number of accessible bases in each window rather than the full window length.
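
To make the idea concrete, here is a minimal sketch in plain NumPy of the two steps described above: reading accessible intervals from a BED file and intersecting them with windows. The names `read_bed` and `accessible_bases_per_window` are hypothetical and do not exist in the codebase; the sketch also assumes the BED intervals for a contig do not overlap each other and that windows use the same 0-based, half-open convention as BED.

```python
import io

import numpy as np


def read_bed(handle, contig):
    """Return sorted (start, end) rows for one contig from BED lines.

    BED coordinates are 0-based and half-open: [start, end).
    Hypothetical helper, for illustration only.
    """
    intervals = []
    for line in handle:
        # Skip blank lines and the optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        if chrom == contig:
            intervals.append((int(start), int(end)))
    return np.array(sorted(intervals), dtype=np.int64).reshape(-1, 2)


def accessible_bases_per_window(intervals, windows):
    """Count accessible bases in each half-open window [start, end).

    Assumes `intervals` do not overlap each other; overlapping
    intervals would be double-counted.
    """
    windows = np.asarray(windows, dtype=np.int64).reshape(-1, 2)
    counts = np.zeros(len(windows), dtype=np.int64)
    for i, (w_start, w_end) in enumerate(windows):
        # Length of the overlap of every interval with this window;
        # negative values mean no overlap and are clipped to zero.
        overlap = (np.minimum(intervals[:, 1], w_end)
                   - np.maximum(intervals[:, 0], w_start))
        counts[i] = np.clip(overlap, 0, None).sum()
    return counts


# Made-up example data: two accessible intervals on chr1, one on chr2.
bed_text = "chr1\t0\t100\nchr1\t150\t300\nchr2\t0\t50\n"
intervals = read_bed(io.StringIO(bed_text), "chr1")
windows = [(0, 200), (200, 400)]
n_accessible = accessible_bases_per_window(intervals, windows)
print(n_accessible)  # [150 100]
```

A per-base statistic for a window would then be the raw per-window value divided by `n_accessible` instead of by the window length, with windows that have zero accessible bases reported as missing. The loop is quadratic in the worst case; a real implementation would probably want a sorted sweep or `np.searchsorted`, but that is a design detail I'd need guidance on.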