micknudsen / gatk2ascat

Convert GATK output to input for ASCAT
MIT License
1 stars 0 forks source link

Code modications #1

Open EKocakavuk opened 3 months ago

EKocakavuk commented 3 months ago

Hi @micknudsen,

thanks for providing your code for the transformation of GATK outputs as inputs for ASCAT. I am trying to adapt this to our current pipeline in which we analyze WXS/WGS data with trios (normal sample + two timely separated tumor samples).

I was encountering some errors when running your script, specifically related to

class BAF(NamedTuple):
    chromosome: str
    position: int
    ref_count: int
    alt_count: int
    ref_nucleotide: str
    alt_nucleotide: str

    @property
    def frequency(self):
        return self.alt_count / (self.ref_count + self.alt_count)

Since the allelic counts files for tumor and normal include positions where alt_count and ref_count are zero, it gave a non-dividable by zero error, which I attempted to fix with:

    def frequency(self):
        if (self.alt_count + self.ref_count) == 0:
            return 0
        else: 
            return self.alt_count / (self.ref_count + self.alt_count)

Another point is related to the following part:

class Segmentation:

    def __init__(self, segments: Iterable[Segment]) -> None:
        self._segments: DefaultDict[str, List[Segment]] = defaultdict(list)
        for segment in segments:
            self._segments[segment.chromosome].append(segment)

    def logr(self, chromosome: str, position: int) -> float:
        for segment in self._segments[chromosome]:
            if segment.start <= position <= segment.end:
                return segment.logr
        raise UncoveredPositionError

-> Since not all positions of the allelic count files are included in the denoisedCR file, which throws the error UncoveredPositionError

I attempted to fix this is as following:

    def logr(self, chromosome: str, position: int) -> float:
        for segment in self._segments[chromosome]:
            if segment.start <= position <= segment.end:
                return segment.logr
            else:
                print("Skipping non-matching position...")

Please let me know your comments. I am not sure what specific down-stream effects these modifications might have and your feedback is appreciated.

Best, Emre

micknudsen commented 3 months ago

Hi Emre,

Thanks for your feedback. I wrote this piece of code several years ago, when I experimented with ASCAT for WES samples, and I must admit that I have forgot about the details.

Back then, I used it on many samples, and it never failed. Maybe I did some pre-filtering of sites before running gatk2ascat, but I think that I just followed a tutorial on the GATK webpage.

I'm sorry that I can't help here, but I suggest that you go ahead with your changes and see if the final output makes sense. Personally, I was never really happy with ASCAT, and these days I'm working almost exclusively with WGS, where I've found Sequenza to be a much better alternative.

Best, Michael