Open daniel-wells opened 5 months ago
If my understanding is correct, I don't think a negative threshold could ever make sense, since what's being considered is the cumulative GERP score. A variant at the very last position will never have a sufficiently negative GERP score to be considered "END_TRUNC". BRCA2 which you show is a good example. In gnomAD v4 (GRCh38), the last two LoF variants are not considered END_TRUNC, but as you move further from the end of the transcript we see some END_TRUNC variants. https://gnomad.broadinstitute.org/gene/ENSG00000139618?dataset=gnomad_r4
In the GRCh37 branch the GERP cutoff is +180 but in the GRCh38 version it was changed to -58: https://github.com/konradjk/loftee/commit/681c5072992fb8c0b6a12c3f78ed55b95fb7e125
This means that variants that are annotated as LC END_TRUNC in LOFTEE GRCh37 change to being HC LoF in GRCh38, even if they're very close to the end of the protein.
An example variant in GRCh37 that's at the very end of the protein gets annotated as LC END_TRUNC (seems sensible):
But the same variant in GRCh38 is annotated as HC LoF:
I can't find exactly how either of the two GERP cutoff values were derived. The GERP scores from the two versions of the files don't look that drastically different.
Is -58 really the correct value for
gerp_end_trunc_cutoff
in GRCh38? How was it derived?