kundajelab / chrombpnet

Bias factorized, base-resolution deep learning models of chromatin accessibility (chromBPNet)
https://github.com/kundajelab/chrombpnet/wiki
MIT License

Question about outputs of Variant effect prediction #152

Closed by jasondanic 11 months ago

jasondanic commented 11 months ago

Hi,

I have several questions about interpreting the results of variant_scoring.py from https://github.com/kundajelab/variant-scorer. The main one concerns logfc.mean, which takes both negative and positive values. Given that this score is the difference between the alternate-allele and reference-allele predictions, can I interpret a positive value as increased accessibility caused by the mutation, and a negative value as decreased accessibility?

This also raises a question about the SNP input: do the SNPs need to lie within the scATAC-seq peak regions of a specific cell type, or can all SNPs be scored? Based on the method described in this article (https://doi.org/10.1016/j.cell.2022.11.028), it appears that thresholding the mutation impact scores involves selecting mutations from peak regions (as shown in the screenshot below).

[image]

Thanks.

panushri25 commented 11 months ago

Hello,

Yes, the sign indicates up-regulation (positive) or down-regulation (negative) of accessibility for the alternate allele with respect to the ref allele.
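To make the sign convention concrete, here is a minimal sketch. It assumes the log fold change is computed as log2(alt / ref) on predicted total counts and that logfc.mean is the average of that value across model folds; the function names and the numbers are illustrative only and are not taken from variant-scorer.

```python
import math


def log2_fold_change(ref_counts: float, alt_counts: float) -> float:
    """Return log2(alt / ref) for predicted counts (assumed definition).

    Positive: the alternate allele is predicted to be more accessible.
    Negative: the alternate allele is predicted to be less accessible.
    """
    if ref_counts <= 0 or alt_counts <= 0:
        raise ValueError("predicted counts must be positive")
    return math.log2(alt_counts / ref_counts)


def mean_logfc(per_fold_counts):
    """Average the log fold change over (ref, alt) pairs, one pair per fold."""
    values = [log2_fold_change(ref, alt) for ref, alt in per_fold_counts]
    return sum(values) / len(values)


def direction(logfc: float) -> str:
    """Translate the sign of a log fold change into words."""
    if logfc > 0:
        return "alt increases accessibility"
    if logfc < 0:
        return "alt decreases accessibility"
    return "no predicted change"


# Made-up predicted counts for one variant across two folds.
example = [(100.0, 200.0), (100.0, 400.0)]
print(direction(mean_logfc(example)))  # alt increases accessibility
```

With this convention a value of 1.0 means the alternate allele doubles the predicted counts, and -1.0 means it halves them.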

I am not aware of the source you are pointing to. Could you share the URL directly? The SNPs do not necessarily need to be in peaks, but you can imagine that a log fold change of x is more reliable in a peak than in the background. The max percentile column in variant-scorer captures this idea. For further clarification, I recommend posting on the variant-scorer GitHub repository: https://github.com/kundajelab/variant-scorer.
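If you want to separate in-peak variants from background variants yourself before interpreting the scores, a rough sketch could look like the following. It does not use variant-scorer; it assumes peaks are given as BED-style 0-based half-open intervals and variant positions are 1-based, and the dictionary keys (chrom, pos, logfc) are hypothetical names chosen for the example.

```python
from collections import defaultdict


def build_peak_index(peaks):
    """Group (chrom, start, end) peaks by chromosome.

    Assumes BED-style coordinates: 0-based start, exclusive end.
    """
    index = defaultdict(list)
    for chrom, start, end in peaks:
        index[chrom].append((start, end))
    return index


def in_peak(index, chrom, pos_1based):
    """Return True if a 1-based variant position falls inside any peak."""
    pos0 = pos_1based - 1  # convert to 0-based to match the BED intervals
    return any(start <= pos0 < end for start, end in index.get(chrom, ()))


def annotate_variants(variants, peaks):
    """Return copies of the variant records with an added 'in_peak' flag."""
    index = build_peak_index(peaks)
    return [
        dict(v, in_peak=in_peak(index, v["chrom"], v["pos"]))
        for v in variants
    ]


# Made-up peaks and scores, for illustration only.
peaks = [("chr1", 100, 200)]
variants = [
    {"chrom": "chr1", "pos": 150, "logfc": 0.8},
    {"chrom": "chr2", "pos": 150, "logfc": 0.8},
]
for record in annotate_variants(variants, peaks):
    print(record["chrom"], record["pos"], record["in_peak"])
```

The linear scan over peaks is fine for a quick check; for genome-wide variant sets an interval tree or a tool such as bedtools would be a more practical choice.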

Thank you, Anu