Sorry for the late reply, @yangzhao1230
Regarding embedding extraction: we extracted embeddings from layers 12, 16, 21, 24, and 32. For the results shown in the figures, we used, for each score separately, the layer that gave the highest performance.
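For reference, here is a minimal sketch of this kind of multi-layer extraction (assuming a HuggingFace-style model that supports `output_hidden_states`; the checkpoint name below is only a placeholder, not necessarily the model used here):

```python
# Minimal sketch: per-token embeddings from several intermediate layers.
# Assumptions: HuggingFace Transformers interface; the checkpoint name is
# a placeholder and the layer indices follow the reply above.
import torch
from transformers import AutoTokenizer, AutoModel

LAYERS = [12, 16, 21, 24, 32]

tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t33_650M_UR50D")
model = AutoModel.from_pretrained("facebook/esm2_t33_650M_UR50D")
model.eval()

def layer_embeddings(sequence: str) -> dict[int, torch.Tensor]:
    """Return per-token embeddings (seq_len x hidden_dim) for each requested layer."""
    inputs = tokenizer(sequence, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # hidden_states[0] is the input embedding layer; hidden_states[i] is layer i.
    return {layer: out.hidden_states[layer][0] for layer in LAYERS}
```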
Regarding the similarity calculation: we used the embeddings of the token containing the mutation.
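A sketch of that per-mutation score, using the `layer_embeddings` helper above (assumptions: cosine similarity between the mutated-position token embeddings of the reference and variant sequences, and a single leading special token such as `<cls>`; adjust the index offset for your tokenizer):

```python
# Sketch: similarity at the mutated token only, not the whole sequence.
import torch.nn.functional as F

def mutation_similarity(ref_seq: str, alt_seq: str, mut_pos: int, layer: int = 32) -> float:
    ref_emb = layer_embeddings(ref_seq)[layer]
    alt_emb = layer_embeddings(alt_seq)[layer]
    tok_idx = mut_pos + 1  # +1 assumes one leading special token (e.g. <cls>)
    return F.cosine_similarity(ref_emb[tok_idx], alt_emb[tok_idx], dim=0).item()
```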
Regarding the similarity threshold: let me know if this is what you're referring to, but for our ROC analyses we used the scores as-is; we did not apply a cutoff to classify the variants.
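In other words, the AUC can be computed directly on the raw scores, as in this sketch (the labels and score values below are hypothetical):

```python
# Sketch: ROC analysis on raw similarity scores; scikit-learn sweeps the
# threshold internally, so no explicit cutoff is needed.
from sklearn.metrics import roc_auc_score

labels = [1, 0, 1, 0, 1]                            # hypothetical functional (1) vs. benign (0) labels
scores = [0.9988, 0.9996, 0.9985, 0.9997, 0.9990]   # hypothetical similarity scores

# If lower similarity indicates a functional/damaging variant, negate the
# scores so that a higher value corresponds to the positive class.
auc = roc_auc_score(labels, [-s for s in scores])
print(f"ROC AUC: {auc:.3f}")
```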
Hope this helps.
I am particularly interested in the experiments described in Section A.5.4 on Functional Variant Prioritization. While attempting to replicate this experiment, I have run into some challenges and would greatly appreciate additional details. Specifically, I am interested in the following aspects:
1. Could you please clarify from which layer of the Transformer the embeddings are extracted?
2. Is the similarity computed solely from the embeddings of the mutated token(s), or from the embeddings of the entire sequence?
3. What threshold value is used to binarize the similarity for the two-class classification? Knowing this threshold is crucial for my replication efforts.
I have also observed that the similarity for sequences with severe mutations tends to be exceptionally high (exceeding 0.999). To better understand and reproduce this experiment, I would be grateful for any additional insights or details you could provide.