Open zhichunlizzx opened 4 months ago
The "label embedding" in Figure 1b is shown as CTCF, RAD21, or histone modification ChIP-seq. Is this data derived from a publicly available dataset or from the model's previous predictions?
Hello,
The label embeddings in our model are initialized as random parameters and are updated during training. To understand this better, you can refer to the following line of code:
self.query_embed = nn.Embedding(num_class, hidden_dim)
In this context, num_class represents the total number of epigenomic features we aim to predict. The order of epigenomic features, such as CTCF, RAD21, etc., in the figure denotes their respective indices within the embedding list.
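To make this concrete, here is a minimal sketch of how such an embedding behaves in PyTorch. The feature list, `hidden_dim` value, and toy loss below are assumptions for illustration only and are not taken from the model's code; only the `nn.Embedding(num_class, hidden_dim)` call matches the line quoted above.

```python
import torch
import torch.nn as nn

# Hypothetical feature order and sizes, for illustration only.
FEATURES = ["CTCF", "RAD21", "H3K4me3", "H3K27ac"]
num_class = len(FEATURES)
hidden_dim = 8

torch.manual_seed(0)
query_embed = nn.Embedding(num_class, hidden_dim)

# The embedding table is a randomly initialized, learnable parameter:
# one row of size hidden_dim per epigenomic feature.
assert query_embed.weight.shape == (num_class, hidden_dim)
assert query_embed.weight.requires_grad

# A feature's embedding is looked up by its index in the feature order.
idx = torch.tensor([FEATURES.index("RAD21")])
rad21_vec = query_embed(idx)  # shape: (1, hidden_dim)

# One optimizer step on a toy loss updates the row that was used,
# which is what "updated during training" means here.
before = query_embed.weight.detach().clone()
optimizer = torch.optim.SGD(query_embed.parameters(), lr=0.1)
loss = query_embed(idx).pow(2).sum()
loss.backward()
optimizer.step()
changed_rows = (query_embed.weight.detach() != before).any(dim=1)
print(changed_rows.tolist())
```

In this sketch no external ChIP-seq data enters the embedding; the rows start as random vectors and only change through gradient updates.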
I hope this answers your questions.
Thanks for your reply. I understand it much better now.
Hi, how are the ROC curve and AUC between two bigWig signals evaluated in this paper? Are the two bigWig signals passed directly as input to sklearn.metrics.roc_curve?
Sorry, I didn't fully understand your question. Which figure in the manuscript are you referring to? If you are evaluating the ability of predicted signals to capture ChIP-seq peaks, we use the predicted signals and binary peak data as inputs.
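A minimal sketch of that kind of evaluation, assuming the genome has already been split into bins, each bin has a binary label (1 if it overlaps a ChIP-seq peak) and a predicted signal value. The toy arrays are made up, and the use of scikit-learn here is an assumption based on the question above, not a statement about the paper's evaluation code.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Toy data, for illustration only.
# peak_labels: 1 if the bin overlaps a called ChIP-seq peak, else 0.
# pred_signal: the model's predicted (continuous) signal for the same bins.
peak_labels = np.array([0, 0, 1, 1, 0, 1, 0, 0])
pred_signal = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.05, 0.3])

# The binary labels go in as y_true and the continuous signal as y_score.
fpr, tpr, thresholds = roc_curve(peak_labels, pred_signal)
auc = roc_auc_score(peak_labels, pred_signal)
print(f"AUC = {auc:.3f}")
```

Note that `roc_curve` expects binary ground truth, so two continuous bigWig tracks cannot both be passed in directly; one of them has to be converted to peak/no-peak labels first.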
For example, Figure 5A, B, C, E, and F.
I have a question about the evaluation metric "mse1imp" used in Figure 2B: its description says that the top 1% of positions in the predicted data are evaluated. Were EPCOT's or Avocado's predictions used to determine these positions?
In the enhancer activity prediction task, we predict the binary STARR-seq peaks instead of the signals, so the model outputs the probability indicating the likelihood of a peak.
The genomic positions used in the 'mse1imp' evaluation metric are determined by the predicted signals. This metric is defined in the paper 'https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02915-y'.
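As a rough sketch of that definition: rank positions by the predicted signal, keep the top 1%, and compute the mean squared error between observed and predicted values at those positions. The function name, the rounding of the 1% cutoff, and the toy arrays are assumptions for illustration; the referenced paper remains the authoritative definition.

```python
import numpy as np

def mse1imp(observed, predicted, top_fraction=0.01):
    """MSE restricted to the positions with the highest predicted signal.

    Hypothetical helper: the top `top_fraction` of positions is chosen by
    ranking the *predicted* track, then observed and predicted values are
    compared only at those positions.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    n_top = max(1, int(round(top_fraction * predicted.size)))
    top_idx = np.argsort(predicted)[-n_top:]  # indices of the largest predictions
    return float(np.mean((observed[top_idx] - predicted[top_idx]) ** 2))

# Toy tracks with 200 positions, so the top 1% is 2 positions.
predicted = np.zeros(200)
observed = np.zeros(200)
predicted[[5, 17]] = [10.0, 20.0]
observed[[5, 17]] = [8.0, 20.0]
score = mse1imp(observed, predicted)
print(score)
```

Because the positions come from each method's own predictions, two methods evaluated with this metric are generally scored on different sets of positions.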
Thanks for your reply. I found how this evaluation metric is calculated.
Hello, after reading your article I still have some questions about what the "label embedding" is. The paper does not say much about it. Is it another kind of sequencing data besides DNase-seq or ATAC-seq?