pengxingang / TEIM

TEIM: TCR-Epitope Interaction Modeling
MIT License
42 stars 5 forks source link

How to generate new motifs by TEIM-Seq #3

Closed JiaweiZhang1997 closed 1 year ago

JiaweiZhang1997 commented 1 year ago

Hi, I am curious about how to generate new motifs by TEIM-Seq but I haven't gotten the answer from the paper. It's there a threshold that split the motifs and background amino acids? If so, how to get the threshold?

I am looking forward to your reply!

pengxingang commented 1 year ago

Hi, thanks for your interest in our work. We briefly introduced how we used TEIM-Seq to generate new motifs in the Applications of TEIM-Seq subsection of the Methods section:

We then generated new motifs for the epitope. In particular, we first randomly started from an initial CDR3β sequence from the background PWM and then used simulated annealing to improve the binding scores. More specifically, in each iteration, we randomly mutated a position of a CDR3β sequence and predicted its binding score with the epitope. If the predicted binding score was higher after mutation, we accepted the sequence otherwise we accepted it with a certain probability. After 10,000 iterations, all CDR3s with binding scores greater than 0.9 were used to calculate new motifs.

We now provided a pseudocode for the generation algorithm:


function generateNewMotifs(backgroundPWM, epitope)
    motifs <- empty list

        // initial sequences
        randomCDR3 <- randomly select a CDR3β sequence from backgroundPWM
    currentScore <- calculateBindingScore(randomCDR3, epitope)

    repeat t = 1, 2, ..., 10 000
                mutatedCDR3 <- mutate(randomCDR3)  // Mutate a position in the sequence
        mutatedScore <- calculateBindingScore(mutatedCDR3, epitope)

        if mutatedScore > currentScore  // Accept if mutatant has a higher affinity
                randomCDR3 <- mutatedCDR3
            currentScore <- mutatedScore
        else  // Randomly accept 
                acceptanceProbability <- calculateAcceptanceProbability(mutatedScore, currentScore, t)
            if acceptanceProbability > random(0, 1)
                    randomCDR3 <- mutatedCDR3
                currentScore <- mutatedScore
            end if
        end if

        if currentScore > 0.9  // Regard as a binding sequence
              motifs.append(randomCDR3)
        end if
    end repeat

    motifPWM <- averageAllSequences(motifs)
    return motifPWM
end function

As you can see, we chose 0.9 as a threshold for binding.

JiaweiZhang1997 commented 1 year ago

Thank you for your reply! And it's an interesting method! I have one more question, after getting the motifPWM, we still need an algorithm like gliph to calculate to get the motif, is this understanding correct?

pengxingang commented 1 year ago

You can utilize motifPWM directly as the binding motifs. In this context, the term "motif" refers to the binding pattern found in the entire sequences, similar to the motifs described in VDJdb (https://vdjdb.cdr3.net/motif). The motifs defined in GLIPH, on the other hand, primarily refer to the binding k-mer sequences. Basically, the algorithm mentioned above focuses on identifying a set of binding sequences. Once you have obtained this set, you can calculate the PWM as motifs directly or use GLIPH to detect motif k-mers, depending on your specific application requirements.

JiaweiZhang1997 commented 1 year ago

Thank you very much! ^_^ Now I have a better understanding of this method.