What exactly is meant by editing length?

marcus-r-kelly commented 1 year ago

I am attempting to use DeepPrime to plan pegRNA designs. I notice that your training libraries (in Table S1 of your publication) contain pegRNAs with "Editing Length" 3 (for example "Group12_112995" in Table S1, "Lib-Profiling") where not all substitutions are made next to each other (for example, this sequence with the substitutions in lowercase: GCGGggAGCaGCCG).

However, it seems that the webtool and distributed implementations of DeepPrime constrain the user to designing substitutions where those edits are within 3bp of each other. Is this by design?

As I understand it, these sequences ought to be a valid submission:

Query:   1 TCTACAACCCCACCACGTACCAGATGGATGTGAACCCCGAGGGCAAATACAGCTTTGGTGCCACCAACCCCACCACGTACCAGATGGATGTGAACCCCGAGGGCAAATACAGCTTTGGTGC 121
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||..||.||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Ref  :   1 TCTACAACCCCACCACGTACCAGATGGATGTGAACCCCGAGGGCAAATACAGCTTTGGTGTAACGAACCCCACCACGTACCAGATGGATGTGAACCCCGAGGGCAAATACAGCTTTGGTGC 121

Score: 233
Matches: 118 (97.5%)
Mismatches: 3

But if I align the WT74_On and Edited74_On columns from the output of prd.pe_score :

Query:  1 CGTACCAGATGGATGTGAACCCCGAGGGCAAATACAGCTTTGGTGTAACCAACCCCACCACGTACCAGATGGAT 74
          |||||||||||||||||||||||||||||||||||||||||||||..|||||||||||||||||||||||||||
Ref  :  1 CGTACCAGATGGATGTGAACCCCGAGGGCAAATACAGCTTTGGTGCCACCAACCCCACCACGTACCAGATGGAT 74

Score: 286
Matches: 72 (97.3%)
Mismatches: 2
CIGAR: 74M

The final mismatch is simply not present.
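
The same comparison can be made without an aligner. The following is a minimal Python sketch, not part of DeepPrime or genet: `mismatch_positions` is a hypothetical helper, and `requested` is an assumption that places the intended CCACC -> TAACG change into the same 74-nt window so it can be compared with the Edited74_On sequence shown above.

```python
def mismatch_positions(wt, edited):
    """Return 0-based positions where two equal-length sequences differ."""
    if len(wt) != len(edited):
        raise ValueError("sequences must have the same length")
    return [i for i, (a, b) in enumerate(zip(wt, edited)) if a != b]


# Flanks copied from the 74-nt alignment above.
LEFT = "CGTACCAGATGGATGTGAACCCCGAGGGCAAATACAGCTTTGGTG"
RIGHT = "AACCCCACCACGTACCAGATGGAT"

wt74 = LEFT + "CCACC" + RIGHT
returned = LEFT + "TAACC" + RIGHT   # Edited74_On as reported in the output
requested = LEFT + "TAACG" + RIGHT  # assumption: intended edit, same window

requested_mm = mismatch_positions(wt74, requested)
returned_mm = mismatch_positions(wt74, returned)
# Positions that were requested but are absent from the returned sequence.
dropped = [p for p in requested_mm if p not in returned_mm]

print("requested:", requested_mm)
print("returned: ", returned_mm)
print("dropped:  ", dropped)
```

Under these assumptions, `dropped` contains the single position of the third substitution, which matches the missing mismatch in the alignment.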

Goosang-Yu commented 1 year ago

Thank you Marcus,

Lib-Profile was not used for DeepPrime training because its edit types are too minor (edit length 4-30 nt).
Could you upload your DeepPrime output files and let me know your input information (WT seq, Edited seq, and other parameters)?

Goosang

marcus-r-kelly commented 1 year ago

Thank you for replying so quickly. The issue should be reproducible with the following call, which follows the directions in this repository:

df_pe = prd.pe_score('TCTACAACCCCACCACGTACCAGATGGATGTGAACCCCGAGGGCAAATACAGCTTTGGTGCCACCAACCCCACCACGTACCAGATGGATGTGAACCCCGAGGGCAAATACAGCTTTGGTGC' , 'CGTACCAGATGGATGTGAACCCCGAGGGCAAATACAGCTTTGGTGCCACCAACCCCACCACGTACCAGATGGAT', 'sub3')

I note that the ClinVar library also contains constructs with 3 substitutions spanning more than 3 nt.

Also, is there somewhere where the editing efficiencies of Lib-Profile constructs can be found?

Goosang-Yu commented 1 year ago

Thank you for sharing your input information. I checked your input, and as I understand it, your input (WT / Edited seq) looks like the one below. [Image: input]

Genet's predict.pe_score needs both the WT and Edited sequences to be 121 nt long, so your 2nd positional input ('CGTACCAGATGGATGTGAACCCCGAGGGCAAATACAGCTTTGGTGCCACCAACCCCACCACGTACCAGATGGAT') is not in an acceptable format.
Maybe your intended prime edit is CCACC -> TAACG, which is not a continuous mutation. Yes, DeepPrime can return prediction scores for this input, but they are unreliable because DeepPrime's training data contains only continuous edits. For example, DeepPrime was trained on prime edits such as CCACC -> TATCC.
For your case, here is my recommendation.
- Use DeepPrime with 2-nt editing. (CCACC -> TAACC)
- Get top scored pegRNAs and make additional mutation manually in pegRNA 3' extension
- If the RHA is too short because of the additional edit, extend the RTT by the number of nucleotides the RHA lost. An RHA of at least 5-6 nt is required for prime editing.
- Test the top 3-5 pegRNAs in this manner.
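
The first and third steps above can be sketched in Python. This is only an illustration under stated assumptions, not code from DeepPrime or genet: `contiguous_runs`, `split_edit` and `rha_loss` are hypothetical helpers, positions are 0-based along the sequence as written 5' to 3', and `rha_loss` assumes the extra substitution lies 3' of the contiguous edit, so that it shortens the RHA by its distance from the last core position.

```python
def contiguous_runs(positions):
    """Group sorted positions into runs of adjacent indices."""
    runs = []
    for p in positions:
        if runs and p == runs[-1][-1] + 1:
            runs[-1].append(p)
        else:
            runs.append([p])
    return runs


def split_edit(wt, edited):
    """Split a substitution edit into a contiguous core and leftover edits.

    Returns (core_edited, core_positions, extras). extras is a list of
    (position, wt_base, edited_base) to add to the pegRNA extension by hand.
    """
    if len(wt) != len(edited):
        raise ValueError("sequences must have the same length")
    mismatches = [i for i, (a, b) in enumerate(zip(wt, edited)) if a != b]
    if not mismatches:
        return wt, [], []
    # max() keeps the first run when several runs share the longest length.
    core = max(contiguous_runs(mismatches), key=len)
    core_edited = "".join(
        edited[i] if i in core else base for i, base in enumerate(wt)
    )
    extras = [(i, wt[i], edited[i]) for i in mismatches if i not in core]
    return core_edited, core, extras


def rha_loss(core, extras):
    """RHA nucleotides lost to extras located 3' of the core (assumption)."""
    if not extras:
        return 0
    return max(0, max(pos for pos, _, _ in extras) - core[-1])


core_edited, core, extras = split_edit("CCACC", "TAACG")
print(core_edited)             # sequence to submit as the 2-nt edit
print(extras)                  # substitutions to add manually
print(rha_loss(core, extras))  # candidate RTT extension, in nt
```

For CCACC -> TAACG this yields TAACC as the contiguous edit to score, one leftover C -> G substitution, and an RHA loss of 3 nt.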

I hope this will be helpful to you.

Goosang

Goosang-Yu commented 1 year ago

+) Library-Profiling dataset: Please see this issue #1

marcus-r-kelly commented 1 year ago

My apologies, I seem to have pasted in my input wrong above. In the actual run, I did use 121nt sequences. The rest of your interpretation is correct.

I will consider your advice-- it may well be the best way forward.

I would suggest that the webtool be altered to call attention to out-of-bounds edits for other users.
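
As an illustration of what such a check might look like, here is a minimal Python sketch. It is not taken from the webtool or from genet: `check_substitution` is a hypothetical function, the 121-nt requirement comes from the reply above, and treating an edit as out of bounds when its mismatches span more nucleotides than the edit length is an assumption based on this discussion.

```python
def check_substitution(wt, edited, edit_len, expected_len=121):
    """Return a list of problems with a substitution request (empty if none)."""
    problems = []
    if len(wt) != expected_len or len(edited) != expected_len:
        problems.append(
            f"both sequences must be {expected_len} nt "
            f"(got {len(wt)} and {len(edited)})"
        )
    if len(wt) != len(edited):
        return problems  # positions cannot be compared
    mismatches = [i for i, (a, b) in enumerate(zip(wt, edited)) if a != b]
    if not mismatches:
        problems.append("sequences are identical")
        return problems
    span = mismatches[-1] - mismatches[0] + 1
    if span > edit_len:
        problems.append(
            f"substitutions span {span} nt, more than the edit length "
            f"{edit_len}; mismatches outside the window may be ignored"
        )
    return problems


# Flanks copied from the 121-nt alignment earlier in this thread.
LEFT = "TCTACAACCCCACCACGTACCAGATGGATGTGAACCCCGAGGGCAAATACAGCTTTGGTG"
RIGHT = "AACCCCACCACGTACCAGATGGATGTGAACCCCGAGGGCAAATACAGCTTTGGTGC"

for problem in check_substitution(LEFT + "CCACC" + RIGHT,
                                  LEFT + "TAACG" + RIGHT, edit_len=3):
    print("warning:", problem)
```

Returning a list rather than raising lets a caller show every problem at once, which suits a web form.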

Thank you for your help!

hkimlab / DeepPrime
