hkimlab / DeepPrime

Source codes and examples for DeepPrime

Missing Columns gN19 and sPBS_RTSeq in Command Line Output #7

Closed UronicAcid closed 2 months ago

UronicAcid commented 2 months ago

Hi,

I've been using both the web version and the command line version of your tool. However, I cannot find the gN19 and sPBS_RTSeq columns in the output from the command line version.

Is there any specific configuration or step that I might be missing to include these columns in the command line output?

Thank you!

python DeepPrime.py -f my_input.csv \
--cell_type HCT116 -p PE2 \
--pbs_min 13 \
--pbs_max 13
Goosang-Yu commented 2 months ago

Hi!

The DeepPrime output in this repository doesn't provide those specific columns separately.

If you want to find the spacer sequence, you can use the WT74 sequence [4:24]. If you’re looking for the RT-PBS sequence, just remove the 'x' masking from Edited74_On.
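
As a rough sketch of the two steps above (not code from this repository), the slicing and unmasking could look like the following. The helper names `spacer_from_wt74` and `rtpbs_from_edited74` are hypothetical, and the example values are copied from the pegRNA1.19 row quoted later in this thread, with the 'x' masking written as string repetition for readability.

```python
def spacer_from_wt74(wt74: str) -> str:
    """Return the 20-nt protospacer, i.e. the [4:24] slice of the 74-nt wild-type sequence."""
    return wt74[4:24]


def rtpbs_from_edited74(edited74: str) -> str:
    """Remove the 'x' masking so that only the RT-PBS sequence remains."""
    return edited74.replace("x", "")


# Example values taken from the pegRNA1.19 row in this thread.
wt74 = "GGTAGCTGGTCACGGTAAAGAAGCCGGTGATGACAGCCAGCTCCGAGCAGGGGATGCAGTCCTTGCTGGCGATG"
edited74 = "x" * 8 + "GTCACGGTAAAGAAACCGGTGATGACAGCC" + "x" * 36

print(spacer_from_wt74(wt74))
print(rtpbs_from_edited74(edited74))
```

Note that the slice returns the original protospacer, so its first base is whatever the target has at that position.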

However, if you’re not looking to find a large number of pegRNAs, I recommend using the Python package genet.

You can install it with:

pip install genet

Hope this helps!

UronicAcid commented 2 months ago

Thank you for your reply. We’re looking to identify a large number of pegRNAs. I checked the output from the web version and found that pegRNA1.19 matches the pattern you mentioned, but pegRNA1.1 does not. Specifically, for pegRNA1.1 I cannot find the gN19 sequence within WT74_On.


| ID | WT74_On | Edited74_On | gN19 | sPBS_RTSeq | EditedwNote | AltKey |
| -- | -- | -- | -- | -- | -- | -- |
| pegRNA1.19 | GGTAGCTGGTCACGGTAAAGAAGCCGGTGATGACAGCCAGCTCCGAGCAGGGGATGCAGTCCTTGCTGGCGATG | xxxxxxxxGTCACGGTAAAGAAACCGGTGATGACAGCCxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | GCTGGTCACGGTAAAGAAGC | GTCACGGTAAAGAAACCGGTGATGACAGCC | GTCACGGTAAAGAA(G/A)CCGGTGATGACAGCC | sub1 |
| pegRNA1.1 | TGCTCAGGCTCAGGTAGCTGGTCACGGTAAAGAAGCCGGTGATGACAGCCAGCTCCGAGCAGGGGATGCAGTCC | xxxxxxxxCTCAGGTAGCTGGTCACGGTAAAGAAACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | GAGGCTCAGGTAGCTGGTCA | CTCAGGTAGCTGGTCACGGTAAAGAAAC | CTCAGGTAGCTGGTCACGGTAAAGAA(G/A)C | sub1 |

Goosang-Yu commented 2 months ago

Oh, that's because we change the first position of the spacer to G. That's why we named it gN19 ('G' + N x 19, for guide RNA). We do this for transcription from the U6 promoter, which expresses the RNA transcript efficiently when it starts with 'G'.
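
For illustration only, the gN19 convention described above can be sketched as a one-line substitution. The function name `to_gn19` is hypothetical and not part of DeepPrime; the example spacer is the [4:24] slice of WT74_On for pegRNA1.1 from the table earlier in this thread.

```python
def to_gn19(spacer: str) -> str:
    """Return the 20-nt spacer with its first base replaced by 'G' ('G' + N x 19)."""
    if len(spacer) != 20:
        raise ValueError("expected a 20-nt spacer")
    return "G" + spacer[1:]


# Protospacer of pegRNA1.1 (WT74_On[4:24]); it starts with 'C' in the target.
original_spacer = "CAGGCTCAGGTAGCTGGTCA"
print(to_gn19(original_spacer))
```

A spacer that already starts with 'G', such as the one for pegRNA1.19, comes back unchanged, which is consistent with that row matching WT74_On directly.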

UronicAcid commented 2 months ago

Is the prediction based on gN19 rather than the original spacer? If I change the first nucleotide of the spacer to a G, would that affect the target identification process?

Goosang-Yu commented 2 months ago

Yes, the dataset used for DeepPrime contains pegRNAs that start with G. If the original target protospacer did not start with G, we changed its first nucleotide to G.

The first position of the spacer is known to have little effect on its guiding function. You can find this described in other related papers.

UronicAcid commented 2 months ago

Thanks again for your help