pritykinlab / guidescanpy


Filter gRNAs by "complexity" #38

Closed vineetbansal closed 11 months ago

vineetbansal commented 1 year ago

One possible filter for the gRNA design tool is to avoid sequences with 4 identical nucleotides adjacent to each other ("AAAA" etc.). Possibly other similar filters are useful too, after doing some investigation on the rationale behind this filter.
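A minimal sketch of such a filter, assuming gRNAs are plain strings over A/C/G/T. The helper name `has_homopolymer_run` and the example sequences are hypothetical and not part of guidescanpy:

```python
import re


def has_homopolymer_run(sequence: str, run_length: int = 4) -> bool:
    """Return True if any base repeats `run_length` or more times in a row.

    Hypothetical helper for illustration only.
    """
    # Capture one base, then require the same base (run_length - 1) more times.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (run_length - 1))
    return pattern.search(sequence.upper()) is not None


# Made-up example sequences: the first contains "TTTT", the second has no run of 4.
grnas = ["GACGTTTTACGGATCCGATC", "GACGTACGGATCCGATCGAT"]
kept = [g for g in grnas if not has_homopolymer_run(g)]
```

Making `run_length` a parameter keeps the threshold adjustable, since the rationale for using exactly 4 is still to be investigated.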

vineetbansal commented 1 year ago

Notes from Yuri: It should be possible to filter out any nucleotide pattern of the form NNN, where N is in {'G', 'C', 'A', 'T'}.

vineetbansal commented 1 year ago

The most general way to do this is to present a textbox whose contents are interpreted as a regular expression, and to filter out any gRNA whose match sequence matches it. However, this is prone to misuse, and we don't want users to have to remember regular expression syntax. So let's not go that route, and instead offer a semi-configurable option:

gRNA Pattern to avoid: <textbox>

where <textbox> can take a max of 5 characters, each one of A/C/T/G/N/V. It would then run the regular expression check, substituting N and V as appropriate, and filter out the corresponding matches.
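A rough sketch of how the textbox value could be validated and turned into a regular expression. This assumes N and V follow the IUPAC nucleotide codes (N = any base, V = A, C or G); the names `compile_avoid_pattern` and `filter_grnas` are hypothetical and not taken from guidescanpy:

```python
import re

# Assumption: N and V are IUPAC ambiguity codes (N = any base, V = not T).
IUPAC_TO_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]",
    "V": "[ACG]",
}
MAX_PATTERN_LENGTH = 5


def compile_avoid_pattern(pattern: str) -> "re.Pattern[str]":
    """Validate the user-supplied pattern and expand it into a compiled regex."""
    pattern = pattern.strip().upper()
    if not 1 <= len(pattern) <= MAX_PATTERN_LENGTH:
        raise ValueError(f"pattern must be 1-{MAX_PATTERN_LENGTH} characters long")
    invalid = sorted(set(pattern) - set(IUPAC_TO_REGEX))
    if invalid:
        raise ValueError(f"unsupported characters: {''.join(invalid)}")
    # Substitute each character with its regex equivalent.
    return re.compile("".join(IUPAC_TO_REGEX[ch] for ch in pattern))


def filter_grnas(grnas, pattern):
    """Keep only the gRNA sequences that do not contain the pattern."""
    regex = compile_avoid_pattern(pattern)
    return [g for g in grnas if not regex.search(g.upper())]
```

Because only six characters are accepted and each maps to a fixed regex fragment, users never write regular expression syntax themselves, which avoids the misuse concern above.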

vineetbansal commented 11 months ago

Done thanks to @IsleZhu