pinellolab / CRISPResso2

Analysis of deep sequencing data for rapid and intuitive interpretation of genome editing experiments

Question on large amplicons #33

Closed. Zymergen-SBRUBAKER closed this issue 4 years ago

Zymergen-SBRUBAKER commented 4 years ago

Hi, I was just wondering whether there is a way to provide a larger amplicon in a file, as opposed to a small amplicon that you could easily supply on the command line?

On a related note, do you know of any programs designed for QC of larger amplicons or plasmids, say a few kb in size? Thanks.

kclem commented 4 years ago

You can provide parameters to CRISPRessoBatch in a tab-separated file, and it will call CRISPResso once for each line of that file. So if you have only one line in your batch file, it's like running CRISPResso.

However, CRISPResso is designed for analyzing short amplicon sequencing data for CRISPR edits. It sounds like you may be looking to detect genome-wide variants? Perhaps try tools from GATK, e.g. HaplotypeCaller or Mutect, to characterize those variants?
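As a rough illustration of the batch-file approach described above, the sketch below writes a one-line tab-separated file and builds the matching command. It assumes the column headers are CRISPResso parameter names (`name`, `fastq_r1`, `amplicon_seq`) and that the file is passed with `--batch_settings`; the sample name, FASTQ path, and amplicon sequence are hypothetical placeholders, so check them against the CRISPResso2 documentation before use.

```python
import csv
import os
import shlex
import tempfile
from pathlib import Path


def write_batch_file(path, rows):
    """Write a tab-separated file: one header row, then one row per run."""
    fieldnames = list(rows[0].keys())
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
    return Path(path)


def batch_command(batch_path):
    """Build (but do not run) the CRISPRessoBatch command for a batch file."""
    return ["CRISPRessoBatch", "--batch_settings", str(batch_path)]


# Hypothetical single-sample batch: with one row this is equivalent to a
# single CRISPResso run, but the long amplicon lives in a file instead of
# on the command line. All values below are placeholders.
rows = [
    {
        "name": "sample1",
        "fastq_r1": "sample1.fastq.gz",
        "amplicon_seq": "ACGTACGTACGTACGTACGT",
    }
]

batch_path = write_batch_file(
    os.path.join(tempfile.mkdtemp(), "amplicon.batch"), rows
)
print(shlex.join(batch_command(batch_path)))
```

The command is only printed here, not executed, so the sketch can be tried without CRISPResso2 installed.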

Zymergen-SBRUBAKER commented 4 years ago

Thank you Kendell!

I would also like to do some QC on plasmids or larger "amplicon" regions. For example, sometimes you want to QC an entire plasmid, which is a few kb. It's true that we also want to do some WGS QC, and I have some familiarity with the basics of SNP calling. But when it comes to validating a plasmid or a large insertion region, a lot of additional logic needs to go into it, such as when to "pass" or "fail" the construct, what to warn the user about, etc. This part seems very custom to me. I've made such pipelines before but have never seen a paper about it.

So when I ran across your tool, it seemed very valuable to me. Besides being a great tool, it helps me see what people want in terms of interpreting the data. It is just surprising to me that I've never been able to find a tool or paper on more traditional plasmid validation. This seems like a very common task in synthetic biology, but I've not found anything. The CRISPR-related amplicon QC tools seem to be the closest thing.

Is the main reason for the smaller region the use of FLASH? Could you imagine a very similar pipeline that simply bypassed that step and would work on larger sequences?

Thanks again so much for all your help and please let me know if you think of anything else I should investigate.

Take care, Shane


-- Shane Brubaker Computational Biology Architect Zymergen, Inc, sbrubaker@zymergen.com

gkurgan commented 4 years ago

Hey Shane!

Just saw this in my feed and wanted to jump in to say you should try "breseq" if you want something that could work for plasmid validation with minimal development effort. It can also easily be used for strain sequencing following adaptive laboratory evolution (another pretty common workflow), and my experience with the API/documentation/results has generally been very positive. You can find it at: https://github.com/barricklab/breseq

The author also seems to be actively maintaining it, which is a big plus!

Zymergen-SBRUBAKER commented 4 years ago

This is great, thank you so much!
