schmidt73 / encode_pipeline


Requesting hg38 and mm10 databases (no alt chromosomes) #1

Closed: liboxun closed this issue 1 year ago

liboxun commented 1 year ago

Hi! I was pointed here by the ENCODE preprint "Multi-center integrated analysis of non-coding CRISPR screens".

I'm interested in the hg38 and mm10 BAM guide databases, as described in the supplementary method section 3, "Design of sgRNA libraries targeting all ENCODE SCREEN cCREs":

To generate sgRNA libraries targeting all human and mouse ENCODE SCREEN v4 cCREs, agnostic of cell type, we first constructed genome-wide GuideScan2 databases for the most recent hg38 and mm10 patches, excluding alternative chromosomes in our analysis. This resulted in two BAM databases containing off-target information, cutting efficiency and specificity scores. The hg38 BAM database is 159 GB in size with 656 million sgRNAs. The mm10 BAM database is 97 GB in size with 116 million sgRNAs.

The reason is that I have some promoter-distal genomic regions that I'd like to design gRNAs against, but ~10% of them lie in regions for which alternative chromosome sequences exist. As a result, the GuideScan2 web tool doesn't work for them (it finds 0 guides).

The databases described above in the preprint seem precisely the workaround. Would it be possible to share them somehow?

I'd also recommend replacing the GuideScan2 web tool's hg38 and mm10 databases with these, since I can hardly think of a use case where users would want to design guides against a FASTA that contains alt chromosomes.
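For anyone who wants to prepare a reference without alt chromosomes themselves (e.g. before building their own index), here is a minimal sketch that streams a FASTA and drops every record whose name marks it as an alternate contig. It assumes UCSC-style naming, where alt contig names end in `_alt`; the function names and file paths are hypothetical, and this is not necessarily how the GuideScan2 databases were built. Other non-primary contig classes (for example UCSC `_fix` patch sequences in patched hg38 builds) are kept unless you extend the predicate.

```python
from typing import Iterable, Iterator


def is_alt_contig(name: str) -> bool:
    """Return True for alternate-contig names (assumes UCSC-style names ending in "_alt")."""
    return name.endswith("_alt")


def drop_alt_contigs(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines unchanged, skipping every record whose header names an alt contig."""
    keep = True
    for line in lines:
        if line.startswith(">"):
            # The contig name is the first whitespace-delimited token after ">".
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            keep = not is_alt_contig(name)
        if keep:
            yield line


if __name__ == "__main__":
    # Hypothetical file names; lines keep their trailing newlines, so writelines() is enough.
    with open("hg38.fa") as src, open("hg38.noalt.fa", "w") as dst:
        dst.writelines(drop_alt_contigs(src))
```

Streaming line by line keeps memory use flat, which matters for a multi-gigabyte genome FASTA.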

Thanks very much!

schmidt73 commented 1 year ago

You can now download these databases here: https://guidescan.com/downloads

The indices will be added soon. I hope this is helpful!

liboxun commented 1 year ago

Perfect. Thanks so much for your prompt response!

liboxun commented 1 year ago

@schmidt73 To clarify, are the predefined tags cs = specificity score and ds = cutting efficiency score?

Thanks!
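Lowercase tags such as `cs` and `ds` are reserved for local use by the SAM specification, so their meaning is defined by GuideScan2 rather than the spec, which is why it needs confirming here. Whatever they turn out to mean, their values can be read without assuming their types. Below is a minimal sketch that parses the optional `TAG:TYPE:VALUE` fields (column 12 onward) of SAM text lines, such as the output of `samtools view` on a database BAM; the function name is hypothetical, and the interpretation of `cs`/`ds` is exactly the open question above.

```python
import sys
from typing import Dict, Union

TagValue = Union[int, float, str]


def parse_optional_fields(sam_line: str) -> Dict[str, TagValue]:
    """Parse the TAG:TYPE:VALUE optional fields (columns 12+) of one SAM text line."""
    fields = sam_line.rstrip("\n").split("\t")
    tags: Dict[str, TagValue] = {}
    for field in fields[11:]:
        tag, typ, value = field.split(":", 2)
        if typ == "i":
            tags[tag] = int(value)
        elif typ == "f":
            tags[tag] = float(value)
        else:
            # A (char), Z (string), H (hex) and B (array) values are kept as raw strings.
            tags[tag] = value
    return tags


if __name__ == "__main__":
    # Example: samtools view database.bam | python read_tags.py  (hypothetical names)
    for line in sys.stdin:
        if line.startswith("@"):
            continue  # skip header lines if samtools view -h was used
        tags = parse_optional_fields(line)
        print(line.split("\t", 1)[0], tags.get("cs"), tags.get("ds"), sep="\t")
```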