maximilianh / crisporWebsite

All source code of the crispor.org website
http://crispor.org

certain genomic target sequences not found in genome and crash lindel calculation #35

Closed genya closed 4 years ago

genya commented 4 years ago

In rare cases, certain genomic ranges crash the command-line version (error output pasted at the bottom). On the website, entering these same sequences produces the message "Query sequence, not found in the selected genome, Homo sapiens (hg38)", even though entering the genomic coordinates pulls up the target sequence.

>ENSG00000286185 AC242842.3 exon ENST00000621744.4_5 range=chr1:149482155-149482246 strand=+
ttctctgaatttatttacagAAAATGAAAGTGATGATGAGGAAGAGGAAGAAAAAGGGCCAGTGTCTCCCAGgtaatgttgtggaattgttg
>ENSG00000261832 AC138894.1 exon ENST00000637378.1_8 range=chr16:28458366-28458441 strand=-
tcatgtgttggctttttcagATCCCCCCTTCTGCAAGAAAGCCTCTTTGCAACTGGgtaagtttgtttgttttcct
>ENSG00000278662 GOLGA6L10 exon ENST00000610657.1_3 range=chr15:82346541-82346661 strand=-
tctctctgcatgcacctcagAGCCAGTACCAAGAACTAGCAGTGGCCCTGGATTCAAGCTCCGCAATAATCAGTCAACTCACTGAAAACATCAATTCACTGgtaagagtccagtggggtcc
>ENSG00000261247 GOLGA8T exon ENST00000569052.1_5 range=chr15:30139339-30139426 strand=+
ctgtcttcctcttcctacagGAAAAGAAAGCAAACAACAAGAAACAGAAAGCCAAAAGGGTGCTAGAGgtgagtggagggtgtgcagt
>ENSG00000204172 AGAP9 exon ENST00000452145.6_1 range=chr10:47522846-47522954 strand=-
ttctccctctatacatatagCTTTGGAGTTTAACCTTTCTGCCAATCCAGAGGCAAGCACAATATTCCAGAGGAACTCTCAAACAGATGgtgagacaacagtgtctgta
>ENSG00000125498 KIR2DL1 exon ENST00000336077.11_0 range=chr19:54769831-54769904 strand=+
ctgtctgctccggcagcaccATGTCGCTCTTGGTCGTCAGCATGGCGTGTGTTGgtgagtcctggaaagcaata
>ENSG00000240038 AMY2B exon ENST00000361355.8_0,ENST00000610648.1_0 range=chr1:103571583-103571790 strand=+
actgacaacttcaaagcaaaATGAAGTTCTTTCTGTTGCTTTTCACCATTGGGTTCTGCTGGGCTCAGTATTCCCCAAATACACAACAAGGACGGACATCTATTGTTCATCTGTTTGAATGGCGATGGGTTGATATTGCTCTTGAATGTGAGCGATATTTAGCTCCCAAGGGATTTGGAGGGGTTCAGgtgggtatgattcatagtat
>ENSG00000263956 NBPF11 exon ENST00000615281.4_10 range=chr1:148114417-148114508 strand=-
ttctctgaatttatttacagAAAATGAAAGTGATGATGAGGAAGAGGAAGAAAAAGGGCCAGTGTCTCCCAGgtaatgttgtggaattgttg
>ENSG00000269713 NBPF9 exon ENST00000615421.4_13,ENST00000584027.8_13 range=chr1:149063613-149063825 strand=-
actttttcccacttttccagGCTCAGCAGGGAGCTGCTGGATGAGAAAGGGCCTGAAGTCTTGCAGGACTCACTGGATAGATGTTATTCAACTCCTTCAGGTTGTCTTGAACTGACTGACTCATGCCAGCCCTACAGAAGTGCCTTTTACGTATTGGAGCAACAGCGTGTTGGCTTGGCTGTTGACATGGATGgtgagtacctttctatgaag
>ENSG00000260691 ANKRD20A1 exon ENST00000562196.5_9 range=chr9:67887256-67887324 strand=+
atatcccctttgctttgtagGGCCTCCTGCAAAACATCCTTCCTTGAAGgtaattaattatgtatattt
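
For anyone triaging these inputs, here is a minimal sketch, not part of CRISPOR, that parses records in the header format used above and runs two sanity checks: whether each sequence length matches its `range=` span, and whether any sequence occurs under more than one header. The helper names (`parse_records`, `length_mismatches`, `duplicated_sequences`) are hypothetical, the coordinates are assumed to be 1-based and inclusive, and only three of the ten records are included to keep it short. It only reports what is in the input and does not by itself explain the "not found" message.

```python
import re
from collections import defaultdict

# Three of the records from this report: one header line, one sequence line each.
FASTA = """\
>ENSG00000286185 AC242842.3 exon ENST00000621744.4_5 range=chr1:149482155-149482246 strand=+
ttctctgaatttatttacagAAAATGAAAGTGATGATGAGGAAGAGGAAGAAAAAGGGCCAGTGTCTCCCAGgtaatgttgtggaattgttg
>ENSG00000261832 AC138894.1 exon ENST00000637378.1_8 range=chr16:28458366-28458441 strand=-
tcatgtgttggctttttcagATCCCCCCTTCTGCAAGAAAGCCTCTTTGCAACTGGgtaagtttgtttgttttcct
>ENSG00000263956 NBPF11 exon ENST00000615281.4_10 range=chr1:148114417-148114508 strand=-
ttctctgaatttatttacagAAAATGAAAGTGATGATGAGGAAGAGGAAGAAAAAGGGCCAGTGTCTCCCAGgtaatgttgtggaattgttg
"""

# Matches the "range=chrN:start-end strand=+/-" part of each header.
HEADER_RE = re.compile(r"range=(\w+):(\d+)-(\d+) strand=([+-])")


def parse_records(text):
    """Return one dict per FASTA record with id, chrom, start, end, strand, seq."""
    records = []
    for block in text.split(">")[1:]:
        header, _, body = block.partition("\n")
        match = HEADER_RE.search(header)
        if match is None:
            raise ValueError("no range/strand in header: %r" % header)
        records.append({
            "id": header.split()[0],
            "chrom": match.group(1),
            "start": int(match.group(2)),
            "end": int(match.group(3)),
            "strand": match.group(4),
            "seq": "".join(body.split()),  # join wrapped sequence lines
        })
    return records


def length_mismatches(records):
    """IDs whose sequence length differs from the (assumed 1-based, inclusive) range."""
    return [r["id"] for r in records
            if r["end"] - r["start"] + 1 != len(r["seq"])]


def duplicated_sequences(records):
    """Lists of IDs that share exactly the same sequence (case-insensitive)."""
    groups = defaultdict(list)
    for r in records:
        groups[r["seq"].upper()].append(r["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


records = parse_records(FASTA)
print("length mismatches:", length_mismatches(records))
print("identical sequences:", duplicated_sequences(records))
```

In this subset, the first and third records carry the same 92 bp sequence under two different chr1 ranges.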

INFO:root: running on sequence 'ENSG00000286185 AC242842.3 exon ENST00000621744.4_5 range=chr1:149482155-149482246 strand=+', guideLen=20, seqLen=92
INFO:root:Progress sTFqCkJ9RFySD1swZXbw - bwasw - Searching genome for one 100% identical match to input sequence
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[bsw2_aln] read 1 sequences/pairs (92 bp) ...
[main] Version: 0.7.15-r1140
[main] CMD: /lab/solexa_sabatini/genya/crisporWebsite6/bin/Linux/bwa bwasw -T 20 /lab/solexa_sabatini/genya/crisporWebsite6/genomes/hg38/hg38.fa /tmp/crisporBestMatchttMxuE.fa
[main] Real time: 5.965 sec; CPU: 5.857 sec
INFO:root:Progress sTFqCkJ9RFySD1swZXbw - effScores - Calculating guide efficiency scores
INFO:root:Progress sTFqCkJ9RFySD1swZXbw - outcome - Calculating editing outcomes
Traceback (most recent call last):
  File "crispor.py", line 8304, in <module>
    main()
  File "crispor.py", line 8301, in main
    mainCommandLine()
  File "crispor.py", line 8110, in mainCommandLine
    getOfftargets(seq, org, pamPat, batchId, startDict, ConsQueue())
  File "crispor.py", line 4300, in getOfftargets
    processSubmission(faFname, org, pamDesc, otBedFname, batchBase, batchId, queue)
  File "crispor.py", line 3845, in processSubmission
    createBatchEffScoreTable(batchId, queue)
  File "crispor.py", line 3461, in createBatchEffScoreTable
    guideRows = calcSaveEffScores(batchId, seq, extSeq, pam, queue)
  File "crispor.py", line 3402, in calcSaveEffScores
    mutScores = crisporEffScores.calcMutSeqs(pamIds, longSeqs, enz, scoreNames=mutScoreNames)
  File "/lab/solexa_sabatini/genya/crisporWebsite6/crisporEffScores.py", line 1311, in calcMutSeqs
    mutSeqDict = calcLindelScore(seqIds, seqs)
  File "/lab/solexa_sabatini/genya/crisporWebsite6/crisporEffScores.py", line 748, in calcLindelScore
    return runLindel(seqIds, trimSeqs(seqs, -33, 27))
  File "/lab/solexa_sabatini/genya/crisporWebsite6/crisporEffScores.py", line 724, in runLindel
    y_hat, fs = Lindel.Predictor.gen_prediction(seq,weights,prerequesites)
ValueError: too many values to unpack
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
  File "crispor.py", line 7868, in delBatchDir
    raise Exception("cowardly refusing to remove many temp files")
Exception: cowardly refusing to remove many temp files
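
The `ValueError: too many values to unpack` in the first traceback is what Python raises when the right-hand side of `y_hat, fs = ...` yields more than two items. One common way to hit this is a function that returns a 2-tuple on success but a plain string (for example an error message) on failure, since unpacking a string iterates over its characters. Whether `Lindel.Predictor.gen_prediction` behaves this way for these inputs is an assumption that this report does not confirm. The sketch below uses hypothetical stand-ins (`predict`, `safe_predict`), not the real Lindel or CRISPOR functions, to reproduce the error and show one defensive pattern.

```python
def predict(seq):
    """Hypothetical stand-in for a predictor (NOT the Lindel API).

    Returns (y_hat, fs) on success, or an error string when the
    input does not contain the motif it needs.
    """
    if "GG" not in seq:
        return "Error: no PAM found"
    return [0.5, 0.5], {"frameshift": 0.5}


def safe_predict(seq):
    """Return (y_hat, fs), or None when predict() did not return a 2-tuple."""
    result = predict(seq)
    if not (isinstance(result, tuple) and len(result) == 2):
        return None  # the caller can skip this guide instead of crashing
    y_hat, fs = result
    return y_hat, fs


# Unpacking the error string directly reproduces the failure mode.
try:
    y_hat, fs = predict("AAAA")
    message = None
except ValueError as err:
    message = str(err)

print(message)
print(safe_predict("AAAA"))
print(safe_predict("AAGG"))
```

Checking the return type before unpacking lets a batch run record a missing score for one guide rather than abort the whole job. On Python 3 the message also includes the expected count.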