Degenerate Bases - Githubissues

From @Zjianglin :

ScanFold cannot process degenerate bases (for example, M = A or C, K = G or T). When I run ScanFold on a sequence containing degenerate bases, it raises a KeyError.

scanfold --name demo --fold --out_name demoout -w 120 -r 200 -t 37 --type di --global_refold test.fa 
/Storage/p3/test
Making output folder named:demoout
Output name=KC181923_3'UTR.win_120.stp_1.rnd_200.shfl_di
Scanning input sequence: KC181923_3'UTR
Traceback (most recent call last):
  File "/home/zhoujl/packages/ScanFold/ScanFold.py", line 549, in <module>
    scrambled_sequences = scramble(frag, randomizations, type)
  File "/home/zhoujl/packages/ScanFold/ScanFoldFunctions.py", line 875, in scramble
    result = dinuclShuffle(frag)
  File "/home/zhoujl/packages/ScanFold/ScanFoldFunctions.py", line 264, in dinuclShuffle
    ok,edgeList,nuclList,lastCh = eulerian(s)
  File "/home/zhoujl/packages/ScanFold/ScanFoldFunctions.py", line 231, in eulerian
    nuclCnt,dinuclCnt,List = computeCountAndLists(s)
  File "/home/zhoujl/packages/ScanFold/ScanFoldFunctions.py", line 188, in computeCountAndLists
    nuclCnt[y] += 1; nuclTotal  += 1
KeyError: 'K'
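
The traceback ends at `nuclCnt[y] += 1`, which suggests the count dictionary only has entries for the standard bases. The sketch below reproduces that failure mode under this assumption. It is not the ScanFold source, and `count_nucleotides` is a hypothetical name used only for illustration.

```python
# Hypothetical reconstruction, NOT the ScanFold source: the counts are
# assumed to be pre-seeded for the four standard DNA bases only.
def count_nucleotides(seq):
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    for base in seq:
        # Any base that was not seeded above (K, M, N, ...) raises KeyError.
        counts[base] += 1
    return counts


print(count_nucleotides("GCGC"))

try:
    count_nucleotides("GCGCKGAAG")
except KeyError as err:
    print("KeyError:", err)
```

If this assumption holds, any IUPAC ambiguity code, including N, would trigger the same error in the dinucleotide shuffle step.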

less test.fa | grep "K"
>KC181923_3'UTR Aedes flavivirus|Aedes_flavivirus|[10120:11079](+)
TTAGGGAGTTTGGAATACCTTTTCTATACCATAGATGCGC**K**GAAGCTTTAAAAATCGGG

My sequence contains a K, and ScanFold fails to run on it. How should I handle this? Many sequences contain degenerate bases or N.
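
One possible workaround, until degenerate bases are handled by the tool itself, is to preprocess the FASTA file and replace each ambiguity code with one of the concrete bases it stands for. The sketch below is only an illustration under stated assumptions: the input is DNA (T rather than U), the helper names `resolve_degenerate` and `resolve_fasta` are hypothetical and not part of ScanFold, and picking a base at random is just one of several reasonable policies.

```python
import random

# Standard IUPAC nucleotide ambiguity codes, DNA alphabet assumed.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def resolve_degenerate(seq, rng=None):
    """Replace each ambiguity code with a randomly chosen compatible base."""
    if rng is None:
        rng = random.Random()
    resolved = []
    for base in seq.upper():
        if base in IUPAC:
            resolved.append(rng.choice(IUPAC[base]))
        else:
            resolved.append(base)  # A, C, G, T pass through unchanged
    return "".join(resolved)


def resolve_fasta(in_path, out_path, seed=0):
    """Copy a FASTA file, resolving ambiguity codes on sequence lines only."""
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                dst.write(line)  # header lines are copied verbatim
            else:
                dst.write(resolve_degenerate(line.rstrip("\n"), rng) + "\n")


# The K in this fragment becomes either G or T; all other bases are kept.
print(resolve_degenerate("GATGCGCKGAAG", random.Random(0)))
```

Calling `resolve_fasta("test.fa", "test.resolved.fa")` would then produce a file that can be passed to ScanFold in place of the original. Note that each substitution changes the sequence, so predicted structures and scores near the replaced positions should be interpreted with care.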

Originally posted by @Zjianglin in https://github.com/moss-lab/ScanFold/issues/16#issuecomment-900720685

Degenerate Bases #17