marbl / SALSA

SALSA: A tool to scaffold long read assemblies with Hi-C data
MIT License

KeyError when a FASTA file contains the IUPAC code “V” #171

Closed by c2997108 9 months ago

c2997108 commented 2 years ago

  File "/usr/local/SALSA/get_seq.py", line 87, in
    revcompl = lambda x: ''.join([{'A':'T','B':'N','C':'G','G':'C','T':'A','N':'N','R':'N','M':'N','Y':'N','S':'N','W':'N','K':'N','a':'t','c':'g','g':'c','t':'a','n':'n',' ':'',}[B] for B in x][::-1])
KeyError: 'V'

skoren commented 9 months ago

You could add the missing IUPAC characters:

--- a/get_seq.py
+++ b/get_seq.py
@@ -84,7 +84,7 @@ parser.add_argument("-p","--map",help="pickle map of scaffolds (input, required)

 args = parser.parse_args()

-revcompl = lambda x: ''.join([{'A':'T','B':'N','C':'G','G':'C','T':'A','N':'N','R':'N','M':'N','Y':'N','S':'N','W':'N','K':'N','a':'t','c':'g','g':'c','t':'a','n':'n',' ':'',}[B] for B in x][::-1])
+revcompl = lambda x: ''.join([{'A':'T','B':'N','C':'G','G':'C','T':'A','N':'N','R':'N','M':'N','Y':'N','S':'N','W':'N','K':'N','V':'N','H':'N','D':'N','a':'t','c':'g','g':'c','t':'a','n':'n',' ':'',}[B] for B in x][::-1])

but they all get replaced by N anyway, so I'd recommend just replacing them all with Ns in the FASTA before running SALSA.
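
For anyone who wants to do that replacement up front, here is a minimal sketch, assuming Python 3 and a plain-text FASTA. The names `mask_ambiguous` and `mask_fasta` are hypothetical helpers and not part of SALSA. It rewrites every sequence character that is not A, C, G, T or N (in either case) to N, and leaves header lines alone:

```python
import io
import re

# Any non-whitespace character other than A/C/G/T/N (either case) is treated
# as ambiguous. This is an assumption of the sketch: it also catches
# characters such as U or '-', which the lookup table in get_seq.py lacks too.
_NOT_ACGTN = re.compile(r"[^ACGTNacgtn\s]")


def mask_ambiguous(line):
    """Mask one FASTA line; header lines (starting with '>') are returned unchanged."""
    if line.startswith(">"):
        return line
    # Preserve case so that soft-masked (lowercase) regions stay lowercase.
    return _NOT_ACGTN.sub(lambda m: "n" if m.group().islower() else "N", line)


def mask_fasta(src, dst):
    """Copy a FASTA from the file-like object src to dst, masking ambiguity codes."""
    for line in src:
        dst.write(mask_ambiguous(line))


# Small in-memory demonstration; with real files, pass open() handles instead.
demo = io.StringIO(">contig1 V test\nACGTVHDBacgtv\n")
out = io.StringIO()
mask_fasta(demo, out)
print(out.getvalue(), end="")
```

The substitution is one character for one character, so contig names, sequence lengths and coordinates are unchanged. Any index or alignments built from the original FASTA should still be regenerated if the tools involved are sensitive to the sequence content itself.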