thackl / gggenomes

A grammar of graphics for comparative genomics
https://thackl.github.io/gggenomes/

read_seqs for gbk splits LOCUS lines wrong #203

Open thackl opened 1 month ago

thackl commented 1 month ago

When parsing gbk files for sequence length, the "LOCUS" line is split not just on whitespace but on any non-alphanumeric character. For example, "LOCUS scaffold_20 50000 bp ..." gives seq_id="scaffold", length=20 instead of seq_id="scaffold_20", length=50000.

https://github.com/thackl/gggenomes/blob/976bb831975b505964086a19dd5371163abec991/R/read_seqs.R#L94
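The mis-split can be reproduced outside the package with a minimal sketch. It is written in Python for illustration only and is not the code at the linked line. The regex `[^A-Za-z0-9]+` is an assumption chosen to mimic "split on any non-alphanumeric character", and the variable names are hypothetical.

```python
import re

# Example LOCUS line from the report (trailing fields omitted).
locus = "LOCUS scaffold_20 50000 bp"

# Assumed faulty behaviour: split on runs of non-alphanumeric characters.
# The underscore counts as a separator, so the identifier is cut in two.
loose = re.split(r"[^A-Za-z0-9]+", locus)
loose_id, loose_len = loose[1], int(loose[2])

# Intended behaviour: split on whitespace only, keeping the identifier whole.
strict = locus.split()
strict_id, strict_len = strict[1], int(strict[2])

print(loose_id, loose_len)    # fields taken after the loose split
print(strict_id, strict_len)  # fields taken after the whitespace split
```

Under these assumptions, the loose split takes the second half of the identifier as the length, while the whitespace split returns the identifier and length as written in the file.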