Closed tseemann closed 8 years ago
I ran this command to check the database:
cd-hit-est -d 0 -i EcOH.fasta -c 1 -g 1 -o cdhit
and discovered 47 duplicate sequences (and some subsequences), including some which have confounding O/H types - here are some examples:
0 1251nt, >9__wzy__wzy-O123-Gp5__370... *
1 1251nt, >9__wzy__wzy-O123-Gp5__371... at +/100.00%
2 1251nt, >9__wzy__wzy-O123-Gp5__372... at +/100.00%
3 1251nt, >9__wzy__wzy-O186-Gp5__454... at +/100.00%

0 1230nt, >8__wzx__wzx-O153-Gp11__191... *
1 1230nt, >8__wzx__wzx-O178-Gp11__223... at +/100.00%

0 1071nt, >8__wzx__wzx-O28ac-Gp2__250... at +/100.00%
1 1071nt, >8__wzx__wzx-O42-Gp2__266... at +/100.00%
2 1230nt, >8__wzx__wzx-O42-Gp2__267... *

0 1221nt, >8__wzx__wzx-O118-Gp3__140... *
1 1221nt, >8__wzx__wzx-O118-Gp3__141... at +/100.00%
2 1221nt, >8__wzx__wzx-O151-Gp3__189... at +/100.00%

0 1149nt, >9__wzy__wzy-O129-Gp10__383... *
1 1149nt, >9__wzy__wzy-O13-Gp10__384... at +/100.00%
2 1149nt, >9__wzy__wzy-O135-Gp10__390... at +/100.00%
3 1149nt, >9__wzy__wzy-O135-Gp10__391... at +/100.00%

0 1053nt, >9__wzy__wzy-O111__351... *
1 1053nt, >9__wzy__wzy-O7__524... at +/100.00%

0 1074nt, >9__wzy__wzy-O153-Gp11__412... at +/100.00%
1 1098nt, >9__wzy__wzy-O178-Gp11__444... *

0 1380nt, >9__wzy__wzy-O28ac-Gp2__471... *
1 1380nt, >9__wzy__wzy-O42-Gp2__487... at +/100.00%
2 1380nt, >9__wzy__wzy-O42-Gp2__488... at +/100.00%

0 1332nt, >9__wzy__wzy-O17-Gp9__434... *
1 1332nt, >9__wzy__wzy-O44-Gp9__490... at +/100.00%
2 1332nt, >9__wzy__wzy-O77-Gp9__531... at +/100.00%

0 1191nt, >9__wzy__wzy-O18-Gp12__446... *
1 1146nt, >9__wzy__wzy-O18ac__457... at +/100.00%
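For anyone who wants to repeat the check, here is a rough Python sketch that reads cd-hit cluster lines like the ones listed here and reports clusters whose members carry different O/H types. It rests on two assumptions that are not confirmed anywhere in this thread: that the type is the second dash-separated token of the third `__`-separated field (so `9__wzy__wzy-O123-Gp5__370` gives `O123`), and that a new cluster starts either at a `>Cluster` header line or at a member with index 0. The function names are made up for illustration.

```python
import re

# Matches a member line such as:
#   1 1251nt, >9__wzy__wzy-O123-Gp5__371... at +/100.00%
MEMBER = re.compile(r"^\s*(\d+)\s+(\d+)nt,\s+>(\S+?)\.\.\.\s+(.*)$")


def parse_clusters(lines):
    """Group cd-hit member lines into clusters.

    Returns a list of clusters; each cluster is a list of
    (name, length, is_representative) tuples.
    """
    clusters = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            clusters.append([])
            continue
        match = MEMBER.match(line)
        if not match:
            continue  # ignore anything that is not a member line
        index, length, name, status = match.groups()
        # Assumption: index 0 opens a new cluster when no header line did.
        if not clusters or (index == "0" and clusters[-1]):
            clusters.append([])
        clusters[-1].append((name, int(length), status.strip() == "*"))
    return [c for c in clusters if c]


def antigen_type(name):
    """Assumed naming scheme: '9__wzy__wzy-O123-Gp5__370' -> 'O123'."""
    allele = name.split("__")[2]
    return allele.split("-")[1]


def conflicting_clusters(clusters):
    """Return (cluster, sorted_types) for clusters mixing more than one type."""
    conflicts = []
    for cluster in clusters:
        types = sorted({antigen_type(name) for name, _, _ in cluster})
        if len(types) > 1:
            conflicts.append((cluster, types))
    return conflicts
```

With `-o cdhit`, cd-hit-est should write its cluster listing to `cdhit.clstr`, so something like `conflicting_clusters(parse_clusters(open("cdhit.clstr")))` would be the starting point. Identical sequences that share a type (plain duplicates) are deliberately not reported by this sketch.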
Done! On the master branch, anyway... a new release with the fix included should be coming soon.