Suffix Tree Algorithm implemented in Python, might be the most complete version online, even more complete than that demonstrated on stackoverflow.
I underestimated the complication of the algorithm and just wanted to have some fun. A primitive implementation was done in a couple of hours, and the demonstation example on stackoverflow works just fine. Then when I wanted try some more complicated examples, I kept hitting the wall time and time again. It annoyed me, and thus costed me several days to try different situations when constructing a suffix tree.
Finally, the version comes out, I think all the situations explained in the questions and answers have been experienced and covered in the algorithm above before I read the full post.
I also write a blog on explaining the implementation details on my blogger MuTuX with flowcharts and explanation on it.
docs = ['abcabxabcd', 'dedododeeodoeodooedeeododooodoede$', 'ooooooooo', 'mississippi']
for text in docs:
tree, pst = build(text, regularize=True)
Node.draw(tree, pst, ed='#')
The running results:
abcabxabcd
● (0)
|
| ab
+----------------● (4->6)
| |
| | xabcd
| +---------------● (5)
| |
| | c
| +---------------● (9->11)
| |
| | abxabcd
| +---------------● (1)
| |
| | d
| +---------------● (10)
|
| c
+----------------● (13)
| |
| | abxabcd
| +---------------● (3)
| |
| | d
| +---------------● (14)
|
| b
+----------------● (6)
| |
| | xabcd
| +---------------● (7)
| |
| | c
| +---------------● (11->13)
| |
| | abxabcd
| +---------------● (2)
| |
| | d
| +---------------● (12)
|
| xabcd
+----------------● (8)
|
| d
+----------------● (15)
dedododeeodoeodooedeeododooodoede$
● (0)
|
| e
+----------------------------------------● (28)
| |
| | $
| +---------------------------------------● (71)
| |
| | eodo
| +---------------------------------------● (48->37)
| |
| | eodooedeeododooodoede$
| +---------------------------------------● (29)
| |
| | dooodoede$
| +---------------------------------------● (49)
| |
| | d
| +---------------------------------------● (44->19)
| |
| | ododeeodoeodooedeeododooodoede$
| +---------------------------------------● (18)
| |
| | e
| +---------------------------------------● (68->26)
| |
| | eododooodoede$
| +---------------------------------------● (45)
| |
| | $
| +---------------------------------------● (69)
| |
| | odo
| +---------------------------------------● (37->31)
| |
| | eodooedeeododooodoede$
| +---------------------------------------● (30)
| |
| | dooodoede$
| +---------------------------------------● (50)
| |
| | oedeeododooodoede$
| +---------------------------------------● (38)
|
| d
+----------------------------------------● (19)
| |
| | e
| +---------------------------------------● (26->28)
| |
| | dododeeodoeodooedeeododooodoede$
| +---------------------------------------● (17)
| |
| | $
| +---------------------------------------● (70)
| |
| | eodo
| +---------------------------------------● (46->48)
| |
| | eodooedeeododooodoede$
| +---------------------------------------● (27)
| |
| | dooodoede$
| +---------------------------------------● (47)
| |
| | o
| +---------------------------------------● (33->35)
| |
| | e
| +---------------------------------------● (64->42)
| |
| | de$
| +---------------------------------------● (65)
| |
| | odooedeeododooodoede$
| +---------------------------------------● (34)
| |
| | d
| +---------------------------------------● (22->24)
| |
| | eeodoeodooedeeododooodoede$
| +---------------------------------------● (23)
| |
| | o
| +---------------------------------------● (53->31)
| |
| | deeodoeodooedeeododooodoede$
| +---------------------------------------● (20)
| |
| | oodoede$
| +---------------------------------------● (54)
| |
| | o
| +---------------------------------------● (57->59)
| |
| | edeeododooodoede$
| +---------------------------------------● (40)
| |
| | odoede$
| +---------------------------------------● (58)
|
| o
+----------------------------------------● (35)
| |
| | e
| +---------------------------------------● (42->28)
| |
| | odooedeeododooodoede$
| +---------------------------------------● (36)
| |
| | de
| +---------------------------------------● (66->68)
| |
| | eododooodoede$
| +---------------------------------------● (43)
| |
| | $
| +---------------------------------------● (67)
| |
| | d
| +---------------------------------------● (24->19)
| |
| | eeodoeodooedeeododooodoede$
| +---------------------------------------● (25)
| |
| | o
| +---------------------------------------● (31->33)
| |
| | e
| +---------------------------------------● (62->64)
| |
| | de$
| +---------------------------------------● (63)
| |
| | odooedeeododooodoede$
| +---------------------------------------● (32)
| |
| | d
| +---------------------------------------● (51->22)
| |
| | eeodoeodooedeeododooodoede$
| +---------------------------------------● (21)
| |
| | ooodoede$
| +---------------------------------------● (52)
| |
| | o
| +---------------------------------------● (55->57)
| |
| | edeeododooodoede$
| +---------------------------------------● (39)
| |
| | odoede$
| +---------------------------------------● (56)
| |
| | o
| +---------------------------------------● (59->35)
| |
| | edeeododooodoede$
| +---------------------------------------● (41)
| |
| | doede$
| +---------------------------------------● (61)
| |
| | odoede$
| +---------------------------------------● (60)
|
| $
+----------------------------------------● (72)
ooooooooo$
● (0)
|
| o
+----------------● (89)
| |
| | $
| +---------------● (90)
| |
| | o
| +---------------● (87->89)
| |
| | $
| +---------------● (88)
| |
| | o
| +---------------● (85->87)
| |
| | $
| +---------------● (86)
| |
| | o
| +---------------● (83->85)
| |
| | $
| +---------------● (84)
| |
| | o
| +---------------● (81->83)
| |
| | $
| +---------------● (82)
| |
| | o
| +---------------● (79->81)
| |
| | $
| +---------------● (80)
| |
| | o
| +---------------● (77->79)
| |
| | $
| +---------------● (78)
| |
| | o
| +---------------● (75->77)
| |
| | $
| +---------------● (76)
| |
| | o$
| +---------------● (74)
|
| $
+----------------● (91)
mississippi$
● (0)
|
| i
+------------------● (104)
| |
| | ppi$
| +-----------------● (105)
| |
| | $
| +-----------------● (109)
| |
| | ssi
| +-----------------● (98->100)
| |
| | ppi$
| +-----------------● (99)
| |
| | ssippi$
| +-----------------● (94)
|
| p
+------------------● (107)
| |
| | i$
| +-----------------● (108)
| |
| | pi$
| +-----------------● (106)
|
| s
+------------------● (96)
| |
| | i
| +-----------------● (102->104)
| |
| | ppi$
| +-----------------● (103)
| |
| | ssippi$
| +-----------------● (97)
| |
| | si
| +-----------------● (100->102)
| |
| | ppi$
| +-----------------● (101)
| |
| | ssippi$
| +-----------------● (95)
|
| mississippi$
+------------------● (93)
|
| $
+------------------● (110)
Have fun!