Open dhowe opened 3 months ago
What you want is called a generalized suffix array in literature. Libdivsufsort is by now fairly old, unmaintained and not state of the art. Pretty sure you will find software better suited for your job.
That said libdivsufsort is machine independent and will simply sort an array of unsigned 8 bit integers. So can simply use the 0 value as a seperator if it does not appear anywhere in your strings. (Haven't looked at the code in quite some years. Might be the case that you have to use 1).
I assume you mean ascii 0 (NUL) rather than ascii 48 (0)... That was the first thing I tried, but got some strange results. I'll try next with ascii 1. Any pointers to a library for a generalized suffix array would be great too. Thanks.
Hi Daniel. You could try gsufsort: https://github.com/felipelouza/gsufsort
Best regards.
On Thu, Jul 25, 2024 at 11:36 AM Daniel Howe @.***> wrote:
I assume you mean ascii 0 (NUL) rather than ascii 48 (0)... That was the first thing I tried, but got some strange results. I'll try next with ascii
- Any pointers to a library for a generalized suffix array would be great too. Thanks.
— Reply to this email directly, view it on GitHub https://github.com/y-256/libdivsufsort/issues/28#issuecomment-2250490072, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAPS7ZGSKMZJE5YYTQK6BXDZOEEO7AVCNFSM6AAAAABLNZV2YSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENJQGQ4TAMBXGI . You are receiving this because you are subscribed to this thread.Message ID: @.***>
-- Daniel Saad Nogueira Nunes
libdivsufsort does't know about ascii it sorts bytes independent of encoding that's why I wrote "value 0".
If you had issues with using zero then it might be that libdivsufsort treats them as string terminator. I haven't looked into the code since 2016 so ... no idea.
Not sure if this question makes sense and it may also be machine-dependent, but I'm wondering if there is a delimiter char I can use to separate words in the input to libdivsufsort so that this char will sort less than any of the characters in my text (the usual ascii set) and thus not otherwise affect the order of the suffix array ?