louisabraham / pydivsufsort

Python package for string algorithms ➰
MIT License
38 stars 4 forks

Successive Repetitions #48

Closed rwarholic closed 2 months ago

rwarholic commented 6 months ago

Hi, most_frequent_substrings returns the index of one instance of a repeated substring. However, is there a way to adapt the code to return the position of every occurrence of each repeated substring?

For example, if I am running the code below

txt = "I love apples and pears and pears and pears"
pos, cnt = most_frequent_substrings(lcpArr, 5, limit=0, minimum_count=2)

This gives a position p = 9, i.e. suffixArr[p] = 34.

Just for readability, I'm printing my output like this: 'and pears' appears 3 times with final position 34

However, I would be interested in developing something that returns a position array for each substring, such as: 'and pears' appears 3 times at positions [14, 24, 34]

I've never implemented anything with Cython, so reading through the code and docstring is currently a little foggy to me. Any direction would be appreciated!

louisabraham commented 6 months ago

Hi! The suffix array contains the positions of the sorted suffixes. Hence suffixArr[p:p+cnt] is what you are looking for.
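A minimal sketch of that slicing, for illustration only. To stay self-contained it builds the suffix array with a naive pure-Python sort instead of calling the library (an assumption: for this ASCII text the ordering should match what a real suffix array construction produces). The helper names `naive_suffix_array` and `occurrence_positions` are hypothetical, and `p = 9`, `cnt = 3` are the values reported earlier in this thread.

```python
txt = "I love apples and pears and pears and pears"


def naive_suffix_array(s):
    # Hypothetical stand-in for a real suffix array construction:
    # sort the start positions by the suffix that begins there.
    # Fine for short strings, far too slow for large inputs.
    return sorted(range(len(s)), key=lambda i: s[i:])


def occurrence_positions(suffix_array, p, cnt):
    # All suffixes sharing the repeated prefix sit next to each other
    # in the suffix array, so the slice [p, p + cnt) holds every start
    # position. Sorting puts them back in text order.
    return sorted(int(i) for i in suffix_array[p:p + cnt])


suffixArr = naive_suffix_array(txt)

# Values taken from the question above: p = 9, and the substring occurs 3 times.
p, cnt = 9, 3
positions = occurrence_positions(suffixArr, p, cnt)
print(f"'and pears' appears {cnt} times at positions {positions}")
```

The slice itself comes back in suffix order rather than text order, which is why the first entry is the final occurrence (34); the sort is only there for readability.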

louisabraham commented 2 months ago

closing this for inactivity.