louisabraham / pydivsufsort

Python package for string algorithms ➰
MIT License
38 stars 4 forks source link

improve and document the LCP query interface #24

Closed louisabraham closed 4 years ago

louisabraham commented 4 years ago

@seanlaw what do you think of the API to do efficient LCP queries? see here

There might be area for improvement. For example, the segtree name is confusing as it refers to the segtree array and the (rank, segtree) tuple.

The function naming is also obscure as not a lot of users care about the segment tree algorithm behind.

Maybe a wrapping class would be better while having the same efficiency?

seanlaw commented 4 years ago

While I am not familiar with querying approaches, I agree that the naming is confusing.

Having said that, perhaps a lot of this can be improved via:

  1. Better docstrings with what the inputs/outputs mean and a reference that explains clearly what is happening if possible
  2. Nice, simple illustrative examples that are borrowed from common examples (I am always a fan of referencing Wikipedia examples where possible followed by published work)

I think that if we start with the use case/example first then the function name will become obvious

I am not sure that a wrapping class is necessary. At least, not yet. I think we should keep it as flat as possible and focus our energy on beefing up the docs and examples. A class would be jumping the gun since we're the only ones using it right now. Users will give us the feedback.

louisabraham commented 4 years ago

So the thing is currently we have to do that:

s = "abcab"
thing = lcp_segtree(s)
lcp_query(*thing, [[0, 3]])

What I think is that thing is not very convenient to carry around. An object would solve that problem.

seanlaw commented 4 years ago

I see your point and agree that a class would solve that issue.

Is there a good place to compile a list of potential things that I’d want to “query” once the suffix_array and lcp_array are computed? Should I create a new issue?

louisabraham commented 4 years ago

With the latest version (not yet on PyPI):

>>> from pydivsufsort import WonderString
>>> s = WonderString("abcdabcd")
>>> s.lcp(0, 5)
0
>>> s.lcp(0, 4)
4
>>> s.lcp([[0,4],[1,5]])
array([4, 3], dtype=int32)