itsmeadarsh2008 / flpc

A Rust-based regex crate wrapper for Python3 to get faster performance. 👾
https://pypi.org/project/flpc/
MIT License
64 stars, 1 fork

add support for the regex-split crate to add split_inclusive functionality #2

Open kpdowney opened 4 months ago

kpdowney commented 4 months ago

In Python's re.split, if the pattern contains a capture group, the result includes the matched delimiters in addition to the delimited tokens. The default Rust regex crate does not do this. I believe this behavior is used pretty extensively in Python, and it would be a great addition IMO.
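For reference, here is a minimal sketch of the standard-library behavior described above, using only Python's `re` module (no flpc). The sample string `"a,b;c"` is made up for illustration.

```python
import re

sample = "a,b;c"  # illustrative input, not from the issue

# Without a capture group, the delimiters are dropped.
plain = re.split(r'[,;]', sample)

# With a capture group, each matched delimiter is kept as its own item.
with_group = re.split(r'([,;])', sample)

print(plain)       # ['a', 'b', 'c']
print(with_group)  # ['a', ',', 'b', ';', 'c']
```

The only difference between the two calls is the pair of parentheses around the delimiter pattern.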

itsmeadarsh2008 commented 4 months ago

@kpdowney Can you give me an example of how you would do that with the native re module in Python (no flpc)? Please include a test regex and valid strings to test against, so I can check my code against the output. How exactly do you want your code to be structured? I want to keep a consistent naming system: the underscore in split_inclusive defeats the point of being an analogous library.

KevinPD66 commented 4 months ago

Good morning, and sorry for the delay. Here is a super simple example with the standard re module. I split on a space, a period, and an exclamation mark. With a capture group, it brings back both the words and the characters I split on. This is the functionality I was suggesting for consideration. The naming I mentioned is not important; whatever you feel is best, I would be very happy with. Thanks for your help.

import re

text = "The fox jumps over the dog. Poor dog!"
# The period is escaped so that it matches a literal "." rather than any character.
re.split(r'(\s|\.|!)', text)

Output: ['The', ' ', 'fox', ' ', 'jumps', ' ', 'over', ' ', 'the', ' ', 'dog', '.', '', ' ', 'Poor', ' ', 'dog', '!', '']

itsmeadarsh2008 commented 3 months ago

I'm unsure about the issue, but adding a separate dependency would make it unnecessarily bloated.
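On the dependency concern, one possible direction is to build the delimiter-keeping split from match positions alone. The sketch below shows the idea in plain Python with `re.finditer`; the helper name `split_keeping_delimiters` is hypothetical and is not part of flpc or of the Rust regex crate. It assumes the pattern never matches the empty string and has no capture groups of its own.

```python
import re

def split_keeping_delimiters(pattern, text):
    """Hypothetical helper: split text on pattern, keeping each delimiter as its own item."""
    pieces = []
    last_end = 0
    for match in re.finditer(pattern, text):
        pieces.append(text[last_end:match.start()])  # token before the delimiter
        pieces.append(match.group(0))                # the delimiter itself
        last_end = match.end()
    pieces.append(text[last_end:])                   # trailing token (may be empty)
    return pieces

text = "The fox jumps over the dog. Poor dog!"
result = split_keeping_delimiters(r'\s|\.|!', text)

# For this input, the result matches re.split with a capture group.
assert result == re.split(r'(\s|\.|!)', text)
```

The same loop could in principle be written over any engine that exposes match start and end offsets, which is the only capability the sketch relies on.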