timdown / rangy

A cross-browser JavaScript range and selection library.
MIT License

Is it possible to expand a range by unit "sentence"? #392

Open Solaner opened 8 years ago

Solaner commented 8 years ago

According to the documentation, the unit must be either "word" or "character". Is there, or was there ever, an implementation of the range functions for the unit "sentence"? The test files directory contains a file, words.html, that seems to have been intended for this, but I could not get it working for the unit "sentence". Is words.html just a remnant of an unfinished implementation? If not, what is the trick to make it work with the unit "sentence"? If so, is the code for this feature still available, or is there a workaround to mimic the missing feature?

timdown commented 8 years ago

It's hard. A naive implementation would be relatively easy, but proper sentence identification is really difficult even in English alone, without considering other languages. I did a little googling on this subject when I was writing the TextRange module and it was enough to put me off.
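To make the gap between "naive" and "proper" concrete, here is a minimal sketch of a naive splitter. It is not part of Rangy; the function name `splitSentencesNaive` and the rule it uses (a sentence ends at `.`, `!` or `?` followed by whitespace or the end of the text) are assumptions made purely for illustration.

```javascript
// Hypothetical helper, not a Rangy API.
// Naive rule: a sentence ends at a run of '.', '!' or '?' that is
// followed by whitespace or by the end of the text.
function splitSentencesNaive(text) {
  const sentences = [];
  const terminator = /[.!?]+(?=\s|$)/g;
  let start = 0;
  let match;
  while ((match = terminator.exec(text)) !== null) {
    const end = match.index + match[0].length;
    const sentence = text.slice(start, end).trim();
    if (sentence) sentences.push(sentence);
    start = end;
  }
  // Keep any trailing text that has no terminator.
  const rest = text.slice(start).trim();
  if (rest) sentences.push(rest);
  return sentences;
}

// Simple input: split as expected.
console.log(splitSentencesNaive("It works. Does it? Yes!"));
// Abbreviations: "Dr." is wrongly treated as a sentence of its own.
console.log(splitSentencesNaive("Dr. Smith arrived at 3 p.m. today."));
```

The second call shows why the naive rule falls short: abbreviations such as "Dr." and "p.m." end in a period followed by a space, so the rule cannot tell them apart from real sentence endings.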

I may come back to this once I've caught up with all the other stuff I've been ignoring for the last year or so.

Solaner commented 8 years ago

I’d be happy to see a naive implementation in action, to judge whether it’s good enough or not. If such an implementation wouldn’t require modifying the Rangy library, perhaps I could do it myself with some hints on how to proceed.

I’ve been searching for a while and found some interesting material. For instance, Blast.js seems to give good results. It cannot handle formatting boundaries within a sentence, but Rangy handles these formatting changes perfectly, so a combination of the two implementations would solve this problem.

Are there other approaches, e.g. with some magic regular expressions? Something like this one, which works perfectly for words in almost any language. Is it naive to think that there might be a regular expression which could simply be assigned to the wordRegex property in order to turn it into a sentenceRegex?

I have found material about regular expressions for sentence boundaries. But with my limited knowledge of the English language and my lack of understanding of sophisticated regular expressions, I’m not always sure whether they describe available solutions or just “would be good to have” solutions. Here are some links about regular expression approaches:

If you can give me some hints on which direction(s) to take, I might myself be able to implement:
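One conceivable direction for the naive route, offered only as a sketch: work on the plain text of a container and compute the character offsets of the sentence surrounding a given position. The helper name `expandToSentence` and the terminator rule are assumptions for illustration, not Rangy APIs. Mapping the resulting offsets back onto a Rangy range (for example through the TextRange module's character-offset methods) is likewise an untested assumption and is deliberately left out of the code.

```javascript
// Hypothetical helper, not a Rangy API.
// Returns the [start, end) character offsets of the naive "sentence"
// that contains `offset` within `text`.
function expandToSentence(text, offset) {
  const terminator = /[.!?]+(?=\s|$)/g;
  let start = 0;
  let end = text.length;
  let match;
  while ((match = terminator.exec(text)) !== null) {
    const boundary = match.index + match[0].length;
    if (boundary <= offset) {
      start = boundary; // this sentence ended before the offset
    } else {
      end = boundary;   // first sentence ending after the offset
      break;
    }
  }
  // Skip leading whitespace so the result starts on visible text.
  while (start < end && /\s/.test(text[start])) start++;
  return { start, end };
}

const text = "First one. Second one here! Third?";
const { start, end } = expandToSentence(text, 15);
console.log(text.slice(start, end)); // "Second one here!"
```

Because the function only deals with a string and offsets, formatting boundaries inside a sentence do not matter at this stage; they would only come into play when the offsets are translated back into DOM positions.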