knod / read_plugin

Chrome plugin for "read"

Create word fragments more flexibly (allowing for custom/adjustable max-word length) #51

Open · knod opened this issue 7 years ago

knod commented 7 years ago

Right now the whole list of word fragments is created at the start. This isn't just an accessibility concern - when the window gets narrower, fragment length becomes an issue as well (#50). Since we want the maximum word length to be changeable, we don't want that list to be re-created every time it changes, especially for long pages, so word fragments will have to be created as we go. I'm not sure how to handle that with regex, especially when moving backwards - I don't believe a regex can be given an index to start from.

This is complicated by wanting to be able to rewind by sentence (as an aside, scrolling vertically, or scrolling horizontally with a modifier key, could be a way to step through sentences instead of words).

1) My first thought was that, since regex doesn't do indexes, we'd keep making new strings, stripping words off the front as we move forward, and use regex to pull word fragments from what remains. That may make it more complicated to go backwards, though. How would we handle sentences with this?

2) Another option is to use `[.||\s\S]*{#}`, or something like it, to keep track of the index within the regex expression itself. I'm still not sure how to travel backwards from that. Sentences?

3) Another option is to do the same as option 1, but keep a list of the word fragments we've already torn off; going backwards would then mean un-zipping that list. Sentences?

4) (My current favorite) Perhaps a list of sentences, each containing a list of whole words (not fragments), with spaces kept as their own list items for future visualization of spaces (#52); we'd then navigate within those words to get fragments of the right length (see the sketch after this list). Progress/scrubbing would have to work a bit differently - the index may have to be the index of the letter within the whole text, since the indexes of individual word fragments depend on the max length allowed, and that can change. Since we're not re-assessing the whole text each time, those fragment indexes won't be known ahead of time.

5) As far as sentences go, it's possible we could just keep a list of sentence indexes - the locations at which sentences start. We might be able to do that with words as well, but I'm not sure it would be as useful.
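Roughly what I'm picturing for option 4, sketched in TypeScript. Names and the splitting regexes here are just placeholders for illustration, not anything that exists in the plugin:

```ts
// Each sentence is a list of tokens; spaces are kept as their own
// tokens so they can be visualized later (#52).
type Sentence = string[];

// Very naive splitting, just to show the shape of the data.
// Real sentence/word detection would need to be smarter.
function buildSentences(text: string): Sentence[] {
  const sentences: Sentence[] = [];
  // Split on sentence-ending punctuation, keeping the punctuation.
  for (const rawSentence of text.match(/[^.!?]+[.!?]*/g) ?? []) {
    // Split into words and whitespace runs, keeping both as tokens.
    const tokens = rawSentence.match(/\S+|\s+/g) ?? [];
    sentences.push(tokens);
  }
  return sentences;
}

// Example:
// buildSentences("One two. Three!")
// -> [["One", " ", "two."], [" ", "Three!"]]
```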

Note: There will be a customizable max-word-length, but there will also be a dynamic window-width-based word length (see #50).

Note: when a word is too long, the added dash must be counted as part of the fragment's length, and pre-existing dashes should not be duplicated. (Don't add a dash and then strip out runs of multiple dashes - that could change the intention of the original text. If a fragment already ends in a dash, don't add another one.)
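A minimal sketch of that dash rule, assuming fragments are built one max-length chunk at a time. The function name and signature are hypothetical:

```ts
// Hypothetical helper: split a single over-long word into fragments of
// at most maxLen characters, where the trailing dash counts toward maxLen.
function splitWord(word: string, maxLen: number): string[] {
  if (word.length <= maxLen) return [word];

  const fragments: string[] = [];
  let rest = word;
  while (rest.length > maxLen) {
    // Leave room for the dash we may need to append.
    let chunk = rest.slice(0, maxLen - 1);
    rest = rest.slice(maxLen - 1);
    // If the chunk already ends in a dash, don't add another one.
    if (!chunk.endsWith("-")) chunk += "-";
    fragments.push(chunk);
  }
  fragments.push(rest);
  return fragments;
}

// splitWord("hyphenation", 5) -> ["hyph-", "enat-", "ion"]
// splitWord("abc-defgh", 5)   -> ["abc-", "defgh"]  (no doubled dash)
```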

knod commented 7 years ago

Thinking it over, keeping actual words (as opposed to word fragments) in an array doesn't sound so crazy. For one thing, progress could be tied to the number of words rather than fragments, especially considering that the progress bar is also the scrubber - why would someone scrub to the middle of a word rather than the beginning of one?

If that's the direction we go, we'd need a runtime assessment of each word as we travel through the text, but we'd want that anyway. We'd take the index of the word we want and split it into fragments on the spot if needed. That would remove the need for regex to keep track of where the app is in the text. It does leave the question of whether each "word" is a string, an array of strings, or an object holding the original string plus an array of its fragments. If we store the fragments, they would only have to be rebuilt when the user changes the max fragment length. I'm not sure it would make a huge impact on speed either way.
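A rough sketch of the "object with the original string and an array of fragments" idea, reusing the hypothetical `splitWord()` from the earlier sketch. Again, none of this is existing plugin code:

```ts
// Hypothetical wrapper around one word: keeps the original string and
// caches its fragments, rebuilding them only when maxLen changes.
class WordEntry {
  private fragments: string[] | null = null;
  private builtForMaxLen = 0;

  constructor(public readonly original: string) {}

  getFragments(maxLen: number): string[] {
    if (this.fragments === null || this.builtForMaxLen !== maxLen) {
      this.fragments = splitWord(this.original, maxLen);
      this.builtForMaxLen = maxLen;
    }
    return this.fragments;
  }
}

// Usage: fragments are only recomputed when the max length changes.
// const entry = new WordEntry("hyphenation");
// entry.getFragments(5); // ["hyph-", "enat-", "ion"]
// entry.getFragments(5); // returned from cache
// entry.getFragments(8); // rebuilt for the new max length
```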

Also, if those runtime operations are going on, I need to check how scrubbing currently works - does it simply pass each index in, or does it .next() or .prev() its way to the target index? I think it's the former, but it's worth checking.

There's an npm library (I'd have to find it again) that does for English sentences basically what esprima does for code. One downside is that it probably doesn't handle other languages, and I don't really want to replicate that behavior myself at the moment; that can be checked on. Another downside is that it would add more weight to what we already have - it does a fair amount more than we need.

The question about tracking sentences still stands - do we bother keeping lists of sentences, which would require either math or confusing variables or both, or do we just keep the indexes where each sentence starts, which would only require one confusing variable? It also depends on how we separate the sentences in the first place.
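For comparison, the "just keep the sentence start indexes" version might look something like this, assuming a flat array of word tokens rather than the nested structure from option 4 (again, only a hypothetical sketch):

```ts
// Hypothetical: given a flat array of word tokens, record the index at
// which each sentence starts, instead of nesting words inside sentences.
function sentenceStartIndexes(words: string[]): number[] {
  const starts: number[] = [0];
  words.forEach((word, i) => {
    // Treat sentence-ending punctuation (optionally followed by closing
    // quotes or brackets) as the end of a sentence.
    if (/[.!?]["')\]]*$/.test(word) && i + 1 < words.length) {
      starts.push(i + 1);
    }
  });
  return starts;
}

// sentenceStartIndexes(["One", "two.", "Three!"]) -> [0, 2]
// Rewinding by sentence is then just a search over the starts array:
// find the largest start index <= the current word index, then jump to
// the start before that one.
```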