Open nvaccessAuto opened 11 years ago
Comment 1 by jteh on 2012-12-11 08:49 Currently, the rule is that more than two repeats (i.e. more than three total) of any single character symbol are stripped. There are then two possibilities:
Comment 2 by twynn92 (in reply to comment 1) on 2012-12-11 09:03 Replying to jteh:
Currently, the rule is that more than two repeats (i.e. more than three total) of any single character symbol are stripped. There are then two possibilities:
So what's the case with the in-word apostrophe? Is there a third case for in-word punctuation, then?
- If the symbol should be spoken at this level, the replacement will be preceded by a count; e.g. "...." becomes "4 dot". Preserving where the repeat count is spoken is controversial; do we say "4 comma,,,," or just "4 comma,"?
It depends on what the preserve level is, I suppose, like reading any other punctuation that is not repeated. With a repeat of a character, I figure it should follow the same rules as reading, as to have only one set of rules and no ambiguity.
- If the symbol should not be spoken at this level, we strip the symbol altogether. Frankly, I didn't think about preserve when I wrote this part of the code. We probably should just output the unmodified symbols in this case with no truncation or stripping, though I'm a bit uncertain here.
Uncertain about what, exactly? Seems truncation and stripping are synonymous.
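For readers following the thread, here is a minimal sketch of the repeat rule as jteh describes it: a run of more than two repeats of a single-character symbol is either replaced by a count plus the symbol name, or stripped. This is not NVDA's actual code. The symbol table, the numeric levels, and the name `apply_repeat_rule` are assumptions made for illustration.

```python
import re

# Hypothetical symbol table: symbol -> (spoken name, lowest level at which it is spoken).
# The numeric levels are placeholders, not NVDA's real constants.
SYMBOLS = {".": ("dot", 2), ",": ("comma", 3), "-": ("dash", 2)}

# One single-character symbol followed by more than two repeats (four or more in total).
REPEATED = re.compile(r"([.,\-])\1{3,}")


def apply_repeat_rule(text, level):
    """Replace long runs of one symbol with "<count> <name>", or strip them."""
    def replace(match):
        name, min_level = SYMBOLS[match.group(1)]
        if level >= min_level:
            # Spoken at this level: announce the count and the name once.
            return " {} {} ".format(len(match.group(0)), name)
        # Not spoken at this level: the whole run is stripped.
        return " "
    return REPEATED.sub(replace, text)
```

With these assumed levels, `apply_repeat_rule("a....b", 2)` gives `"a 4 dot b"`, while a run of four commas at the same level is dropped because the comma is not spoken there. Nothing in this sketch consults the 'preserve' field, which is the gap this issue is about.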
Comment 3 by jteh (in reply to comment 2) on 2012-12-11 09:15 Replying to twynn92:
So what's the case with the in-word apostrophe? Is there a third case for in-word punctuation, then?
The repeat rule doesn't apply to complex symbols. However, the repeat rule applies before complex symbols.
Preserving where the repeat count is spoken is controversial; do we say "4 comma,,,," or just "4 comma,"?
I figure it should follow the same rules as reading, as to have only one set of rules and no ambiguity.
It's not about when it should be preserved. It's how it should be preserved. The two options I gave should explain this best. The point is that with a repeat count, the symbol itself is only spoken once, but it's unclear as to whether it should be preserved only once or all repeats as well. Also, with a repeat count, it may not make sense to preserve at all, since the whole point of the repeat rule is to strip extraneous symbols. Hearing "dash dash dash dash dash dash dash dash", etc. is kind of annoying.
- If the symbol should not be spoken at this level, we strip the symbol altogether. ... We probably should just output the unmodified symbols in this case with no truncation or stripping, though I'm a bit uncertain here.
Uncertain about what, exactly? Seems truncation and stripping are synonymous.
By truncation, I mean keeping the symbol but killing one or more repeats. By stripping, I mean killing it altogether. Again, the whole point of the repeat rule is to strip extraneous symbols.
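The distinction between truncation and stripping, and the two candidate outputs for a spoken count, can be illustrated with a few small helpers. The function names are hypothetical and only restate the definitions given in this comment.

```python
def strip_run(run):
    """Stripping: the symbol is removed altogether."""
    return ""


def truncate_run(run, keep=1):
    """Truncation: keep the symbol but drop one or more repeats."""
    return run[:keep]


def speak_with_count(run, name, preserve_all):
    """Build "<count> <name>" followed by either every repeat or just one."""
    kept = run if preserve_all else truncate_run(run)
    return "{} {}{}".format(len(run), name, kept)
```

For the run `",,,,"`, `speak_with_count(",,,,", "comma", True)` produces `"4 comma,,,,"` and passing `False` produces `"4 comma,"`, which are the two options under discussion.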
Comment 4 by twynn92 (in reply to comment 3) on 2012-12-11 18:14 Replying to jteh:
The repeat rule doesn't apply to complex symbols. However, the repeat rule applies before complex symbols.
Irm... I'm not sure I follow that explanation at all. Can you elaborate? Though I'm pretty sure it doesn't really apply in this case.
Preserving where the repeat count is spoken is controversial; do we say "4 comma,,,," or just "4 comma,"?
The two options I gave should explain this best. The point is that with a repeat count, the symbol itself is only spoken once, but it's unclear as to whether it should be preserved only once or all repeats as well. Also, with a repeat count, it may not make sense to preserve at all, since the whole point of the repeat rule is to strip extraneous symbols. Hearing "dash dash dash dash dash dash dash dash", etc. is kind of annoying.
Ah, I didn't see the four trailing commas in your first example in the web buffer. I assume that since the goal is to eliminate as many extraneous symbols as possible, "four comma," should suffice quite nicely. Since the 'preserve' field is set to always, logic dictates that the symbol should be sent to the synthesizer. If the number of said symbol is spoken anyway, we just need to have the pause/inflection change. But if the 'level' field is set below the level you're currently at, or "char", and the 'preserve' field is set to "always", then the symbol should be sent in full, as there is no other indication of how many symbols are present, and it's up to the discretion of the synthesizer to dictate what is spoken, just like a nonrepeating symbol.
In the case of an ellipsis or any multi-byte symbols, maybe the symbol should only be sent the amount necessary to show the proper pause/inflection change. I do realize that the two suggestions in the above paragraph contradict each other a bit, but as our goal is to reduce extraneous symbols, this is probably a good compromise. Thoughts?
By truncation, I mean keeping the symbol but killing one or more repeats. By stripping, I mean killing it altogether.
In that case, truncation is probably best, though as you said above, how much we should preserve is the dilemma here.
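One possible reading of the proposal in this comment, written as a decision function. It is only a sketch: the name `text_for_synth` and its parameters are invented for illustration, and only the "always" value of 'preserve' is modelled because that is the case the comment addresses.

```python
def text_for_synth(run, name, spoken, preserve):
    """Return what would be sent to the synthesizer for a repeated run.

    run      -- the repeated symbols, e.g. ",,,,"
    name     -- the spoken name of the symbol, e.g. "comma"
    spoken   -- True if the symbol is spoken at the current level
    preserve -- value of the 'preserve' field, e.g. "always"
    """
    if spoken:
        # The count is announced, so a single preserved symbol is enough
        # to give the synthesizer its pause or inflection change.
        suffix = run[:1] if preserve == "always" else ""
        return "{} {}{}".format(len(run), name, suffix)
    # The count is not announced: send the run in full if it is preserved,
    # otherwise send nothing.
    return run if preserve == "always" else ""
```

Under this reading, four commas with the count spoken become `"4 comma,"`, and four commas that are not spoken but are preserved reach the synthesizer unchanged.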
Comment 5 by jteh (in reply to comment 4) on 2012-12-11 23:47 Replying to twynn92:
The repeat rule doesn't apply to complex symbols. However, the repeat rule applies before complex symbols.
Irm... I'm not sure I follow that explanation at all.
It doesn't apply to complex symbols. The in-word apostrophe is one example. Sentence endings are others. However, the repeat rule matches before all other rules, so it'll probably squash an ellipsis rule, which is one of the reasons it only matches above two repeats.
In the case of an ellipsis or any multi-byte symbols, maybe the symbol should only be sent the amount necessary to show the proper pause/inflection change.
The repeat rule only applies to single-character symbols, never multi-character symbols.
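The ordering point can be shown with a combined regular expression in which the repeat rule is the first alternative. This is an illustration of the matching order only, not NVDA's real pattern. The ellipsis rule and the name `classify` are assumptions.

```python
import re

# Alternatives are tried left to right, so the repeat rule gets first refusal.
RULES = re.compile(
    r"(?P<repeated>([.,\-])\2{3,})"  # more than two repeats of one symbol
    r"|(?P<ellipsis>\.{3})"          # hypothetical complex rule for "..."
)


def classify(text):
    """Return (rule name, matched text) pairs in the order they match."""
    results = []
    for match in RULES.finditer(text):
        rule = "repeated" if match.group("repeated") is not None else "ellipsis"
        results.append((rule, match.group(0)))
    return results
```

Three dots fall through to the ellipsis rule because the repeat rule needs at least four characters, while four or more dots are claimed by the repeat rule. If the threshold were lowered to two repeats, the repeat rule would also claim a plain ellipsis, which fits the reason given above for matching only above two repeats.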
Comment 6 by twynn92 (in reply to comment 5) on 2012-12-12 04:24 Replying to jteh:
It doesn't apply to complex symbols. The in-word apostrophe is one example. Sentence endings are others. However, the repeat rule matches before all other rules, so it'll probably squash an ellipsis rule, which is one of the reasons it only matches above two repeats.
So what should the solution for this problem be? Should we just add an entry to the user dictionary for an ellipsis with more than three dots, and leave the repeat code unchanged? If this is the case, then "norep" and "always" in the 'preserve' field would be exactly the same when having repeating punctuation. Also, those who accidentally use one too many apostrophes, or any complex symbol, and have a punctuation level that is lower than the symbol's current level, wouldn't have any indication of the complex symbol. This also applies to regular symbols as well now that I think about it.
So I assume now that the problem is how much we should send to the synthesizer in this case: one instance, or the whole string unmodified?
Comment 7 by twynn92 on 2012-12-12 04:25 Changes: Changed title from "Punctuation repeating over a certain amount is not sent to the synthesizer" to "Punctuation repeating over three times is not sent to the synthesizer even if the 'preserve' field is set to "always""
@jcsteh Since you were assessing this issue earlier, could you please respond to https://github.com/nvaccess/nvda/issues/2861#issuecomment-155303118 and kindly summarize the problems and solutions identified?
I think we should change the repeat rule so that:
Hi,
Seven years later...
Updates please.
Thanks.
Cc: @CyrilleB79
Reported by twynn92 on 2012-12-11 08:29 If a certain punctuation is set to always be sent to the synthesizer (according to the 'preserve' field in the symbols pronunciation), and it is repeated more than a certain number of times, it is not sent to the synthesizer. I've no idea how many times a punctuation has to be repeated in order to be not sent to the synthesizer, but I assume it's four or more for sentence/phrase endings, and two or more for everything else. Example sentences are below.
The following is an ellipse... The following is an ellipse with an extra dot.... This sentence continues after the last without any pausing. The following is a comma, as to prove a point. The following are four commas,,,, and does not pause. I'm very awesome. I''m not so awesome.
Obviously, these cases aren't encountered in real-world situations that often, except for maybe an ellipsis with extra trailing periods. My personal solution is a dictionary entry in NVDA that limits any ellipsis that goes over by any amount to three characters. I figured I'd report this as a bug anyway, just in case it may cause unintended consequences later on.
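The dictionary workaround amounts to a regular-expression substitution. The exact entry used is not given in the report, so the pattern below is an assumption about what such an entry might look like, shown as the equivalent Python substitution with a hypothetical helper name.

```python
import re

# Four or more consecutive dots: an ellipsis with at least one extra dot.
LONG_ELLIPSIS = re.compile(r"\.{4,}")


def limit_ellipsis(text):
    """Collapse any over-long ellipsis to exactly three dots."""
    return LONG_ELLIPSIS.sub("...", text)
```

Applied to the second example sentence above, the four trailing dots collapse to three, so the run no longer reaches the threshold of the repeat rule and the sentence-ending pause is kept.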