Chinese quotes treated as a single word

extoplasm commented 6 months ago

Did you clear cache before opening an issue?

[X] I have cleared my cache

Is there an existing issue for this?

[X] I have searched the existing issues

Does the issue happen when logged in?

Yes

Does the issue happen when logged out?

Yes

Does the issue happen in incognito mode when logged in?

Yes

Does the issue happen in incognito mode when logged out?

Yes

Account name

extoplasm

Account config

{"theme":"alduin","themeLight":"serika","themeDark":"serika_dark","autoSwitchTheme":false,"customTheme":false,"customThemeColors":["#323437","#e2b714","#e2b714","#646669","#000000","#d1d0c5","#ca4754","#7e2a33","#ca4754","#7e2a33"],"favThemes":[],"showKeyTips":true,"smoothCaret":"medium","quickRestart":"off","punctuation":false,"numbers":false,"words":10,"time":60,"mode":"quote","quoteLength":[0],"language":"chinese_simplified","fontSize":1.5,"freedomMode":true,"difficulty":"normal","blindMode":false,"quickEnd":false,"caretStyle":"default","paceCaretStyle":"default","flipTestColors":false,"layout":"default","funbox":"none","confidenceMode":"off","indicateTypos":"off","timerStyle":"mini","liveSpeedStyle":"off","liveAccStyle":"off","liveBurstStyle":"off","colorfulMode":false,"randomTheme":"off","timerColor":"main","timerOpacity":"1","stopOnError":"off","showAllLines":false,"keymapMode":"off","keymapStyle":"staggered","keymapLegendStyle":"lowercase","keymapLayout":"qwerty","keymapShowTopRow":"layout","fontFamily":"JetBrains_Mono","smoothLineScroll":false,"alwaysShowDecimalPlaces":false,"alwaysShowWordsHistory":false,"singleListCommandLine":"manual","capsLockWarning":true,"playSoundOnError":"off","playSoundOnClick":"9","soundVolume":"1.0","startGraphsAtZero":true,"showOutOfFocusWarning":true,"paceCaret":"pb","paceCaretCustomSpeed":1,"repeatedPace":true,"accountChart":["on","on","on","on"],"minWpm":"off","minWpmCustomSpeed":100,"highlightMode":"letter","typingSpeedUnit":"wpm","ads":"result","hideExtraLetters":false,"strictSpace":false,"minAcc":"off","minAccCustom":90,"monkey":false,"repeatQuotes":"off","oppositeShiftMode":"off","customBackground":"","customBackgroundSize":"cover","customBackgroundFilter":[0,1,1,1,1],"customLayoutfluid":"qwerty#dvorak#colemak","monkeyPowerLevel":"off","minBurst":"off","minBurstCustomSpeed":100,"burstHeatmap":true,"britishEnglish":false,"lazyMode":false,"showAverage":"off","tapeMode":"off","maxLineWidth":0}

Current Behavior

When typing in Chinese, the entire quote is treated as one word, so the test finishes whenever space is pressed. Also, every quote is in the short category.

Expected Behavior

Every character, excluding punctuation, could be counted as a word.

Steps To Reproduce

Change language to Chinese Simplified
Go to quotes
Press space
The test finishes

Environment

OS: Windows 10
Browser: Google Chrome
Browser Version: Version 124.0.6367.119 (Official Build) (64-bit)

Anything else?

No response

faq0 commented 6 months ago

If that can be fixed, I believe the spaces in the "words" section should also be typed automatically, since a sentence in Simplified Chinese does not include spaces. For example, in "只有出现革命存在发生方法…", users should not need to hit the spacebar before entering the next word.

extoplasm commented 6 months ago

I reckon you keep the words mode the same; it's good to have the words separated there.

But just count every character in a sentence as a word in the quotes section.

Miodec commented 6 months ago

The characters used are full-width commas, and as @faq0 said, Simplified Chinese does not include spaces, so I'm not sure what should be done here.

faq0 commented 6 months ago

I believe there are some commonly used full-width punctuation marks in Simplified Chinese that could be set as exceptions in quote mode. Some of these are "，。！？“”：；《》—", with the Unicode code points \uff0c\u3002\uff01\uff1f\u201c\u201d\uff1a\uff1b\u300a\u300b\u2014.
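The exception list above could be represented as a set built from those code points. A minimal TypeScript sketch; `CHINESE_PUNCTUATION`, `isChinesePunctuation` and `stripChinesePunctuation` are hypothetical names, not Monkeytype code, and the set only covers the eleven marks listed (zen and custom modes would need a broader rule):

```typescript
// The full-width punctuation marks listed above, as Unicode escapes.
// Hypothetical constant; not part of Monkeytype's codebase.
const CHINESE_PUNCTUATION: ReadonlySet<string> = new Set([
  "\uff0c", // ，
  "\u3002", // 。
  "\uff01", // ！
  "\uff1f", // ？
  "\u201c", // “
  "\u201d", // ”
  "\uff1a", // ：
  "\uff1b", // ；
  "\u300a", // 《
  "\u300b", // 》
  "\u2014", // em dash
]);

// True if the character is one of the listed marks.
function isChinesePunctuation(ch: string): boolean {
  return CHINESE_PUNCTUATION.has(ch);
}

// Removes the listed marks, leaving only the characters that would be scored.
function stripChinesePunctuation(text: string): string {
  return Array.from(text)
    .filter((ch) => !isChinesePunctuation(ch))
    .join("");
}

console.log(stripChinesePunctuation("猴子，打字！")); // 猴子打字
```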

But for the zen or custom modes, they might need other rules as the punctuation marks are not limited to these characters.

However, I have noticed that many Chinese typing practice websites do count punctuation symbols as characters that contribute to the WPM. That might be an easy way to handle it.

extoplasm commented 6 months ago

The punctuation isn't an issue; there isn't much punctuation in the quotes anyway. I reckon you can count every character as a word and, when counting the words, either strip out the full-width punctuation or convert it into its English equivalent, although this would be rough to implement.

It's really up to you, but as a quasi-Mandarin speaker, this is just my suggestion.
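The "convert it into its English equivalent" option could look like the sketch below. The mapping is an assumption (in particular, 《》 book-title marks have no true ASCII equivalent, so `<>` is an arbitrary choice), and `toAsciiPunctuation` is a hypothetical helper:

```typescript
// Hypothetical mapping from common full-width Chinese punctuation to ASCII.
const FULL_WIDTH_TO_ASCII: ReadonlyMap<string, string> = new Map([
  ["\uff0c", ","],
  ["\u3002", "."],
  ["\uff01", "!"],
  ["\uff1f", "?"],
  ["\u201c", '"'],
  ["\u201d", '"'],
  ["\uff1a", ":"],
  ["\uff1b", ";"],
  ["\u300a", "<"], // assumption: book-title marks mapped to angle brackets
  ["\u300b", ">"],
  ["\u2014", "-"],
]);

// Replaces each mapped mark with its ASCII counterpart; other characters pass through.
function toAsciiPunctuation(text: string): string {
  return Array.from(text)
    .map((ch) => FULL_WIDTH_TO_ASCII.get(ch) ?? ch)
    .join("");
}

console.log(toAsciiPunctuation("你好，世界！")); // 你好,世界!
```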

Miodec commented 6 months ago

So, what's the solution? If you want to add spaces, you would need to edit the quotes themselves.

extoplasm commented 6 months ago

What do you mean? I'm saying we count each character as a word, since Mandarin doesn't separate words with spaces. E.g. “猴子打字” (monkey type, lol) would be counted as 4 separate words.
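Counting “猴子打字” as four words amounts to splitting the quote into single characters. A minimal sketch, assuming punctuation has already been removed; `splitIntoCharacterWords` is a hypothetical helper. It uses `Array.from`, which iterates by code point, so a rare character stored as a surrogate pair is not cut in half the way `split("")` would cut it:

```typescript
// Splits a Chinese quote into one "word" per character (hypothetical helper).
// Array.from iterates by Unicode code point, so characters outside the Basic
// Multilingual Plane (e.g. 𠮷, a surrogate pair in UTF-16) stay intact.
function splitIntoCharacterWords(quote: string): string[] {
  // Drop any whitespace so only characters to be typed remain.
  return Array.from(quote).filter((ch) => ch.trim() !== "");
}

const words = splitIntoCharacterWords("猴子打字");
console.log(words.join(" ")); // 猴 子 打 字
console.log(words.length);    // 4
```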

extoplasm commented 6 months ago

Also, if we add spaces it wouldn't be accurate. I'm not sure how the word counting works, but a special case could be added to split the text by character instead (after removing the punctuation, of course).

Miodec commented 6 months ago

So this should apply to all Chinese text, not just quotes, right?

Is this because you need multiple keypresses per character? Maybe we can count each keypress as a character, instead of each character as a word.

faq0 commented 6 months ago

> So this should apply to all Chinese text, not just quotes, right?

Yes.

> Maybe we can count each keypress as a character, instead of each character as a word.

This would be good in most cases, but I believe it should be used to calculate speed, not accuracy. There are multiple typing methods for Simplified Chinese, and they can require different numbers of keystrokes.

For example, for the quote "我能吞下玻璃而不伤身体", Full Pinyin would be "wonengtunxiabolierbushangshenti" (31 keystrokes). Double Pinyin would be "wongtpxwboliorbuuhufti" (2 keys per character, 22 in total). Wubi would be 4 keys per character, 44 in total, but in that case less time is needed to select the desired Chinese character in the candidate window.
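The keystroke totals in this example can be checked directly. In the sketch below, the syllable list is taken from the Full Pinyin string above, while the flat 2 and 4 keys per character for Double Pinyin and Wubi are the simplifications stated in the comment (real Wubi input also has shorter simplified codes for common characters):

```typescript
// Pinyin syllables for 我能吞下玻璃而不伤身体, one per character.
const syllables = ["wo", "neng", "tun", "xia", "bo", "li", "er", "bu", "shang", "shen", "ti"];

const characterCount = syllables.length; // 11 characters

// Full Pinyin: every letter of every syllable is typed.
const fullPinyinKeys = syllables.reduce((sum, s) => sum + s.length, 0);

// Double Pinyin and Wubi, using the flat per-character costs from the comment.
const doublePinyinKeys = characterCount * 2;
const wubiKeys = characterCount * 4;

console.log(fullPinyinKeys, doublePinyinKeys, wubiKeys); // 31 22 44
```

The same 11 characters produce three different keystroke totals, which is why a keystroke count alone cannot stand in for the number of characters typed.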

extoplasm commented 6 months ago

Yes, I agree with faq0 on the speed calculation, but the main issue is that in quotes the entire sentence is counted as one word. I'm suggesting we split the quote by character instead of by space, because when someone presses space the test ends and the progress is inaccurate.

Miodec commented 6 months ago

> Yes, I agree with faq0 on the speed calculation, but the main issue is that in quotes the entire sentence is counted as one word. I'm suggesting we split the quote by character instead of by space, because when someone presses space the test ends and the progress is inaccurate.

If you split by character, the website will require you to press space between every character. When you type quotes normally (not on Monkeytype), when do you press space?

extoplasm commented 6 months ago

In Chinese there is no such thing as a space, lol. If that's the case, there might not be an easy solution; perhaps make a special case? I'm about 50% sure it's the same for other Asian languages, so this could also help when adding quotes for other languages.

faq0 commented 6 months ago

> If you split by character, the website will require you to press space between every character. When you type quotes normally (not on Monkeytype), when do you press space?

We might not press space for every character. In fact, there is a candidate window (the IME window) for choosing from a list of characters.

We might not press the space key. If I want to type the character "我" in Full Pinyin, I type: w o <spacebar>. The candidate window (Microsoft Pinyin IME as an example) then lists several characters and I have to select the desired one, where "1" = "我", "2" = "喔", and so on. I can also press the spacebar to select the first option; the spacebar is more commonly used than "1" for this.

We might not press a key for every character. For a longer sentence such as "我能吞下玻璃而不伤身体", I can type the whole sentence at once. In Full Pinyin, I type: w o n e n g t u n x i a b o l i e r b u s h a n g s h e n t i <spacebar>. Luckily, in this case my desired sentence is the first candidate, so I can press the spacebar. If that isn't the case, I may have to select each character (or word) one by one, using the apostrophe-separated syllables shown in the IME.

This means there are many ways to type a sentence, and some of them contain no spacebar keystroke. I believe that Monkeytype should just detect the number of keystrokes when a character itself is typed.

Miodec commented 6 months ago

> In Chinese there is no such thing as a space, lol. If that's the case, there might not be an easy solution; perhaps make a special case? I'm about 50% sure it's the same for other Asian languages, so this could also help when adding quotes for other languages.

What if I just disable space then? Monkeytype won't try to "move to the next word", because there would be no next word, and space would no longer trigger moving to the next word at all. The only thing space would do is interact with the input manager, as it already does.
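Disabling space as a word separator could be modeled as a per-language branch in the space handler. This is a hypothetical sketch, not Monkeytype's input code: `NO_SPACE_LANGUAGES`, `TestState` and `onSpace` are invented names, and only `chinese_simplified` comes from the config in this issue:

```typescript
// Hypothetical set of languages whose text has no spaces between words.
// Only "chinese_simplified" comes from the config above; others would be added here.
const NO_SPACE_LANGUAGES: ReadonlySet<string> = new Set(["chinese_simplified"]);

interface TestState {
  wordIndex: number; // index of the word currently being typed
}

// Decides what a space keypress does. For no-space languages the key is left to
// the IME (e.g. to confirm the first candidate) and never advances the test.
function onSpace(state: TestState, language: string): TestState {
  if (NO_SPACE_LANGUAGES.has(language)) {
    return state; // there is no "next word" to move to
  }
  return { wordIndex: state.wordIndex + 1 };
}

const start: TestState = { wordIndex: 0 };
console.log(onSpace(start, "english").wordIndex);            // 1
console.log(onSpace(start, "chinese_simplified").wordIndex); // 0
```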

extoplasm commented 6 months ago

> I believe that Monkeytype should just detect the number of keystrokes when a character itself is typed.

What does this mean?

extoplasm commented 6 months ago

> What if I just disable space then? Monkeytype won't try to "move to the next word", because there would be no next word, and space would no longer trigger moving to the next word at all. The only thing space would do is interact with the input manager, as it already does.

this should be good enough haha

faq0 commented 6 months ago

> What does this mean?

Keystrokes per second would be calculated from the number of keystrokes and shown on the final speed chart, while accuracy and WPM would be calculated from the Chinese characters typed.
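This proposal separates a keystroke-based rate from character-based speed and accuracy. A sketch under the assumption that both a keystroke counter and a count of committed characters are available; `ChineseTestResult` and the three functions are hypothetical, and the 10-second run below is illustrative input, not measured data:

```typescript
// Hypothetical result shape; Monkeytype's real result object differs.
interface ChineseTestResult {
  keystrokes: number;        // every key pressed, including pinyin letters and IME selection keys
  correctCharacters: number; // Chinese characters committed correctly
  typedCharacters: number;   // Chinese characters committed in total
  seconds: number;           // test duration
}

// Keystrokes per second: varies with the input method, so it is only charted.
function keystrokesPerSecond(r: ChineseTestResult): number {
  return r.keystrokes / r.seconds;
}

// Characters per minute: counts committed Chinese characters, not keystrokes.
function charactersPerMinute(r: ChineseTestResult): number {
  return (r.correctCharacters * 60) / r.seconds;
}

// Accuracy: percentage of committed characters that were correct.
function characterAccuracy(r: ChineseTestResult): number {
  return r.typedCharacters === 0 ? 0 : (r.correctCharacters / r.typedCharacters) * 100;
}

// Illustrative run: 我能吞下玻璃而不伤身体 in Full Pinyin (31 letters + 1 space) in 10 s.
const result: ChineseTestResult = { keystrokes: 32, correctCharacters: 11, typedCharacters: 11, seconds: 10 };
console.log(keystrokesPerSecond(result), charactersPerMinute(result), characterAccuracy(result)); // 3.2 66 100
```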

faq0 commented 6 months ago

> What if I just disable space then? Monkeytype won't try to "move to the next word", because there would be no next word, and space would no longer trigger moving to the next word at all. The only thing space would do is interact with the input manager, as it already does.

This should be a good idea, as long as it can deal with the speed and accuracy correctly.

extoplasm commented 6 months ago

Another problem might be that one misspelt character makes the test impossible to finish: with space disabled, you can't force-finish the test, and Monkeytype does not let you finish on a misspelt word.

extoplasm commented 6 months ago

I'm pretty sure you have to both split the quote by character and disable space.

extoplasm commented 5 months ago

I've done some thinking, and this problem is present on nearly all text-input-based websites. Here is how WorldServer handles it:

> For Chinese and Japanese, WorldServer has a special way to count words. Each character is considered a word. For these languages we are, effectively, counting characters. When a user sees "Words" in the WorldServer UI (for example, in scoping) for Chinese and Japanese source languages it actually means "Characters".

https://docs.rws.com/791662/251856/sdl-worldserver-11-0-1/word-counting-algorithm
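A WorldServer-style count could treat each Chinese or Japanese character as one word and split everything else on whitespace. This is a sketch, not WorldServer's actual algorithm: the code-point ranges are an assumption covering only CJK Unified Ideographs, Hiragana and Katakana, and full-width punctuation such as ， falls outside them and would be counted like a Latin word, so it should be stripped first:

```typescript
// Assumed ranges: CJK Unified Ideographs, Hiragana, Katakana (not exhaustive).
function isCjkCharacter(ch: string): boolean {
  const cp = ch.codePointAt(0) ?? 0;
  return (
    (cp >= 0x4e00 && cp <= 0x9fff) || // CJK Unified Ideographs
    (cp >= 0x3040 && cp <= 0x309f) || // Hiragana
    (cp >= 0x30a0 && cp <= 0x30ff)    // Katakana
  );
}

// Each CJK character is one word; each run of other non-space characters is one word.
function countWords(text: string): number {
  let count = 0;
  let inWord = false; // true while inside a run of non-CJK, non-space characters
  for (const ch of Array.from(text)) {
    if (isCjkCharacter(ch)) {
      count += 1;
      inWord = false;
    } else if (/\s/.test(ch)) {
      inWord = false;
    } else if (!inWord) {
      count += 1;
      inWord = true;
    }
  }
  return count;
}

console.log(countWords("猴子打字"));       // 4
console.log(countWords("monkey type"));    // 2
console.log(countWords("我爱 monkeytype")); // 3
```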

The best way, IMO, is to count every character as a word, remove the "spaces" when presenting the text to the user, and auto-nextword when they type a character.

Is there a way to auto-nextword?

Where in the codebase is the code that handles moving to the next word?
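The auto-nextword idea could be modeled as below. This is a hypothetical sketch, not Monkeytype's word-handling code: `CharacterTest`, `commitInput` and `isFinished` are invented names, and advancing even on a wrong character is an assumed design choice (it differs from Monkeytype's usual rule of not finishing on a misspelt word) so that one typo cannot leave the test stuck:

```typescript
// Hypothetical model of a quote where every character is its own word.
interface CharacterTest {
  words: string[]; // one Chinese character per entry
  index: number;   // position of the next expected character
  mistakes: number;
}

// Called with text committed by the IME (one or more characters at once).
// Every committed character auto-advances to the next word, so no space is needed.
function commitInput(test: CharacterTest, committed: string): CharacterTest {
  let index = test.index;
  let mistakes = test.mistakes;
  for (const ch of Array.from(committed)) {
    if (index >= test.words.length) break;       // test already complete
    if (ch !== test.words[index]) mistakes += 1; // record the error
    index += 1;                                  // auto-nextword
  }
  return { words: test.words, index, mistakes };
}

function isFinished(test: CharacterTest): boolean {
  return test.index >= test.words.length;
}

let state: CharacterTest = { words: Array.from("猴子打字"), index: 0, mistakes: 0 };
state = commitInput(state, "猴子");
state = commitInput(state, "大字"); // 大 typed instead of 打
console.log(state.mistakes, isFinished(state)); // 1 true
```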

Miodec commented 5 months ago

@extoplasm Which languages should use this per character way of calculating speed?

Miodec commented 5 months ago

Also, are the calculated speeds accurate if you just change the typing speed unit to cpm in the settings?
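Assuming Monkeytype follows the usual typing-test convention of five characters per word, switching the unit to CPM only rescales the same measurement; it helps for Chinese only if each Chinese character is counted as one character. A sketch of that conversion (`CHARS_PER_WORD`, `cpmToWpm` and `wpmToCpm` are hypothetical names):

```typescript
// Usual typing-test convention (assumed here): one "word" = 5 characters.
const CHARS_PER_WORD = 5;

function cpmToWpm(cpm: number): number {
  return cpm / CHARS_PER_WORD;
}

function wpmToCpm(wpm: number): number {
  return wpm * CHARS_PER_WORD;
}

// Illustrative: 11 characters of 我能吞下玻璃而不伤身体 typed in 10 seconds.
const cpm = (11 * 60) / 10;
console.log(cpm, cpmToWpm(cpm)); // 66 13.2
```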

extoplasm commented 5 months ago

> @extoplasm Which languages should use this per character way of calculating speed?

Japanese and Chinese, off the top of my head.

> Also, are the calculated speeds accurate if you just change the typing speed unit to cpm in the settings?

Not sure, I can't test right now since I'm not at home.

faq0 commented 5 months ago

> Also, are the calculated speeds accurate if you just change the typing speed unit to cpm in the settings?

They work in the "words" section (I can't tell whether the numbers are accurate, but they do work), but after multiple attempts the results could not be uploaded because of a "Result data doesn't make sense" error.

The speed calculation doesn't even work in quotes.

When one character is mistyped, the test does not automatically proceed to the results page (for a quote that shows as 1 word in total). I have to press the spacebar manually, and it shows a CPM of 0 with an accuracy of 95%.

Even when no characters are mistyped, it still shows the "Result data doesn't make sense" error.

Plus, I've noticed some wrongly written characters in the quotes section. How do I report these?

extoplasm commented 5 months ago

> Plus, I've noticed some wrongly written characters in the quotes section. How do I report these?

Is that my bad... oops. You don't need to report this, just make a PR.

faq0 commented 5 months ago

> Is that my bad... oops. You don't need to report this, just make a PR.

PR opened, and I added some quotes as well: https://github.com/monkeytypegame/monkeytype/pull/5465

Miodec commented 5 months ago

Looking at the data, it looks like you're reporting fewer keypresses than characters typed. The input system seems to be eating some of the keypress events (which looks like the same issue someone else just reported with Korean typing).
