plamoni / SiriProxy

A (tampering) proxy server for Apple's Siri
GNU General Public License v3.0

can not match Chinese words #450

Closed: galenzhao closed this issue 11 years ago

galenzhao commented 11 years ago

Hi, I wrote some regexes to match Chinese words, but they don't work. In my plugin, neither of the following patterns matches Chinese words:

listen_for /([一-赵]+)/i do |word|
listen_for /(\p{Han}+)/i do |word|

output:

[Info - Plugin Manager] Processing '你好吗 '
[Info - Plugin Manager] Processing plugin #<SiriProxy::Plugin::Example:0xa94af84>
[Info - Plugin Manager] No matches for '你好吗 '

But both patterns work on http://www.rubular.com/, an online regex tester.
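
To separate the patterns from the environment, the same check can be run in plain Ruby outside SiriProxy. The following is a minimal sketch, not something posted in the thread: the `PATTERNS` hash and the `first_capture` helper are hypothetical names, and the phrase is copied from the log above. It also prints the source and string encodings, since those are the usual suspects when a non-ASCII pattern behaves differently from an online tester.

```ruby
# encoding: utf-8
# Standalone check, meant to be run outside SiriProxy.

# The two patterns from the issue, kept as written (including the /i flag).
PATTERNS = {
  "range"    => /([一-赵]+)/i,
  "property" => /(\p{Han}+)/i
}

# Phrase as it appears in the Plugin Manager log, trailing space included.
phrase = "你好吗 "

# Hypothetical helper: returns the first capture group, or nil if no match.
def first_capture(pattern, text)
  m = pattern.match(text)
  m && m[1]
end

puts "source encoding: #{__ENCODING__}"
puts "phrase encoding: #{phrase.encoding}, valid: #{phrase.valid_encoding?}"

PATTERNS.each do |name, pattern|
  puts "#{name}: #{first_capture(pattern, phrase).inspect}"
end
```

If both patterns capture the phrase here but not inside the plugin, the difference is more likely in how the plugin file or the incoming text is encoded than in the regexes themselves.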

galenzhao commented 11 years ago

There's also a funny problem: the same plugin has two rules,

  listen_for /测试/i do

  listen_for /中文测试/i do

but only the second one is matched:

[Info - Guzzoni] Received Object: SpeechRecognized
[Info - Plugin Manager] Processing '测试 '
[Info - Plugin Manager] Processing plugin #<SiriProxy::Plugin::Example:0x941efd4>
[Info - Plugin Manager] No matches for '测试 '

[Info - Plugin Manager] Processing '中文测试 '
[Info - Plugin Manager] Processing plugin #<SiriProxy::Plugin::Example:0x941efd4>
[Info - Plugin Manager] Matches (?i-mx:中文测试)
[Info - Plugin Manager] Applicable states:
[Info - Plugin Manager] Current state:
[Info - Plugin Manager] Matches, executing block
[Info - Plugin Manager] Say: 中文测试成功!
[Info - Plugin Manager] Sending Request Completed

plamoni commented 11 years ago

Honestly, Chinese is not my thing. Hopefully someone with some programming experience and a knowledge of Chinese can help. :-/

EricInBj commented 11 years ago

Set the UTF-8 encoding at the top of your plugin .rb file.
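
In Ruby this is normally done with a magic comment on the first line of the file. The sketch below shows the placement, assuming that is what the suggestion refers to; `SHORT_RULE` and `LONG_RULE` are hypothetical stand-ins for the two `listen_for` patterns in the issue, tested here without SiriProxy. Ruby 1.9 defaults to US-ASCII source encoding, so the comment matters there; from Ruby 2.0 onward UTF-8 is the default and the comment is harmless.

```ruby
# encoding: utf-8
# The line above is the magic comment. It has to be the first line of the
# file (or the second, if the first line is a #! line).

# Hypothetical stand-ins for the two rules from the issue.
SHORT_RULE = /测试/i
LONG_RULE  = /中文测试/i

# With the source read as UTF-8, the literals are UTF-8 patterns.
puts "source encoding: #{__ENCODING__}"
puts "short rule encoding: #{SHORT_RULE.encoding}"

# Phrases copied from the log output, trailing space included.
["测试 ", "中文测试 "].each do |phrase|
  puts "#{phrase.inspect}: short=#{!SHORT_RULE.match(phrase).nil?} long=#{!LONG_RULE.match(phrase).nil?}"
end
```

The thread does not confirm whether this change resolved the issue, so treat it as the first thing to try rather than a verified fix.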

elvisimprsntr commented 11 years ago

I'm assuming setting the UTF-8 encoding fixed the problem.