Closed Vic-Lau closed 1 year ago
Hi @Vic-Lau, this package was not built or tested with international languages, but the following works for me on a Mac.
The Google search box is correctly populated with the 3 Chinese characters. Can you try it and tell me your OS?
PS - I'm Chinese too! 🙌
# -*- coding: utf-8 -*-
import rpa as r
r.init()
r.debug(True)
r.url('https://www.google.com')
r.type('//*[@name="q"]', '撒')
r.type('//*[@name="q"]', '中文')
r.close()
Hi @kensoh Mr.kensoh, thank you for your reply. My OS is Windows 10, and all the tests were done on different machines.
First problem: some Chinese characters cause this error, for example '撒':
r.type('//*[@name="q"]', '撒')
# Debug Info:
[RPA][1] - https://www.google.com
[RPA][1] - listening for inputs
[RPA][2] - exist_result = exist('//*[@name="q"]').toString()
[RPA][2] - listening for inputs
[RPA][3] - dump exist_result to rpa_python.txt
[RPA][3] - listening for inputs
[RPA][ERROR] - 'utf-8' codec can't decode bytes in position 34-35: invalid continuation byte
[RPA][4] - listening for inputs
Second problem: it's a bit different from what you understood. I took a snapshot of the Chrome address bar showing "about:blank" and saved it as input.png to test visual_automation=True. It has some problems; here is the code:
# -*- coding: utf-8 -*-
import rpa as r
def test():
    r.init(visual_automation = True, chrome_browser=True)
    # r.type(r'D:\work\tagui-python\tagui_scripts\input.png', '[clear]') # It's not working.
    # r.type(r'D:\work\tagui-python\tagui_scripts\input.png', 'aaa') # It's good.
    r.type(r'D:\work\tagui-python\tagui_scripts\input.png', '中文') # It's not working.
    r.wait(10)
    r.close()

if __name__ == '__main__':
    test()
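Mr.kensoh, please give me some advice. Best regards!
PS - Haha, so you are Chinese too~ Then you will certainly be able to read this. The TagUI open-source project is really great. I am trying out TagUI and plan to use it in depth to build an RPA platform for handling complex, repetitive business processes. I tried many similar open-source tools before I came across TagUI. Your technical professionalism, your patience in answering questions, and your passion for RPA are what made me choose TagUI. Among Chinese programmers, the highest title a person can earn is "大神" (roughly, "guru"). So, 大神 kensoh, it is a pleasure to talk with you, and thanks again.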
Hi @Vic-Lau thanks for your detailed reply!
I don't have a Windows computer, but I just got hold of a Windows 11 PC.
For the first problem you mentioned, on my PC it works from Python interactive mode:
>>> import rpa as r
>>> r.init()
True
>>> r.debug(True)
True
>>> r.url('https://www.google.com')
[RPA][1] - https://www.google.com
[RPA][1] - listening for inputs
True
>>> r.type('//*[@name="q"]', '撒')
[RPA][2] - exist_result = exist('//*[@name="q"]').toString()
[RPA][2] - listening for inputs
[RPA][3] - dump exist_result to rpa_python.txt
[RPA][3] - listening for inputs
[RPA][4] - type //*[@name="q"] as 撒
[RPA][4] - listening for inputs
True
>>> r.type('//*[@name="q"]', '中文')
[RPA][5] - exist_result = exist('//*[@name="q"]').toString()
[RPA][5] - listening for inputs
[RPA][6] - dump exist_result to rpa_python.txt
[RPA][6] - listening for inputs
[RPA][7] - type //*[@name="q"] as 中文
[RPA][7] - listening for inputs
True
And it also works from running the Python script directly:
# -*- coding: utf-8 -*-
import rpa as r
r.init()
r.debug(True)
r.url('https://www.google.com')
r.type('//*[@name="q"]', '撒')
r.type('//*[@name="q"]', '中文')
r.close()
My Windows 11 is set to English, and I didn't do any special configuration or language setup to run the above successfully.
Are you able to try on another computer (a personal computer or a friend's/colleague's computer) to see if it works?
I'll reply to the second problem you reported in the next message.
For the second problem, below are my comments:
# [clear] does not work with visual automation; it only works with normal web automation, because a backend command is sent to Chrome to empty the text field
r.type(r'D:\work\tagui-python\tagui_scripts\input.png', '[clear]') # It's not working.
# this is a working use case and yes it should work
r.type(r'D:\work\tagui-python\tagui_scripts\input.png', 'aaa') # It's good.
# the SikuliX engine used by rpa package does not support typing international characters
r.type(r'D:\work\tagui-python\tagui_scripts\input.png', '中文') # It's not working.
# try using r.clipboard('中文') and then r.keyboard('[ctrl]v') after clicking on the text box visually
r.clipboard('中文')
r.click(r'D:\work\tagui-python\tagui_scripts\input.png')
r.keyboard('[ctrl]v')
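The three calls above can be wrapped in a small helper. This is only a sketch: `paste_into` is a hypothetical name, not part of the rpa package, and it assumes a platform where [ctrl]v pastes. The rpa module is passed in as `r`, so the helper relies only on the `clipboard()`, `click()` and `keyboard()` calls already shown.

```python
def paste_into(r, image_path, text):
    """Paste text into a field located by its snapshot image.

    `r` is the imported rpa module (import rpa as r). paste_into is a
    hypothetical helper name, not something the rpa package provides.
    """
    r.clipboard(text)       # put the text on the system clipboard
    r.click(image_path)     # focus the text box found by its snapshot
    r.keyboard('[ctrl]v')   # paste; assumes [ctrl]v is the paste shortcut
```

With a session started by `r.init(visual_automation = True, chrome_browser=True)`, it would be called as `paste_into(r, r'D:\work\tagui-python\tagui_scripts\input.png', '中文')`.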
(Typing Chinese reply on my mobile phone, because my PC can't type in Chinese)
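Of course I can read it. I'm very glad this open-source project has earned your approval, and thank you very, very much for your praise!
Do you know Yiwen (轶文)? He is actively promoting TagUI, the original version of this project. You can contact him at the following URL. There may be a chance to work together.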
Hi @kensoh Mr.kensoh, thank you for your reply. I've tried this on 4 different computers, running the script with different tools and with utf-8 set. The difference between us is that my Windows 10 is set to Chinese. Is it possible that this is the cause of the problem?
Hi @kensoh Mr.kensoh, thank you for your reply. I see~~~ I have tested it, and clipboard() is working! By the way, if I want the equivalent of [clear] in visual automation, I have to replace it with [ctrl]a + [backspace], right? So I should think more about keyboard operations in the future.
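The keyboard replacement for [clear] described here could be sketched as follows. `clear_field` is a hypothetical helper name, not part of the rpa package, and the sketch assumes [ctrl]a selects all text in the focused field (the Windows shortcut); other platforms may need a different key.

```python
def clear_field(r, image_path, delete_key='[backspace]'):
    """Empty a text field located by image, using only keyboard input.

    `r` is the imported rpa module (import rpa as r). clear_field is a
    hypothetical helper, and [ctrl]a as "select all" is an assumption
    that holds on Windows.
    """
    r.click(image_path)      # focus the field found by its snapshot
    r.keyboard('[ctrl]a')    # select everything in the field
    r.keyboard(delete_key)   # remove the selection
```

Passing `delete_key='[delete]'` uses the delete key instead of backspace.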
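Hello Kensoh 大神, thank you for your reply in Chinese. I don't know Yiwen yet, but if there really is a chance to work together, I will certainly contact him. Haha, so your Chinese is this good too, impressive!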
Hi @Vic-Lau,
- Can you try right-clicking and downloading the google.txt below to your computer, renaming it to google.py (I cannot attach a .py file here), and running python google.py? From the data points so far, my best guess is an encoding issue. Maybe on Chinese Windows computers the default encoding is not UTF-8 but some other encoding. The Python rpa package tries to process the text characters using the standard UTF-8 encoding, so you run into the errors you reported. The file below uses UTF-8 encoding. If it works, try choosing UTF-8 as the encoding when saving your Python scripts to see if that works.
- Yes, you can use r.keyboard('[ctrl]a') and then r.keyboard('[delete]') or r.keyboard('[backspace]'). In the future, if there is strong user demand, I can see if the package can automatically change r.keyboard('[clear]') to this workaround. It isn't easy to implement accurately, because it needs to work on Windows, Mac and Linux, and also when [clear] is used as part of the string in the parameter, not as a single parameter. So I'm not adding this auto-conversion for now.
- I'll type from my phone for the last part of my reply :)
I'm glad that I can speak Chinese. I think it is a beautiful and magnificent language, although it is harder to learn than other languages, haha.
As for being called 大神, I'm of course honored, but I don't deserve it. The world is vast, and there is always a higher mountain. I'm just doing what I can to do something meaningful.
Hi @kensoh Mr.kensoh, thank you for your reply. I'm sure I have set utf-8, but I still get the error:
('中文' is no problem, but '撒' ...)
I think it is a problem with Chinese Windows 10. I can ignore this error using this method:
decode('utf-8', 'ignore') # ignore error.
But I don't think it's a good idea. Is it possible to solve it by changing the global character set to GBK? For example, by passing the character set as a method parameter.
Mr.kensoh, what you are doing is not only meaningful but also outstanding, so there is no need to be modest, haha. Respect.
Hi @Vic-Lau, just got time to look at the GitHub issues. Kid school holidays now, busier with childcare 😅
Do you know what encoding your Windows OS uses? If you do, you can edit the tagui.py file and replace all occurrences of utf-8 with the encoding that your Windows OS uses. The location of the tagui.py file can be found with import rpa as r; print(r.__file__). After modifying the file, you can start a new Python session to test.
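To see which lines would be affected before editing anything, a small helper can list every line of a file that mentions an encoding name. This is a sketch only: `find_encoding_lines` is a hypothetical name, and it assumes the target file can itself be read as UTF-8.

```python
def find_encoding_lines(path, needle='utf-8'):
    """Return (line_number, text) pairs for lines of `path` containing `needle`.

    find_encoding_lines is a hypothetical helper; it only reads the file
    and never modifies it.
    """
    hits = []
    with open(path, encoding='utf-8') as source:
        for number, line in enumerate(source, start=1):
            if needle in line:
                hits.append((number, line.rstrip()))
    return hits
```

For example, after `import rpa as r`, calling `find_encoding_lines(r.__file__)` would list the candidate lines, including any that are only comments.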
I will also ask my friend in China who is familiar with TagUI engine to test if he has issue. There are a lot of China users for this package, but no one has reported this utf-8 coding issue before, need to understand more about the problem in order to find the best solution.
Hi @kangyiwen, good day to you! Hope you have time to start touching Python. If you have started on Python, could I kindly ask you if you have issues using r.type() for this package with Chinese characters? @Vic-Lau in this issue has problems with some Chinese characters, but I can't replicate the problem on my Mac and Windows PC.
There are a lot of China users for this package, but no one has reported this utf-8 encoding issue before. I'm trying to understand more about the problem in order to find the best solution to what is reported in this GitHub issue.
Sorry, I haven't started using Python yet, so I can't offer any advice on this issue.
Hi @kensoh Mr.Kensoh, best wishes to your family 😊! Thank you for your reply. I changed the encoding to GBK, but an error was reported:
So I debugged this problem. First, I checked that the UTF-8 value of '撒' is '\xe6\x92\x92':
if __name__ == '__main__':
    encoded = '撒'.encode('utf-8')
    print(encoded) # the UTF-8 value of '撒' is b'\xe6\x92\x92'
When I watched the value of 'input_variable', I found that the UTF-8 value of '撒', '\xe6\x92\x92', had been changed to '\xe6\x92?': the last '\x92' was dropped and a '?' was added.
So I executed this code and got the same error:
# -*- coding: utf-8 -*-
import tagui as tagui
if __name__ == '__main__':
    tagui._py23_decode(b'[RPA][4] - type //*[@name="q"] as \xe6\x92?\r\n')
So is it possible that there is a problem with substring or replace when doing the conversion?
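The failure can be reproduced without TagUI by decoding the damaged bytes directly. This is a minimal sketch using the byte string from the call above. It also shows errors='replace', which keeps the damage visible as U+FFFD instead of silently dropping it the way 'ignore' does.

```python
# The damaged output line: the last byte of '撒' has become '?'.
damaged = b'[RPA][4] - type //*[@name="q"] as \xe6\x92?\r\n'

failure = None
try:
    damaged.decode('utf-8')
except UnicodeDecodeError as error:
    # start/end are byte offsets; end is exclusive
    failure = (error.start, error.end, error.reason)

# Lenient decoding never raises and marks the bad bytes with U+FFFD.
lenient = damaged.decode('utf-8', errors='replace')

print(failure)
print(repr(lenient))
```

The offsets in `failure` correspond to the "position 34-35" in the reported error message, since the end offset is exclusive.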
Hi @Vic-Lau thank you very much! :)
In your screenshot the decode() function is still using 'utf-8'. Can you try searching for 'utf-8' and replacing it with 'gbk' in tagui.py, then reload to see if that works? I think there are 7 occurrences that need to be replaced. The others are comments, I think.
Hi @kensoh Mr.Kensoh, the first screenshot is the result after completely replacing utf-8 with GBK; it does not work. The other screenshots are from debugging with utf-8. I thought the problem might not be related to the character set, so I changed back to utf-8 and started debugging.
By the way, do you mind if I add you as a friend on Facebook / WeChat / email?
Hi @Vic-Lau,
I've checked that the decode() error comes from the line below, when trying to read the live output of the TagUI engine. https://github.com/tebelorg/RPA-Python/blob/2f0691e4d0f590c520266adb3beeecd312d618fa/tagui.py#L130
From your finding above, the UTF-8 value '\xe6\x92\x92' was changed to '\xe6\x92?' in the output.
You've changed the encode/decode code in tagui.py to use gbk, but the error is still there.
Can you try the code below in Python interactive mode? It works on my Windows PC.
>>> a = '撒'
>>> a.encode('utf-8')
b'\xe6\x92\x92'
>>> a.encode('gbk')
b'\xc8\xf6'
>>> a.encode('utf-8').decode('utf-8')
'撒'
>>> a.encode('gbk').decode('gbk')
'撒'
If it doesn't work on yours, there may be something specific to the Python environment and Windows.
If it works on yours, then the only other possible cause I can think of is that TagUI live mode, which runs using Python's subprocess, has a default character encoding that is somehow not compatible with the Python environment, causing encoding issues even after you switch to 'gbk' encoding/decoding.
Can you try the following code in interactive mode and share your findings? It shows the code page used by subprocess.
>>> import os
>>> os.device_encoding(0)
'cp437'
>>> os.device_encoding(1)
'cp437'
It might be required to use an encoding compatible with that codepage above in encode/decode, instead of gbk. Or changing the encoding used by subprocess using a workaround like https://discuss.python.org/t/choosing-correct-encoding-for-subprocess-popen/3388/3
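As an illustration of choosing the encoding explicitly when reading a subprocess, here is a minimal standard-library sketch. It is not how tagui.py is written: `read_child_output` is a hypothetical helper, and the child process is a stand-in Python one-liner rather than the TagUI engine.

```python
import subprocess
import sys


def read_child_output(command, encoding='utf-8'):
    """Run a command and decode its stdout with an explicitly chosen encoding."""
    completed = subprocess.run(command, stdout=subprocess.PIPE, check=True)
    # Decode the raw bytes ourselves so the choice of encoding is visible;
    # errors='replace' shows damaged bytes as U+FFFD instead of raising.
    return completed.stdout.decode(encoding, errors='replace')


# Stand-in child (not TagUI) that writes the UTF-8 bytes of '撒' (U+6492).
child = [sys.executable, '-c',
         "import sys; sys.stdout.buffer.write('\\u6492'.encode('utf-8'))"]

text = read_child_output(child)
print(text == '撒')
```

Passing a different `encoding` for the same child shows how a mismatch between writer and reader corrupts the text.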
But what is very puzzling is that this package has a large number of users from China, and I don't understand why this issue hasn't been reported before by other users. Was it because no one ran into the affected characters, or because of differences in Windows environments, or because no one bothered to raise the issue? Knowing this can help to find a better solution than hardcoding locally.
Sure you can add me on Facebook! I don't have WeChat account
Hi @kensoh, Mr.Kensoh, maybe my description was wrong. When I modified tagui.py to use gbk, tagui.py could not work at all. So the change of the utf-8 value '\xe6\x92\x92' to '\xe6\x92?' is what tagui.py produces when using utf-8. I still think it is possible that there is a problem with substring or replace when doing the conversion.
It works on my Windows PC too :
>>> a = '撒'
>>> a.encode('utf-8')
b'\xe6\x92\x92'
>>> a.encode('gbk')
b'\xc8\xf6'
>>> a.encode('utf-8').decode('utf-8')
'撒'
>>> a.encode('gbk').decode('gbk')
'撒'
device_encoding is gbk :
I have tested on many Windows 10 machines and with several versions of Python (3.8.0, 3.8.1, 3.11.2). I believe this error, [RPA][ERROR] - 'utf-8' codec can't decode bytes in position 34-35: invalid continuation byte, has always existed. I think no one raised the issue because the error does not affect the final type() result.
PS: Mr.Kensoh, I have already sent you a friend request on Facebook. Can you accept it? Thx.
With your testing above and your original error messages below, it seems that the problem happens when the Python process is trying to read the output from the subprocess running the TagUI engine.
[RPA][3] - listening for inputs
[RPA][ERROR] - 'utf-8' codec can't decode bytes in position 34-35: invalid continuation byte
I think the next step is to try the approach below to change your code page to 437, like mine, and see whether the error still happens. I will try to do some testing by switching mine to see if I can replicate the problem. But I don't have a Windows laptop, so it is hard for me to debug in the spare moments I get when I'm out. So it will take longer to debug this.
https://discuss.python.org/t/choosing-correct-encoding-for-subprocess-popen/3388/3
OK, I will continue testing, Thank you very much.
Same here: whenever Chinese is involved, the program freezes.
But if I use the clipboard method to paste Chinese, it passes smoothly.
And the google.txt code also runs smoothly for me (Windows 10). It's great, thx!
Kensoh: The SikuliX engine used by rpa package does not support typing international characters.
Check this: https://github.com/tebelorg/RPA-Python/issues/451#issuecomment-1489511469
OKOK,Thx
Ok @Vic-Lau, it would be difficult for you to test because the subprocess call may need to be modified too. But any clues will be helpful. When I get time and a Windows PC, I will test whether the hypothesized fix works.
OK @kensoh Thank you very much.
@Vic-Lau I tried changing the code page to 936, but it still works:
C:\Users\kenso>chcp 936
Active code page: 936
C:\Users\kenso>python
Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep 5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.device_encoding(0)
'cp936'
>>> a = '撒'
>>> a.encode('utf-8')
b'\xe6\x92\x92'
>>> a.encode('gbk')
b'\xc8\xf6'
>>> a.encode('utf-8').decode('utf-8')
'撒'
>>> a.encode('gbk').decode('gbk')
'撒'
>>>
Will next try to run the rpa package code with this code page to see if there is any error.
@Vic-Lau I can replicate the issue with code page 936:
C:\Users\kenso\Desktop>chcp
Active code page: 936
C:\Users\kenso\Desktop>python google.py
[RPA][1] - https://www.google.com
[RPA][1] - listening for inputs
[RPA][2] - exist_result = exist('//*[@name="q"]').toString()
[RPA][2] - listening for inputs
[RPA][3] - dump exist_result to rpa_python.txt
[RPA][3] - listening for inputs
[RPA][ERROR] - 'utf-8' codec can't decode bytes in position 34-35: invalid continuation byte
[RPA][4] - listening for inputs
[RPA][5] - exist_result = exist('//*[@name="q"]').toString()
[RPA][5] - listening for inputs
[RPA][6] - dump exist_result to rpa_python.txt
[RPA][6] - listening for inputs
[RPA][7] - type //*[@name="q"] as 中文
[RPA][7] - listening for inputs
Best guess now is that code page 936 and the utf-8 encoding used by default in the rpa package aren't 100% compatible. Will check more.
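One possible mechanism, offered only as a guess that matches the bytes reported earlier in this thread: if the UTF-8 bytes of '撒' are read as GBK (code page 936) somewhere along the way and then written back out, the third byte has no partner and comes back as '?'. The sketch below uses Python's own gbk codec as a stand-in; it is not confirmed that TagUI or the Windows console does exactly this, and `through_code_page` is a hypothetical name.

```python
def through_code_page(text, code_page='gbk'):
    """Encode text as UTF-8, misread the bytes in another code page,
    then write them back out in that code page.

    A stand-in for a component that assumes the wrong encoding; this is
    a hypothesis about the failure, not observed TagUI behavior.
    """
    raw = text.encode('utf-8')
    misread = raw.decode(code_page, errors='replace')
    return misread.encode(code_page, errors='replace')


damaged = through_code_page('撒')
print(damaged)
```

Under this assumption the result is the same `\xe6\x92?` sequence seen in the debug output, which UTF-8 then rejects as an invalid continuation byte.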
@Vic-Lau, can you try to run the following from the command prompt, then run the python command on the google.py to see if it works?
chcp 437
The above will change the code page to US and should work with UTF-8.
Another thing to try is changing the utf-8 header in the google.py file to see if it works in your default code page.
I'm exploring different solutions to see which is best. The other solution is having an option for the rpa package to change the default encoding, but that will take more time to create.
By header I mean the following, in your case of Chinese Windows OS:
# -*- coding: gbk -*-
Try the 2 possible solutions separately, not at the same time.
Possible solution 1: chcp 437 from command prompt
Possible solution 2: change header in .py file
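Before and after trying either solution, it can help to record what Python reports for the relevant encodings. This is a minimal sketch using only the standard library. `describe_encodings` is a hypothetical helper name, and the values depend entirely on the machine and console it runs in.

```python
import locale
import os
import sys


def describe_encodings():
    """Collect the encodings Python sees for the console and the locale.

    os.device_encoding() returns None when the file descriptor is not
    connected to a terminal, for example when output is redirected.
    """
    return {
        'stdin_device': os.device_encoding(0),
        'stdout_device': os.device_encoding(1),
        'stdout': getattr(sys.stdout, 'encoding', None),
        'preferred': locale.getpreferredencoding(False),
        'filesystem': sys.getfilesystemencoding(),
    }


for name, value in describe_encodings().items():
    print(name, '=', value)
```

Running it once in the default console and once after chcp 437 should show whether the device encodings actually changed.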
Hi, @kensoh Mr.Kensoh, thank you for your reply. I've tried both solutions, and both were successful!!! 👍 I think solution 1 is better, so I think this question can be closed. Thanks again.
By the way, here is how to make chcp 437 the default, for others who have the same problem:
1. Press "win + r" and type "regedit".
2. Find "\HKEY_CURRENT_USER\Software\Microsoft\Command Processor".
3. Create a value named "autorun" with the data "chcp 437" and save! Enjoy it ~
Thanks @Vic-Lau !! Updated readme with these tips:
Hello @kensoh, I am Chinese. I think there is a problem with how TagUI for Python handles Chinese characters. For example:
r.type('//*[@name="q"]', '撒') # Google search input type test; it causes 'invalid continuation byte'.
and
r.type(r'D:\input.png', '中文') # Chrome input png type test; nothing happens and the script hangs.
Mr.kensoh, can you give me some advice? I really need your help! Thank you so much!