Closed HolQue closed 1 year ago
Hi Holger,
I don't know if a filtering is required here. In principle the whole tool chain supports UTF-8. We need to check why we run in errors here. I made this Prio 3, because we don't have a requirement for this till now. Nevertheless it should be fixed.
Thank you, Thomas
Hi Thomas,
the problem is not UTF-8 itself, but 4 Byte characters together with older language versions of database management systems. Therefore the filtering is required. The database is the problem, not the UTF-8 handling inside our Python sources.
Hi Holger,
O.K. got the point.
Thank you, Thomas
Hi Thomas, Hi Holger,
This issue requires a change in https://github.com/test-fullautomation/testresultwebapp insteads of this repo. So I moved this issue to testresultwebapp.
Thank you, Ngoan
I am not sure.
The replacement of 4 Byte characters has to happen before the SQL interface is accessed!
Inside RobotLog2DB\CDataBase.py
/ __arExec()
/ c.execute(command,values)
: values
must contain the already converted content - otherwise crash.
I gave it a try with this quick&dirty hack in CDataBase.py
:
For every value:
if isinstance(value, str):
value = re.sub('[\U00010000-\U0010ffff]', '\ufffd', value)
'\ufffd
' is a question mark within a black diamond; it is used as a replacement character to indicate problems when a system is unable to render a stream of data to a correct symbol.
Input in XML result file of Robot Framework (with last character is a 4 Byte one):
<meta name="project">Ä.ß.€.考.𠼭</meta>
With the hack RobotLog2DB
is able to handle this:
Created test execution result for variant 'Ä.ß.€.考.𠼭' - version 'QT-VSW-050' successfully: 7741aea1-10b0-4d09-aabc-c8ebab7acc6b
The name of this variant is displayed properly in dashboard variant selection list of test database.
Entire __arExec()
hack:
try:
regexHighPoints = re.compile('[\U00010000-\U0010ffff]')
except re.error:
# UCS-2 build
regexHighPoints = re.compile('[\uD800-\uDBFF][\uDC00-\uDFFF]')
# '\ufffd' is a question mark within a black diamond; it is used as a replacement character to indicate problems
# when a system is unable to render a stream of data to a correct symbol.
# The byte in the middle of the three bytes that are needed to code this character, is not valid for UTF-8.
# Therefore the literal cannot be used within this file.
# The unicode escape sequence \ufffd is used here instead.
values_new = []
if values is not None:
for value in values:
if ( (value is not None) and (isinstance(value, str)) ):
value = regexHighPoints.sub('\ufffd', value)
values_new.append(value)
c.execute(command,values_new)
else:
c.execute(command,values)
Hi Ngoan, can you take this over to all log2db tools? Please confirm when done... Thank you, Thomas
Hi Holger, Hi Thomas
Thank a lot for Holger's investigations and proposal! They are very useful for me to approach the solution for this issue. :)
But my idea is our tool chain (import tools, database and WebApp) will support 4 byte characters instead of masking as question mark.
The feature requires changes of charset in our database and all connections which communicate with db (webapp, import tools). It means that the 4 byte chars will be maintained from the result file (command line argument, config json file) to database and would be displayed properly on WebApp as below image:
Here is all changes that I have successfully tried on my system:
Please give your idea about this issue. Then I will try with the testdb then create all related pull requests.
Thank you,
Hi Ngoan,
a really good solution! And much better than filtering out critical characters.
I like!
Hi Ngoan,
thank you for further investigation.
We implement your proposal.
Thank you, Thomas
solved with #18
Problem: Database management systems with older language versions are not able to handle 4 byte characters.
Details in attached PDF. Database4ByteLimitation.pdf