jgraph / drawio

draw.io is a JavaScript, client-side editor for general diagramming.
https://www.drawio.com

Double UTF-8 encoding when exporting to github #2431

Closed JensLincke closed 3 years ago

JensLincke commented 3 years ago

Describe the bug

Using https://www.diagrams.net/ I created a drawing and saved it in github as architecture.drawio

When the diagram is displayed on a website via embedding, the text encoding looks weird.

Potential BUG: double UTF-8 encoding...

$ file architecture.drawio
architecture.drawio: UTF-8 Unicode text, with very long lines

$ cat architecture.drawio | grep custom | cut -c -120
        <mxCell id="iHkALPmoCXANNeq6opZh-39" value="+ process(code, &#xa;    annotations, &#xa;   �
        <mxCell id="iHkALPmoCXANNeq6opZh-46" value="+ onmessage(code, &#xa;    annotations, &#xa;    c
        <mxCell id="iHkALPmoCXANNeq6opZh-50" value="&lt;div style=&quot;font-size: 5px&quot;&gt;&lt;br&gt;&lt;/div&gt;&l
        <mxCell id="iHkALPmoCXANNeq6opZh-98" value="&lt;span class=&quot;hljs-comment&quot; style=&quot;color: rgb(150,

Converting it from UTF-8 to Latin-1 makes it display correctly in a UTF-8 terminal:

$ cat architecture.drawio | iconv -f UTF-8 -t LATIN1 | grep custom | cut -c -120
        <mxCell id="iHkALPmoCXANNeq6opZh-39" value="+ process(code, &#xa;    annotations, &#xa;    customInstanc
        <mxCell id="iHkALPmoCXANNeq6opZh-46" value="+ onmessage(code, &#xa;    annotations, &#xa;    customInstanc
        <mxCell id="iHkALPmoCXANNeq6opZh-50" value="&lt;div style=&quot;font-size: 5px&quot;&gt;&lt;br&gt;&lt;/div&gt;&l
        <mxCell id="iHkALPmoCXANNeq6opZh-98" value="&lt;span class=&quot;hljs-comment&quot; style=&quot;color: rgb(150,

So I assume there is a double utf-8 encoding happening somewhere?
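To make the hypothesis concrete, here is a minimal Python sketch of what "double UTF-8 encoding" would mean for a non-breaking space (U+00A0). It only illustrates the suspected mechanism and is not taken from draw.io or GitHub code; the function names `double_encode` and `undo_double_encode` are made up for this example.

```python
# Sketch of "double UTF-8 encoding": UTF-8 bytes are misread as Latin-1
# and then encoded as UTF-8 a second time. Function names are hypothetical.

def double_encode(text: str) -> bytes:
    """Encode to UTF-8, misinterpret the bytes as Latin-1, encode again."""
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def undo_double_encode(data: bytes) -> str:
    """Mirror `iconv -f UTF-8 -t LATIN1`, then read the result as UTF-8."""
    return data.decode("utf-8").encode("latin-1").decode("utf-8")


nbsp = "\u00a0"               # non-breaking space
once = nbsp.encode("utf-8")   # correct single encoding: bytes C2 A0
twice = double_encode(nbsp)   # double encoding: bytes C3 82 C2 A0

print(once.hex(" "))
print(twice.hex(" "))
print(undo_double_encode(twice) == nbsp)
```

If a file really were double-encoded, every non-ASCII character (umlauts included) would show this pattern, so checking a known character such as `ö` in a hex dump is a quick way to confirm or rule out the hypothesis.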

To Reproduce

Steps to reproduce the behavior:

  1. Edit the figure at https://app.diagrams.net/#HLivelyKernel%2Flively4-core%2Fdrawio%2Fsrc%2Fbabylonian-programming-editor%2Farchitecture.drawio

  2. Visit the same file on GitHub: https://github.com/LivelyKernel/lively4-core/blob/gh-pages/src/babylonian-programming-editor/architecture.drawio

  3. See the weird encoding on line 1000.

Expected behavior

I would expect to see "tabs" or other unicode there, but not encoding errors.

alderg commented 3 years ago

For plain text labels, we turn a tab keypress into 4 * \xa0 (non-breaking space character), which is what you see in the output. All other unicode chars (such as umlauts) are correct in the output. On the terminal in macOS, the output looks like this for me:

$ cat architecture.drawio.xml | grep custom | cut -c -120
        <mxCell id="iHkALPmoCXANNeq6opZh-39" value="+ process(code, &#xa;    annotations, &#xa;    customInstances, &#xa
        <mxCell id="iHkALPmoCXANNeq6opZh-46" value="+ onmessage(code, &#xa;    annotations, &#xa;    customInstances, &#
        <mxCell id="iHkALPmoCXANNeq6opZh-50" value="&lt;div style=&quot;font-size: 5px&quot;&gt;&lt;br&gt;&lt;/div&gt;&l
        <mxCell id="iHkALPmoCXANNeq6opZh-98" value="&lt;span class=&quot;hljs-comment&quot; style=&quot;color: rgb(150,

Embedding also works for me. Could you add a link to a test case please?
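The tab handling described above can be sketched as follows. This is a Python illustration only, not draw.io's actual (JavaScript) implementation; the helper name `tab_to_nbsp` and the sample label are assumptions made for the example.

```python
# Sketch: replace each tab with four non-breaking spaces (U+00A0),
# as described for plain text labels. `tab_to_nbsp` is a hypothetical name.

NBSP = "\u00a0"


def tab_to_nbsp(label: str, width: int = 4) -> str:
    """Return the label with every tab replaced by `width` NBSP characters."""
    return label.replace("\t", NBSP * width)


label = tab_to_nbsp("annotations,\n\tcustomInstances")
encoded = label.encode("utf-8")

# Each NBSP becomes the two bytes C2 A0 in a UTF-8 file.
print(encoded.count(b"\xc2\xa0"))
```

Since each of these characters occupies two bytes in UTF-8, tools that count or cut by bytes will see them differently from ordinary spaces.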

JensLincke commented 3 years ago

Maybe this is a Linux issue, but I see this in a standard WSL (Windows Subsystem for Linux) terminal and in xterm, and I also see it in Chrome when viewing the raw file on GitHub. @alderg, did you check the file out with git, and did it get repaired automatically on macOS?

alderg commented 3 years ago

How did you insert those characters?

JensLincke commented 3 years ago

I used https://app.diagrams.net to load a file from github and save it back https://app.diagrams.net/#HLivelyKernel%2Flively4-core%2Fdrawio%2Fsrc%2Fbabylonian-programming-editor%2Farchitecture.drawio

alderg commented 3 years ago

What key did you press to insert those characters?

JensLincke commented 3 years ago

Sorry, I found the issue on my side:

using https://app.diagrams.net/#HJensLincke%2Fdrawio-test%2Fdrawio%2Ftest-chars.drawio

to edit https://github.com/JensLincke/drawio-test/blob/drawio/test-chars.drawio

I used a German keyboard and pasted the Unicode characters through the clipboard.

This is sample text:
this a tab...
öäüß german umlauts and sz
→and a unicode error

The umlauts and the Unicode character seem to end up correctly in the drawio branch.

The bad conversion happened when copying the file to the main branch.