protegeproject / protege

Protege Desktop
http://protege.stanford.edu

Protege OWLDoc cannot save Chinese classes names correctly #1151

Open yasenstar opened 1 year ago

yasenstar commented 1 year ago

Hello all,

I'm looking for help here. I created an ontology in Protege with class names in Chinese characters. I'm using the latest Protege version, 5.6.2, on macOS.

After I use OWLDoc to generate the documentation, the file names in the classes folder are not in Chinese characters. Here is one example:

%E4%B8%8A%E5%90%90%E4%B8%8B%E6%B3%BB___126541734.html

The home page displays the class names in Chinese correctly, and the URL links to the classes are in Chinese characters, but since the actual file names are not in Chinese, the navigation does not work.

Is this due to something like a UTF-8 setting for Chinese characters? I couldn't find a way to configure that. Please help.

Thanks, Xiaoqi

gouttegd commented 1 year ago

The problem seems to be caused by Chinese characters in the IRI, not in the label. I cannot reproduce the problem at all if I use Chinese characters only in class labels, but I do reproduce it as soon as I use them also in class IRIs.

Can you confirm your ontology is using Chinese characters in IRIs and not only in labels?

When non-ASCII characters are present in an IRI, the OWLDoc plugin converts the IRI into a URI by encoding all the non-ASCII characters with the “percent-encoding” method (leading to those '%E4%B8...' strings). The percent-encoded URI is then written in the generated HTML files as the target of a link:

<li class="asserted"><a href="TEST_%E8%89%BE%E5%85%8B%E8%88%87%E8%92%82%E5%A8%9C___1344495863.html" class='Class' title="http://purl.obolibrary.org/obo/TEST_艾克與蒂娜">艾克與蒂娜</a></li>

So far, so good: that’s the expected behaviour.
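
The percent-encoding step can be reproduced with Python's standard library. This is only a minimal sketch of the encoding itself, not the OWLDoc plugin's code; the variable names are illustrative, and the input is the local part of the example IRI above.

```python
from urllib.parse import quote

# Local part of the example class IRI (taken from the link title above).
local_name = "TEST_艾克與蒂娜"

# Percent-encode each non-ASCII character as its UTF-8 bytes.
# ASCII letters, digits and "_" are left untouched.
encoded = quote(local_name, safe="")

print(encoded)
# TEST_%E8%89%BE%E5%85%8B%E8%88%87%E8%92%82%E5%A8%9C
```

Each Chinese character becomes three `%XX` groups because it occupies three bytes in UTF-8.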

But then, the HTML file for that class is written under a filename which is also percent-encoded (i.e., TEST_%E8%89%BE%E5%85%8B%E8%88%87%E8%92%82%E5%A8%9C___1344495863.html instead of TEST_艾克與蒂娜___1344495863.html). That I believe is incorrect, and is why the browser cannot find the file.

When the browser reads TEST_%E8%89%BE%E5%85%8B%E8%88%87%E8%92%82%E5%A8%9C___1344495863.html in the href attribute of a link, it decodes the percent-encoded URI back into the original IRI (TEST_艾克與蒂娜___1344495863.html), and then looks for precisely that filename – it does not look for a percent-encoded filename.
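
The mismatch can be illustrated in a few lines of Python. This is a sketch under the assumption that the browser's handling of a local link amounts to percent-decoding the href before looking up the file; `file_on_disk` and `requested` are illustrative names, not anything from the plugin.

```python
from urllib.parse import unquote

# The href written into the generated HTML (from the example above).
href = "TEST_%E8%89%BE%E5%85%8B%E8%88%87%E8%92%82%E5%A8%9C___1344495863.html"

# As described above, the file is written under the same percent-encoded name.
file_on_disk = href

# Assumption: the browser decodes the href and looks for the decoded name.
requested = unquote(href)

print(requested)                  # TEST_艾克與蒂娜___1344495863.html
print(requested == file_on_disk)  # False: the names differ, so the lookup fails
```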

Bottom line is that this looks like a bug in the OWLDoc plugin, not a configuration problem on your side.

yasenstar commented 1 year ago

Hi @gouttegd,

Thank you very much for your quick support; that is exactly the error I'm facing.

Yes, when I create a class name in Chinese characters, the IRI also uses the Chinese characters. For example, this is the full IRI for the class "金":

http://www.semanticweb.org/yasen/ontologies/2023/4/medica#金

As you said, the encoding step works as designed, but the URI is not properly decoded back into the original IRI, so the file name with Chinese characters cannot be found in the classes folder.

Good to know that this can be fixed as a bug in the OWLDoc plugin.

Thanks again, Xiaoqi

yasenstar commented 1 year ago

Hi again, to make this easier to review, I recorded a quick video demonstrating the issue in Protege on Windows (also v5.6.2): https://youtu.be/vEaSQo3h87s It behaves the same as on macOS. Thanks a lot!

yasenstar commented 1 year ago

Hi there, where can I find out when this bug will be fixed? Thanks.

gouttegd commented 1 year ago

Well, nobody has worked on the owldoc plugin for the past 7 years, so it’s unlikely to be fixed any time soon.
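
In the meantime, if the diagnosis earlier in this thread is right, one possible workaround is to rename the generated files to their decoded names after running OWLDoc. The sketch below has not been tested against real OWLDoc output; `decode_filenames` is a hypothetical helper, and the directory you pass it (for example, the generated classes folder) is an assumption about your output layout.

```python
import os
from urllib.parse import unquote


def decode_filenames(directory):
    """Rename percent-encoded file names in `directory` to their decoded form.

    Returns a list of (old_name, new_name) pairs for the files renamed.
    """
    renamed = []
    for name in sorted(os.listdir(directory)):
        decoded = unquote(name)
        if decoded == name:
            continue  # nothing to decode
        if os.sep in decoded or "/" in decoded:
            continue  # decoded name would escape the directory; skip it
        target = os.path.join(directory, decoded)
        if os.path.exists(target):
            continue  # never overwrite an existing file
        os.rename(os.path.join(directory, name), target)
        renamed.append((name, decoded))
    return renamed
```

For example, `decode_filenames("classes")` would rename `%E9%87%91___1.html` to `金___1.html` and leave files with plain ASCII names alone. Try it on a copy of the generated output first.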