redhat-developer / vscode-xml

Editing XML in Visual Studio Code made easy
Eclipse Public License 2.0

Add support for auto-completion of XML entities #876

Open tbazant opened 1 year ago

tbazant commented 1 year ago

In our documentation, we include XML entities via the following markup:

[
  <!ENTITY % entities SYSTEM "generic-entities.ent">
    %entities;
]>

The file generic-entities.ent contains entities, but also includes other entity files via the following syntax:

<!ENTITY % product-entities     SYSTEM      "product-entities.ent">
%product-entities;

Would it be possible to parse these files for entities and offer a complete, sorted list of them as IntelliSense suggestions after typing &?
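To make the request concrete, here is a minimal sketch of the kind of traversal it implies. It is not how lemminx or vscode-xml actually works: the function name `collect_entities` and both regular expressions are assumptions for illustration. It follows only external parameter entities declared with SYSTEM and ignores comments, conditional sections, and PUBLIC identifiers.

```python
import re
from pathlib import Path

# General entity with a literal value: <!ENTITY name "value">
GENERAL_ENTITY = re.compile(r'<!ENTITY\s+([^\s%]\S*)\s+(["\'])(.*?)\2\s*>', re.S)
# External parameter entity: <!ENTITY % name SYSTEM "file.ent">
PARAM_ENTITY = re.compile(r'<!ENTITY\s+%\s+(\S+)\s+SYSTEM\s+(["\'])(.*?)\2\s*>', re.S)


def collect_entities(path, seen=None):
    """Return {entity name: value} for a file and every .ent file it includes.

    Hypothetical helper. Simplification: local declarations are recorded
    before included ones, so strict document order is not preserved.
    """
    seen = set() if seen is None else seen
    path = Path(path).resolve()
    if path in seen or not path.is_file():
        return {}  # already visited (include cycle) or file missing
    seen.add(path)

    text = path.read_text(encoding="utf-8")
    entities = {}
    for name, _quote, value in GENERAL_ENTITY.findall(text):
        entities.setdefault(name, value)  # keep the first declaration seen

    for name, _quote, system_id in PARAM_ENTITY.findall(text):
        # Only follow parameter entities that are actually referenced.
        if f"%{name};" in text:
            included = collect_entities(path.parent / system_id, seen)
            for key, value in included.items():
                entities.setdefault(key, value)
    return entities


# Usage: the sorted names are what a completion list after "&" could offer.
# names = sorted(collect_entities("apache2.xml"))
```

The `seen` set keeps the recursion finite if two entity files include each other.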

angelozerr commented 1 year ago

Have you tried enabling external entity resolution? See https://github.com/redhat-developer/vscode-xml/blob/main/docs/Validation.md#resolve-external-entities

tbazant commented 1 year ago

Yes, I have entity resolution enabled like this:

"xml.validation.resolveExternalEntities": true,
"xml.validation.xInclude.enabled": true,

See what happens if I trigger the autocompletion with &: image

The strange thing is that in the DocBook code, entities are not underlined as unresolved, but the PROBLEMS panel still complains about them: image

angelozerr commented 1 year ago

XML entity completion has been implemented for simple use cases.

Could you please share your XML and describe in detail what you expect (completion, validation behavior)?

angelozerr commented 1 year ago

See what happens if I trigger the autocompletion with &:

Completion on entities is working, no?

tbazant commented 1 year ago

In our docs, entities are defined as follows:

[
  <!ENTITY % entities SYSTEM "generic-entities.ent">
    %entities;
]>

but when typing &, only the generic XML entities are shown in the list, plus &entities;, which is wrong. &entities; should be resolved into the real entities contained in the generic-entities.ent file.

tbazant commented 1 year ago

In fact, entities that are included in the file itself are working, but entities included from an external file are not.

tbazant commented 1 year ago

A good example is https://github.com/SUSE/doc-sle/blob/testing-vscode/xml/apache2.xml. It has a reference to an entity file at the top and also includes another XML file, apache2_yast_i.xml, at line 727. Please note that the included file is not correctly validated and reports errors: image

tomschr commented 1 year ago

In addition, the XML file loads this entity file, and that file in turn loads other files. From a file inclusion perspective, what Tomas shared looks like this:

apache2.xml
|
+-- generic-entities.ent
     |
     +-- product-entities.ent
     +-- network-entities.ent
     |
     +-- [... declares many entities ... ]
     |
     +-- %dbcent;

What's interesting is that generic-entities.ent contains the reference %dbcent; to the DocBook entities. This loads the entities by using public and system identifiers:

<!ENTITY % dbcent PUBLIC
    "-//OASIS//ENTITIES DocBook Character Entities V4.5//EN"
    "http://www.oasis-open.org/docbook/xml/4.5/dbcentx.mod">
%dbcent;

You can see that in lines 532-533.

Maybe there are some problems with public identifier resolution through the XML catalog?

angelozerr commented 1 year ago

In fact, entities that are included in the file itself are working, but entities included from an external file are not.

Indeed I can reproduce the problem with your sample.

Please note that the included file is not correctly validated and reports errors:

Indeed I can reproduce the problem with your sample.

Maybe there are some problems with public identifier resolution through the XML catalog?

Where can I find the XML catalog?

tomschr commented 1 year ago

Thanks Angelo for your help.

Where can I find the XML catalog?

I fear this is hidden in the docbook_4 package on openSUSE. Unfortunately, GitHub doesn't allow me to upload XML files directly, so I've renamed it to .xml.txt, and that worked:

docbook_4.xml.txt

There you can find the public identifier for the "DocBook Character Entities":

<delegatePublic publicIdStartString="-//OASIS//ENTITIES DocBook Character Entities V4.5//EN" catalog="file:///usr/share/xml/docbook/schema/dtd/4.5/catalog.xml"/>

That leads to another rule which basically resolves dbcentx.mod to /usr/share/xml/docbook/schema/dtd/4.5/dbcentx.mod.
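As a rough illustration of the delegation described above, the sketch below looks up a public identifier in a catalog and follows `delegatePublic` entries into the delegated catalog. It is an assumption-laden simplification, not the resolver lemminx uses: `resolve_public` and the in-memory `catalogs` mapping (catalog URI to XML text) are hypothetical, only `public` and `delegatePublic` entries are handled, and `xml:base`, `prefer`, and system identifiers are ignored.

```python
import xml.etree.ElementTree as ET

# Namespace of OASIS XML Catalogs documents.
NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"


def resolve_public(public_id, catalog_uri, catalogs, seen=None):
    """Return the uri mapped to public_id, or None if nothing matches.

    Hypothetical helper: `catalogs` maps a catalog URI to its XML text,
    standing in for reading the catalog files from disk.
    """
    seen = set() if seen is None else seen
    if catalog_uri in seen or catalog_uri not in catalogs:
        return None  # delegation cycle or unknown catalog
    seen.add(catalog_uri)
    root = ET.fromstring(catalogs[catalog_uri])

    # A direct <public> entry wins over delegation.
    for entry in root.iter(NS + "public"):
        if entry.get("publicId") == public_id:
            return entry.get("uri")

    # Otherwise try delegated catalogs, longest matching prefix first.
    delegates = [
        entry for entry in root.iter(NS + "delegatePublic")
        if public_id.startswith(entry.get("publicIdStartString", ""))
    ]
    delegates.sort(key=lambda e: len(e.get("publicIdStartString", "")), reverse=True)
    for entry in delegates:
        uri = resolve_public(public_id, entry.get("catalog"), catalogs, seen)
        if uri is not None:
            return uri
    return None
```

The returned `uri` is whatever the delegated catalog declares. Turning a relative value such as dbcentx.mod into an absolute path would additionally require resolving it against the location of that catalog file.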

Does that help?

angelozerr commented 1 year ago

Thanks so much @tomschr for your explanation. I fear your use case is a complex one, and it seems that there are a lot of problems. I need to find time to investigate each of them.

tomschr commented 1 year ago

Yeah, sorry for the complexity, but as far as I can tell, it's legal XML. 🙂 Our parser never complained.

Sure, take your time. If I can support you with more details etc., just ping me.

Thanks for your time and effort, much appreciated!

tbazant commented 1 year ago

My opinion is that the DocBook entities resolvable via %dbcent; would be fine, but populating custom entities from the included files such as generic-entities.ent and the ones that generic-entities.ent includes (this can probably be recursive) has higher priority for our actual work.

angelozerr commented 1 year ago

Yeah, sorry for the complexity, but as far as I can tell, it's legal XML. 🙂 Our parser never complained.

No, please don't apologize. It is very nice to have complex use cases to improve lemminx / vscode-xml, and we need to fix this, but I think it will take some time. Thanks again for reporting this issue.

In this issue there are 2 problems:

My opinion is that the DocBook entities resolvable via %dbcent; would be fine, but populating custom entities from the included files such as generic-entities.ent and the ones that generic-entities.ent includes (this can probably be recursive) has higher priority for our actual work.

I need to spend some time investigating how to support that.

tbazant commented 1 year ago

JFYI, I've noticed that XML entities that are included from the *.ent files (no matter how deep) are recognized by vscode-xml, because they are not reported as undeclared, while truly undeclared entities are correctly underlined with a clear error message.