renggli / dart-xml

Lightweight library for parsing, traversing, and transforming XML in Dart.
http://pub.dartlang.org/packages/xml
MIT License

Parsing HTML tags removing tags ? #127

Closed RageshAntony closed 2 years ago

RageshAntony commented 2 years ago

Given the string

<person type="selected">Ragesh <b> Selected </b></person>

parsing returns

<person type="selected">Ragesh Selected </person>

I need the output unaltered, identical to the input XML.

I also tried replacing the tags with &lt; Selected &gt;, but I get:

Error: XmlParserException: "<" expected at 1:1
    at Object.throw_ [as throw] (http://localhost:37815/dart_sdk.js:5063:11)
    at Function.parse (http://localhost:37815/packages/xml/src/xml/utils/node_list.dart.lib.js:2485:19)
    at gsheets_repo.GSheetsRepo.new.parseXMLToSheets (http://localhost:37815/packages/xml_file_translator/app/data/repo/gsheets_repo.dart.lib.js:131:46)
    at parseXMLToSheets.next ()
    at http://localhost:37815/dart_sdk.js:40192:33
    at _RootZone.runUnary (http://localhost:37815/dart_sdk.js:40062:59)
    at _FutureListener.thenAwait.handleValue (http://localhost:37815/dart_sdk.js:34983:29)
    at handleValueCallback (http://localhost:37815/dart_sdk.js:35551:49)
    at Function._propagateToListeners (http://localhost:37815/dart_sdk.js:35589:17)
    at _Future.new.[_completeWithValue] (http://localhost:37815/dart_sdk.js:35437:23)
    at async._AsyncCallbackEntry.new.callback (http://localhost:37815/dart_sdk.js:35458:35)
    at Object._microtaskLoop (http://localhost:37815/dart_sdk.js:40330:13)
    at _startMicrotaskLoop (http://localhost:37815/dart_sdk.js:40336:13)
    at http://localhost:37815/dart_sdk.js:35811:9

Please help me.

renggli commented 2 years ago

Can you provide a minimal reproducible example?

The following example works as expected:

final document = XmlDocument.parse('<person type="selected">Ragesh <b> Selected </b></person>');
print(document); // prints '<person type="selected">Ragesh <b> Selected </b></person>'
RageshAntony commented 2 years ago

Yes. The issue occurs when reading personNode.text while iterating over the elements.

personNode.text returns "Ragesh selected".

But when printing the whole XML, the output is correct.

Code:

final document = XmlDocument.parse(enController.text.trim());
print("PARSED" + document.toXmlString()); // correct
final persons = document.findAllElements('person');

persons.forEach((personNode) {
  print("PARSED TEXT ${personNode.text}"); // no tags!
});
renggli commented 2 years ago

text only returns the textual contents, no tags. If you want the XML contents you probably should call innerXml instead?
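To make the difference concrete, here is a minimal sketch that reads the same element through the three accessors mentioned in this thread. The helper name `describePerson` is made up for illustration, and the sketch assumes a release of `package:xml` that still offers the `text` accessor used above (newer releases may prefer a differently named accessor for the same purpose).

```dart
import 'package:xml/xml.dart';

/// Hypothetical helper: returns [text, innerXml, outerXml] of the first
/// <person> element found in [input].
List<String> describePerson(String input) {
  final document = XmlDocument.parse(input);
  final person = document.findAllElements('person').first;
  return [person.text, person.innerXml, person.outerXml];
}

void main() {
  const input = '<person type="selected">Ragesh <b> Selected </b></person>';
  final views = describePerson(input);
  print('text:     ${views[0]}'); // character data only, markup is dropped
  print('innerXml: ${views[1]}'); // children serialized, <b> is preserved
  print('outerXml: ${views[2]}'); // the element itself plus its children
}
```

Use `innerXml` when the markup inside the element has to survive, and `text` only when plain character data is wanted.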

RageshAntony commented 2 years ago

Yes, it works with innerXml.

But when rebuilding the XML, the tags are converted to &lt; and &gt;, and the output is not accurate either:

Ragesh &lt;b> selected &lt;/ b>

builder.element("person", attributes: {"name": key}, nest: () {
  final allColsIndex = allCols.indexOf(langCode);
  print("TRANS_STRING ${transString![allColsIndex]}"); // => Ragesh <b> Selected </b>
  builder.text(transString![allColsIndex]); // => Ragesh &lt;b> selected &lt;/ b>
});
renggli commented 2 years ago

text creates an XML text node and encodes all of the input. If you don't want to encode the contents but parse them instead, use xml (along with the innerXml/outerXml accessors).
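A small side-by-side sketch of the two builder calls may help. The helper `buildPerson` and its `raw` flag are invented for this illustration; only `XmlBuilder.text` and `XmlBuilder.xml` come from the discussion above.

```dart
import 'package:xml/xml.dart';

/// Hypothetical helper: wraps [content] in a <person> element, either
/// parsed as XML (raw: true) or escaped as a text node (raw: false).
String buildPerson(String content, {required bool raw}) {
  final builder = XmlBuilder();
  builder.element('person', nest: () {
    if (raw) {
      builder.xml(content); // parsed: must be well-formed XML content
    } else {
      builder.text(content); // encoded: markup characters are escaped
    }
  });
  return builder.buildDocument().toXmlString();
}

void main() {
  const content = 'Ragesh <b> Selected </b>';
  print(buildPerson(content, raw: false)); // <b> ends up escaped
  print(buildPerson(content, raw: true)); // <b> stays a real element
}
```

Note that `builder.xml` throws if the input is not well-formed, so it only suits content that is known to be valid XML.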

RageshAntony commented 2 years ago

With the XmlBuilder if you don't want to encode but parse the input use xml.

I am getting this error:

Unhandled Exception: XmlParserException: Expected name at 1:16

builder.element("person", attributes: {"name": key}, nest: () {
  final allColsIndex = allCols.indexOf(langCode);
  print("TRANS_STRING ${transString![allColsIndex]}"); // => Ragesh <b> Selected </b>
  builder.xml(transString![allColsIndex]); // => Exception
});
renggli commented 2 years ago

As the error says, your XML does not parse. Again it would help, if your example was minimal. This works for me:

final builder = XmlBuilder();
builder.element('person', nest: () {
  builder.xml('Ragesh <b> Selected </b> ');
});
print(builder.buildDocument());
RageshAntony commented 2 years ago

Sorry for the late reply.

The strings from the translation API had malformed HTML tags, so parsing failed. The problem is in the Google Translate API, which mangles HTML tags!

OK. Another question:

How can I add a declaration like this when building an XML document?

<!DOCTYPE resources [
    <!ENTITY personTeam "ImperialBlue">
    ]>

That is, I need to add a doctype with an entity declaration so that it can be used as &personTeam;.

renggli commented 2 years ago

Presumably you want to wrap your HTML in CDATA sections; this prevents any of the contents from being parsed as XML (which is generally not possible for HTML).
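As a rough sketch of that suggestion, the snippet below stores an HTML fragment with a broken closing tag inside a CDATA section. The helper name `wrapInCdata` and the sample string are made up for illustration.

```dart
import 'package:xml/xml.dart';

/// Hypothetical helper: stores [html] verbatim inside a CDATA section so
/// that it never has to parse as XML.
String wrapInCdata(String html) {
  final builder = XmlBuilder();
  builder.element('person', nest: () {
    builder.cdata(html);
  });
  return builder.buildDocument().toXmlString();
}

void main() {
  // The space in "</ b>" would make builder.xml throw; CDATA keeps it as-is.
  final xml = wrapInCdata('Ragesh <b> Selected </ b>');
  print(xml);
}
```

One caveat: content that itself contains the sequence `]]>` cannot be placed in a single CDATA section without extra handling.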

To my own surprise the method XmlBuilder.doctype was missing. Added that with fc6593a.

RageshAntony commented 2 years ago

Thanks. This is an easy-to-use library for Flutter users. Great work!

When will this update be available on pub.dev?

(Also, please add methods for adding additional information, tags, and entities to the doctype, and please add some documentation.)

renggli commented 2 years ago

Before I publish a version on pub.dev I'd like to properly model the doctype node, i.e. parse the attributes and entities. This likely will be a breaking change with fc6593a. In the meantime I suggest you depend directly on the github version.

renggli commented 2 years ago

Unfortunately this is a massive project with little benefit to the average user (could also be done as part of an extension to this package). Please contribute.

RageshAntony commented 2 years ago

Yes, I am eager to contribute, but unfortunately I don't have enough knowledge of XML. Even for this project, I had to read some tutorials to understand XML.

Thanks for your request

renggli commented 2 years ago

Yeah same here. I spent a couple of hours reading about how DTDs work: very complicated, little used (?), and few XML libraries actually implement it.

RageshAntony commented 2 years ago

@renggli

Hi, how are you?

How can I parse a doctype like the one below and add it back when transforming one XML document into another?

This doctype exists in the source XML:

<!DOCTYPE resources [
    <!ENTITY personTeam "ImperialBlue">
    ]>

I need to add the same doctype to the processed target XML.

Since the doctype functions are not in the library, is there any workaround to add it back as described above?

It's urgent, please!

renggli commented 2 years ago

Not sure why you say doctype functions are not part of the library, there is plenty of basic support for reading and writing doctype: https://github.com/renggli/dart-xml/search?q=doctype. The only support that is lacking is to be able to manipulate the inside parts with dedicated objects.

RageshAntony commented 2 years ago

Oh, okay.

Could you please give me some sample code for processing and writing a doctype?

renggli commented 2 years ago

Please have a look at the search query I gave above. There are plenty of test-cases, i.e. for reading and writing.

RageshAntony commented 2 years ago

Well, I have already parsed the doctype. But as mentioned in an earlier reply above, the builder.doctype function is missing.

That's why I am asking.

You suggested using the GitHub version. Which branch should I use: main or another one?

RageshAntony commented 2 years ago

@renggli

Okay, I wrote it like this, but an issue occurs.

Input:

<!DOCTYPE resources [
    <!ENTITY personTeam "ImperialBlue">
    ]>

Reading: parsedDocType = document.doctypeElement;

Writing: builder.text(parsedDocType?.toXmlString() ?? "");

Output: &lt;!DOCTYPE resources [ &lt;!ENTITY appname "Glowify"> &lt;!ENTITY p1 "AIzaSyCF"> &lt;!ENTITY p2 "mv8ZiK30hmq_"> &lt;!ENTITY p3 "eq4wGST8liXyahmE7I0"> ]>

How do I get unaltered output?

renggli commented 2 years ago

builder.text creates an XML text node, you want a raw string.

RageshAntony commented 2 years ago

@renggli Okay.

builder.xml(parsedDocType?.toXmlString() ?? "");

This works as expected. Thanks.
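For later readers, the workaround settled on in this thread can be sketched end to end as follows. The helper name `copyDoctype` and the sample `<resources>` content are illustrative assumptions. Whether `builder.xml` accepts a doctype string may depend on the package version, so treat this as a sketch of the approach reported to work here rather than a guaranteed recipe.

```dart
import 'package:xml/xml.dart';

/// Hypothetical helper: copies the doctype of [source] (if any) into a
/// newly built document with made-up sample content.
String copyDoctype(String source) {
  final document = XmlDocument.parse(source);
  final doctype = document.doctypeElement; // null when there is no doctype

  final builder = XmlBuilder();
  if (doctype != null) {
    // builder.xml inserts parsed nodes; builder.text would escape the '<'.
    builder.xml(doctype.toXmlString());
  }
  builder.element('resources', nest: () {
    builder.element('person', nest: 'Ragesh');
  });
  return builder.buildDocument().toXmlString();
}

void main() {
  const source = '<!DOCTYPE resources [\n'
      '  <!ENTITY personTeam "ImperialBlue">\n'
      ']>\n'
      '<resources/>';
  print(copyDoctype(source));
}
```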