Closed andrew2net closed 1 week ago
Not sure what the error is but in the mixed content the elements like “sup” might be supposed to be “collection”.
@ronaldtse Adding the collection
option doesn't change anything. The example doesn't have the option too.
@andrew2net this will be handled by @HassanAkbar in #58 . Once #58 passes, this should work.
@andrew2net the content is not mapped to anything that is why it is not showing in the output, for mapping content in xml we need to use the map_content to: <attribute name>
.
So the above class will be changed to
module Relaton
module Model
class Docidentifier < Lutaml::Model::Serializable
attribute :type, Lutaml::Model::Type::String
attribute :content, Lutaml::Model::Type::String
attribute :scope, Lutaml::Model::Type::String
attribute :primary, Lutaml::Model::Type::Boolean
attribute :sup, Lutaml::Model::Type::String
xml do
root "docidentifier", mixed: true
map_element "sup", to: :sup
# for mapping content
map_content to: :content
map_attribute "type", to: :type
map_attribute "scope", to: :scope
map_attribute "primary", to: :primary
end
end
end
end
@HassanAkbar in the doc described same case without mapping content. Please update the doc.
It works, but I get another problem with Separate serialization model
module Relaton
module Bib
class Docidentifier
attr_accessor :type, :scope, :content
...
module Relaton
module Model
class Docidentifier < Lutaml::Model::Serializable
model ::Relaton::Bib::Docidentifier
...
We need Relaton::Bib::Docidentifier@content
to be set to "CIPM 43<sup>e</sup> réunion (1950)"
when the docidentifier
element is parsed
xml = <<~XML
<docidentifier language="en" script="Latn" locale="EN-us" type="BIPM" scope="part" primary="true">
CIPM 43<sup>e</sup> réunion (1950)
</docidentifier>
XML
docid = Relaton::Model::Docidentifier.from_xml xml
but the Relaton::Model::Docidentifier
required to have sup=
method in Relaton::Bib::Docidentifier
. After adding the sup
attribute I got:
docid.content
=> ["\n" + " CIPM 43", " r\u00E9union (1950)\n"]
docid.sup
=> ["e"]
Is there a way to solve this problem?
Looks like the content
attribute needs to be specified as mixed
? In general, we have not refined a good way of defining a mixed content model as easily done in RNC grammar. Something we need to work on.
@ronaldtse mixed content also doesn't work.
I think we can make our life more easier using Nokogiri's feature that escapes all XML markups inside text nodes.
id = Relaton::Bib::Docidentifier.new(type: "BIPM", content: "CIPM 43<sup>e</sup> réunion (1950)")
xml = Relaton::Model::Docidentifier.to_xml id
=> "<docidentifier type=\"BIPM\">CIPM 43<sup>e</sup> r\u00E9union (1950)</docidentifier>"
id2 = Relaton::Model::Docidentifier.from_xml xml
puts id2.content
CIPM 43<sup>e</sup> réunion (1950)
@andrew2net Are you asking for an option to treat XML content as text?
What kind of API are you looking for the Bib::Docidentifier
and Model::Docidentifier
object?
Can you come up with some Ruby code showing what exactly you want to see?
@ronaldtse I want to see <sup>
and <sub>
tags in BIPM IDs.
id = Relaton::Model::Docidentifier.from_xml(xml)
id.class
=> Relaton::Bib::Docidentifier
id.content
=> "CIPM 43<sup>e</sup> réunion (1950)"
But we also have LocalizedMarkedUpString which can contain many other XML tags. A lot of Relaton model's entities inherit from the LocalizedMarkedUpString
. I've tried to implement all the elements with lutaml-model and it turned out to be much more difficult than without it.
With treating the LocalizedMarkedUpString
content as text we don't need to implement all the TextElement entities. The con is that the marked up text won't be validated. If that's ok then we don't need to do anything in lutaml-model to solve this issue, and can close this issue.
@andrew2net so what should @HassanAkbar do here to fix the problem? I'm a bit confused.
So the problem is when using a separate model, the assigned content
is no longer a string but is parsed like XML?
@ronaldtse yes, content in the separated model should be
CIPM 43<sup>e</sup> réunion (1950)
but lutaml-model returns
["\n" + " CIPM 43", " r\u00E9union (1950)\n"]
We can solve this issue by handling these IDs as String. In other words the ID in XML output will be escaped:
<docidentifier type="BIPM">CIPM 43<sup>e</sup> r\u00E9union (1950)</docidentifier>
Nokogiri can encode and decode such content. I tested it and got desired content in separated model.
So we don't need to handle any tags in text element (there are a lot other tags for content defined in the grammar).
But the escaped tags aren't validated by Nokogiri. If that's ok then we can use this approach
@andrew2net Clearly we need to parse the content as a proper text model that checks its grammar/validity. Handling it as a string is a bad hack.
What is the text model there though?
Supposed it is a class called RichText
, then we can just point :content
to be casted as RichText
?
@ronaldtse docidentifier's content is LocalizedMarkedUpString which is collection of TextElement
I tried to create TextElement
but it doesn't returns string with tags
class TextElement < Lutaml::Model::Serializable
attribute :sup, :string
attribute :sub, :string
attribute :text, :string
xml do
root 'text-element', mixed: true
map_content to: :text
map_element "sup", to: :sup
map_element "sub", to: :sub
end
end
class Id
attr_accessor :id
end
class Docid < Lutaml::Model::Serializable
model Id
attribute :id, TextElement
xml do
root 'docid'
map_content to: :id
end
end
> docid = Docid.from_xml '<docid>Str<sub>2</sub>text<sup>1</suo>123</docid>'
=> #<Id:0x000000012456c500 @id=["Str", "text", "123"]>
# it returns array of strings without tags
> docid.id
=> ["Str", "text", "123"]
# tags are lost
> Docid.to_xml docid
=> "<docid>Strtext123</docid>"
What am I doing wrong?
@HassanAkbar can you help with this issue?
@andrew2net Sure, I'm looking into it right now
@andrew2net To delegate the attributes to another model we need to use the delegate
keyword.
require "lutaml/model"
require "lutaml/model/xml_adapter/nokogiri_adapter"
Lutaml::Model::Config.xml_adapter = Lutaml::Model::XmlAdapter::NokogiriAdapter
class TextElement < Lutaml::Model::Serializable
attribute :sup, :string
attribute :sub, :string
attribute :text, :string
xml do
root 'text-element', mixed: true
map_content to: :text
map_element "sup", to: :sup
map_element "sub", to: :sub
end
end
class Id
attr_accessor :id
end
class Docid < Lutaml::Model::Serializable
model Id
attribute :id, TextElement
xml do
root 'docid'
map_content to: :text, delegate: :id
map_element :sub, to: :sub, delegate: :id
map_element :sup, to: :sup, delegate: :id
end
end
> docid = Docid.from_xml '<docid>Str<sub>2</sub>text<sup>1</suo>123</docid>'
=> #<Id:0x0000000108b253f0 @id=#<TextElement:0x0000000108b250f8 @sub="2", @sup="1", @text=["Str", "text", "123"], @validate_on_set=false>>
But when we do something like Docid.to_xml docid
it crashes because of the issue you mentioned here -> https://github.com/lutaml/lutaml-model/issues/83.
I'm currently working on it and will let you know as soon as it is resolved.
@HassanAkbar it doesn't work:
> docid = Docid.from_xml '<docid>Str<sub>2</sub>text<sup>1</suo>123</docid>'
gems/ruby-3.3.3/gems/lutaml-model-0.3.10/lib/lutaml/model/serialize.rb:317:in `block in apply_xml_mapping': Attribute 'sub' not found in Docid (RuntimeError)
@andrew2net I have fixed the issue, it should work after https://github.com/lutaml/lutaml-model/pull/91 is merged.
> docid = Docid.from_xml '<docid>Str<sub>2</sub>text<sup>1</suo>123</docid>' => #<Id:0x0000000108b253f0 @id=#<TextElement:0x0000000108b250f8 @sub="2", @sup="1", @text=["Str", "text", "123"], @validate_on_set=false>>
@HassanAkbar the id
attribute is set to TextElement
instance, but it needs to be String
. In this case Str<sub>2</sub>text<sup>1</suo>123
. How can we achieve it?
@HassanAkbar the TextElement
has 20+ elements and is used in dozens cases as mixed content. This solution requires to delegate each TextElement
element in each case where it is used:
map_element :sub, to: :sub, delegate: :id map_element :sup, to: :sup, delegate: :id
We also have
PureTextElement
,NestedTextElement
, and some other, similar toTextElement
elements. This requires writing thousands delegation lines of code. Can we avoid delegating each element of mixing content?
@HassanAkbar the id attribute is set to TextElement instance, but it needs to be String. In this case
Str<sub>2</sub>text<sup>1</suo>123
. How can we achieve it?
@andrew2net this looks like a use case for raw
option for attribute which is being added in this PR -> https://github.com/lutaml/lutaml-model/pull/94. Will let you know once this is finalized.
@andrew2net @HassanAkbar
Why not just:
class TextElement < Lutaml::Model::Serializable
attribute :sup, …
end
class Docid < Lutaml::Model::Serializable
attribute :my_text_content, TextElement
end
@andrew2net @HassanAkbar
Why not just:
class TextElement < Lutaml::Model::Serializable attribute :sup, … end class Docid < Lutaml::Model::Serializable attribute :my_text_content, TextElement end
@ronaldtse the issue with this is that in xml we have different mappings for attributes
, elements
and content
. So currently we can not decide about which of the above we need to send to the my_text_content
.
I think what we can do here is, make another tag like map_raw_content
or map_all
or simply map
to map the entire tag's content as string to the attribute. What do you suggest?
the issue with this is that in xml we have different mappings for
attributes
,elements
andcontent
. So currently we can not decide about which of the above we need to send to themy_text_content
.
I don't really understand this. Can you provide an example XML with a Lutaml::Model code snippet to illustrate?
the issue with this is that in xml we have different mappings for attributes, elements and content. So currently we can not decide about which of the above we need to send to the my_text_content.
I don't really understand this. Can you provide an example XML with a Lutaml::Model code snippet to illustrate?
class TextElement < Lutaml::Model::Serializable
attribute :sup, …
end
class Docid < Lutaml::Model::Serializable
attribute :my_text_content, TextElement
end
<docid>
Str <sub>2</sub> text <sup>1</suo> 123
</docid>
@ronaldtse In this example above the xml for docid
has
and we need to define the mappings for all elements
, attributes
and content
in the docid
to map them to the proper attributes.
From what I understand from the @andrew2net example is that this needs to happen for lots of elements that have lots of tags and mapping all those tags has no benefit so mapping them to a string is sufficient.
So, what I'm proposing is adding a tag like map_all to: :raw_text
which will map all the content inside the tag to that attribute. e.g
class Docid < Lutaml::Model::Serializable
attribute :my_text_content, :string
xml do
map_all to: : my_text_content
end
end
docid = Docid.from_xml '<docid>Str<sub>2</sub>text<sup>1</suo>123</docid>'
# <Docid:0x0000000108b253f0 @my_text_content="Str<sub>2</sub>text<sup>1</suo>123", @validate_on_set=false>
we can rename map_all
to something more fitting but this is a rough idea I have in mind for this.
@HassanAkbar the last example looks good to me
@HassanAkbar Got it, then let's move ahead with map_all
. We also need to ensure that the other map_*
methods can't be used together with map_all
.
We also need to ensure that the other map_* methods can't be used together with map_all.
@ronaldtse Got it, will add a validation for that
I'm trying to use the mixed content option to parse the XML:
The class:
The result:
The text content disappears. What is wrong here?