lutaml / oasis-etm

Ruby library for OASIS Exchange Table Model (a subset of CALS table)
0 stars 1 forks source link

Fix round-tripping specs #1

Open ronaldtse opened 3 weeks ago

ronaldtse commented 3 weeks ago

Failures:

Oasis::Etm
  XML round-trip conversion
    native XML
      with file native/docbook_example.xml
        performs lossless round-trip conversion (FAILED - 1)
    namespaced XML (ISOSTS)
      with file isosts/isosts_tables.cals.01.xml
        performs lossless round-trip conversion (FAILED - 2)
      with file isosts/isosts_tables.cals.02.xml
        performs lossless round-trip conversion (FAILED - 3)
      with file isosts/isosts_tables.cals.03.xml
        performs lossless round-trip conversion (FAILED - 4)
      with file isosts/isosts_tables.cals.04.xml
        performs lossless round-trip conversion (FAILED - 5)
      with file isosts/isosts_tables.cals.05.xml
        performs lossless round-trip conversion (FAILED - 6)
      with file isosts/isosts_tables.cals.06.xml
        performs lossless round-trip conversion (FAILED - 7)
      with file isosts/isosts_tables.cals.07.xml
        performs lossless round-trip conversion (FAILED - 8)
      with file isosts/isosts_tables.cals.08.xml
        performs lossless round-trip conversion (FAILED - 9)
      with file isosts/isosts_tables.cals.09.xml
        performs lossless round-trip conversion (FAILED - 10)
      with file isosts/isosts_tables.cals.10.xml
        performs lossless round-trip conversion (FAILED - 11)
      with file isosts/isosts_tables.cals.11.xml
        performs lossless round-trip conversion (FAILED - 12)
      with file isosts/isosts_tables.cals.12.xml
        performs lossless round-trip conversion (FAILED - 13)
    namespaced XML (NISO JATS)
      with file niso-jats/niso-jats-table-wrap.xml
        performs lossless round-trip conversion (FAILED - 14)

The round-tripping specs are failing for a few reasons:

  1. The isosts tables use a namespace of xmlns:oasis="-//OASIS//DTD XML Exchange Table Model 19990315//EN". By default the Table class does not use any namespace. We need to update the specs (with a wrapping class) to work with this additional namespace.
  2. The niso-jats tables use a namespace of xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table". We need to update the specs (new wrapping class) to parse this.
  3. The native/ file is failing as element order is necessary in the table. For text cells, we need to apply mixed: true. For elements with order sensitive children, apply the order: true option.
ronaldtse commented 3 weeks ago

We may need to use the raw: true option for the Entry element because we don't know what's going to go in there.

These are enhancements we need in lutaml-model:

  1. We need a way to allow users to adopt this gem by applying a namespace around it.
  2. We need a way for users of this gem to "swap out" the Entry class with their own "SpecialEntry" so they can process the content of Entry, because only the user knows what goes into there, the generic oasis-etm class won't ever know.

Ping @HassanAkbar @suleman-uzair for thoughts.

ronaldtse commented 3 weeks ago

So I fixed the "native round-trip spec" but there is a problem with <entry>.

In the XML file, it contains this content (the spec passes if I comment that line out):

<row>
  <entry>b1</entry>
  <entry>b2</entry>
  <entry>b3</entry>
  <entry>b4</entry>
  <entry morerows='1' valign='middle'><para>  <!-- Pernicous Mixed Content -->
  Vertical Span</para></entry>
</row>

The element <entry ...><para>...</para></entry> fails because the <para>...</para> content is not defined in lutaml-model, and therefore omitted from the re-generated content.

This is a case where the content of an element is not defined, and we need to implement some pass-through.

Clearly, map_all won't work here because we still need to retain the attributes.

So I thought of using raw: true:

module Oasis
  module Etm
    class Entry < Lutaml::Model::Serializable
      attribute :colname, :string
      attribute :content, :string, raw: true
      # ...

      xml do
        root "entry"

        # Attribute mappings
        map_attribute "colname", to: :colname
        # ...

        # Content mapping
        map_content to: :content
      end
    end
  end
end

But it still doesn't work.

We need to handle this case gracefully because it is a clear requirement.

ronaldtse commented 3 weeks ago

The above problem depends on this fix:

ronaldtse commented 3 weeks ago

Ping @HassanAkbar @suleman-uzair for comments.

Novtopro commented 3 weeks ago

@ronaldtse There're only two specs keep failing now, but shouldn't they be put in niso-jats or docbook gem?

CleanShot 2024-11-04 at 10 58 40@2x

ronaldtse commented 3 weeks ago

So different XML schemas use the same OASIS table XML, but use different namespaces. The tests from "isosts" are also from "isosts", which is an older XML schema that adopted the OASIS table XML.

I suspect that we should allow an easy way to accept different namespaces?

Novtopro commented 3 weeks ago

@ronaldtse Yeah, we need to change lutaml-model

Novtopro commented 3 weeks ago

@ronaldtse @HassanAkbar It's like we may need something like

CleanShot 2024-11-04 at 16 33 50@2x

in lutaml-model

HassanAkbar commented 3 weeks ago

@Novtopro Good idea. This might work, I'll try working on this and see if it affects other specs or not.

Actually, I think we are doing it for the first level, not sure about passing it down to other elements. I'll check this and get back to you.