laws-africa / cobalt

A lightweight python library for working with Akoma Ntoso documents.
https://cobalt.readthedocs.io/en/latest/
Other

Support other AKN document types #18

Closed: longhotsummer closed this issue 4 years ago

longhotsummer commented 4 years ago

The goal is to ensure Cobalt can provide at least basic support for arbitrary Akoma Ntoso document types (section 5 of the spec).

The AKN model includes a number of document types with specific root elements and document models. Cobalt's object hierarchy should mimic these document types, making it simpler to map between AKN and Cobalt.

A summary of the document types, primary elements, primary body elements etc. is in https://docs.google.com/spreadsheets/d/1Y9DJu0-IaINyhCjRQRtIQLmsadtLVFlqidI4gy_OAWA/edit#gid=2028639471

Proposal

  1. A common AkomaNtoso document base class
  2. Subclasses for each type of document structure (hierarchicalStructure, judgmentStructure, etc.), with structure-specific functionality
  3. Concrete subclasses for each document type (act, bill, judgment, etc.), with document-specific functionality

Cobalt should provide factory code that takes an XML tree (or text) and determines what type it is, then instantiates the appropriate class.
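A minimal sketch of such a factory follows. It assumes AKN 3.0 (namespace http://docs.oasis-open.org/legaldocml/ns/akn/3.0) and uses the standard library's xml.etree.ElementTree rather than lxml. The names from_xml, document_class_for and _all_subclasses are hypothetical, not Cobalt's API. It reads the single child of <akomaNtoso>, which is the document-type element (act, bill, judgment, ...), and looks up the concrete class whose document_type matches.

```python
import xml.etree.ElementTree as ET

# Assumption: only the AKN 3.0 namespace is handled in this sketch.
AKN3_NS = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"


class AkomaNtosoDocument:
    document_type = None  # only concrete document classes set this

    def __init__(self, root):
        self.root = root  # the <akomaNtoso> element


class StructuredDocument(AkomaNtosoDocument):
    pass


class HierarchicalStructure(StructuredDocument):
    structure_type = "hierarchicalStructure"


class JudgmentStructure(StructuredDocument):
    structure_type = "judgmentStructure"


class Act(HierarchicalStructure):
    document_type = "act"


class Bill(HierarchicalStructure):
    document_type = "bill"


class Judgment(JudgmentStructure):
    document_type = "judgment"


def _all_subclasses(cls):
    """Yield every direct and indirect subclass of cls."""
    for sub in cls.__subclasses__():
        yield sub
        yield from _all_subclasses(sub)


def document_class_for(document_type):
    """Map an AKN document type name (e.g. 'act') to its concrete class."""
    registry = {
        c.document_type: c
        for c in _all_subclasses(AkomaNtosoDocument)
        if c.document_type
    }
    try:
        return registry[document_type]
    except KeyError:
        raise ValueError(f"Unsupported document type: {document_type}") from None


def from_xml(xml_text):
    """Parse AKN XML text and instantiate the matching document class."""
    root = ET.fromstring(xml_text)
    prefix = f"{{{AKN3_NS}}}"
    if root.tag != prefix + "akomaNtoso" or len(root) == 0:
        raise ValueError("Not an Akoma Ntoso 3.0 document")
    tag = root[0].tag  # the document-type element, namespace-qualified
    if not tag.startswith(prefix):
        raise ValueError(f"Unexpected element: {tag}")
    return document_class_for(tag[len(prefix):])(root)
```

Walking __subclasses__() means a new document type only needs a new class with a document_type attribute; an explicit dict would work equally well and may be easier to audit.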

Example

This could look something like this:

class AkomaNtosoDocument:
    """ Base class for Akoma Ntoso documents.
    """

# ----------------------------------------------------------------------------------
# Document structure classes

class StructuredDocument(AkomaNtosoDocument):
    """ Common base class for AKN documents with a known document structure.
    """

class AmendmentStructure(StructuredDocument):
    structure_type = "amendmentStructure"
    main_content_element = "amendmentBody"

class CollectionStructure(StructuredDocument):
    structure_type = "collectionStructure"
    main_content_element = "collectionBody"

class DebateStructure(StructuredDocument):
    structure_type = "debateStructure"
    main_content_element = "debateBody"

class HierarchicalStructure(StructuredDocument):
    structure_type = "hierarchicalStructure"
    main_content_element = "body"

class JudgmentStructure(StructuredDocument):
    structure_type = "judgmentStructure"
    main_content_element = "judgmentBody"

class OpenStructure(StructuredDocument):
    structure_type = "openStructure"
    main_content_element = "mainBody"

class PortionStructure(StructuredDocument):
    structure_type = "portionStructure"

# ----------------------------------------------------------------------------------
# Document type classes

class Act(HierarchicalStructure):
    document_type = "act"

class Amendment(AmendmentStructure):
    document_type = "amendment"

class AmendmentList(CollectionStructure):
    document_type = "amendmentList"

class Bill(HierarchicalStructure):
    document_type = "bill"

class Debate(DebateStructure):
    document_type = "debate"
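One way the main_content_element attribute could pay off as structure-specific functionality is a shared lookup on StructuredDocument. This is a sketch only: the main and main_content properties are hypothetical, it uses the standard library's ElementTree instead of lxml, and it assumes the AKN 3.0 namespace. Judgment is included for illustration alongside Act.

```python
import xml.etree.ElementTree as ET

# Assumption: AKN 3.0 namespace.
AKN3_NS = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"


class AkomaNtosoDocument:
    def __init__(self, root):
        self.root = root  # the <akomaNtoso> element

    @property
    def main(self):
        """The document-type element, e.g. <act> or <judgment>."""
        return self.root[0]


class StructuredDocument(AkomaNtosoDocument):
    main_content_element = None  # e.g. PortionStructure above sets none

    @property
    def main_content(self):
        """The structure's main content element, looked up by explicit namespace."""
        if self.main_content_element is None:
            return None
        return self.main.find(f"{{{AKN3_NS}}}{self.main_content_element}")


class HierarchicalStructure(StructuredDocument):
    structure_type = "hierarchicalStructure"
    main_content_element = "body"


class JudgmentStructure(StructuredDocument):
    structure_type = "judgmentStructure"
    main_content_element = "judgmentBody"


class Act(HierarchicalStructure):
    document_type = "act"


class Judgment(JudgmentStructure):
    document_type = "judgment"
```

Because the element name lives on the structure class, Act and Bill share the same lookup without any per-document code.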

longhotsummer commented 4 years ago

A very hacky initial version of this is in the akn3-doc-types branch: https://github.com/laws-africa/cobalt/tree/akn3-doc-types

goose-life commented 4 years ago

@longhotsummer will 'subtypes' still be used by Indigo, e.g. /act/ordinance, or will this become /ordinance?

longhotsummer commented 4 years ago

Subtypes are still very much a thing: https://docs.oasis-open.org/legaldocml/akn-nc/v1.0/os/akn-nc-v1.0-os.html#_Toc531692270

Any specification of document subtype, if appropriate. For an Akoma Ntoso XML representation, this value MUST correspond to the content of the <FRBRsubtype> element in the metadata or, in its absence, to the “name” attribute of the document type (optional).

Just realised we don't use FRBRsubtype. Will make an issue for AKN3.
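The rule quoted above could be read roughly as follows. This is a sketch under assumptions: it takes FRBRsubtype to sit in meta/identification/FRBRWork and to carry its value in a value attribute (as the other FRBR metadata elements do), assumes the AKN 3.0 namespace, and uses a hypothetical function name, document_subtype.

```python
import xml.etree.ElementTree as ET

# Assumption: AKN 3.0 namespace, mapped to the "a" prefix for lookups.
AKN3_NS = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"
NS = {"a": AKN3_NS}


def document_subtype(akn_root):
    """Return the subtype per the naming convention: FRBRsubtype, else @name."""
    main = akn_root[0]  # the document-type element, e.g. <act>
    subtype = main.find("a:meta/a:identification/a:FRBRWork/a:FRBRsubtype", NS)
    if subtype is not None and subtype.get("value"):
        return subtype.get("value")
    # In the absence of FRBRsubtype, fall back to the "name" attribute.
    return main.get("name")
```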

longhotsummer commented 4 years ago

There are some old APIs which we can take this opportunity to improve/adjust.

  1. The _maker attribute is useful externally, so it shouldn't be marked private with a leading _. Rename _maker to maker.
  2. to_xml should accept *args and **kwargs and pass them on to etree's tostring method.
  3. The helper methods such as year, number etc. that simply proxy to the FRBR URI should be removed. Callers can use .frbr_uri.year etc. instead.
  4. _ensure is useful externally: drop the _ and rename it to ensure_element.
  5. _make is useful externally: drop the _ and rename it to make_element.
  6. _get is useful externally: drop the _ and rename it to get_element.

Items 1, 4, 5 and 6 can be on the root AkomaNtosoDocument class since they're unrelated to the content of the document.
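A rough sketch of items 2, 4, 5 and 6 on the base class is below. The method names come from the list above, but their signatures and behaviour here are assumptions: paths are slash-separated tag names relative to the root, elements are created in the AKN 3.0 namespace, and the standard library's ElementTree stands in for lxml. Item 1 (maker) is left out because it depends on an element factory this sketch doesn't model.

```python
import xml.etree.ElementTree as ET

# Assumption: AKN 3.0 namespace.
AKN3_NS = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"


class AkomaNtosoDocument:
    namespace = AKN3_NS

    def __init__(self, root):
        self.root = root  # the <akomaNtoso> element

    def _qualify(self, tag):
        return f"{{{self.namespace}}}{tag}"

    def make_element(self, tag, **attrib):
        """Create a new, detached element in the document's namespace."""
        return ET.Element(self._qualify(tag), attrib)

    def get_element(self, path):
        """Return the element at a slash-separated path below the root, or None."""
        node = self.root
        for tag in path.split("/"):
            node = node.find(self._qualify(tag))
            if node is None:
                return None
        return node

    def ensure_element(self, path):
        """Like get_element, but append any missing elements along the path."""
        node = self.root
        for tag in path.split("/"):
            child = node.find(self._qualify(tag))
            if child is None:
                child = ET.SubElement(node, self._qualify(tag))
            node = child
        return node

    def to_xml(self, *args, **kwargs):
        """Serialise the root, passing all arguments through to tostring."""
        return ET.tostring(self.root, *args, **kwargs)
```

Passing *args and **kwargs straight through keeps to_xml from having to mirror every tostring option, e.g. to_xml(encoding="unicode") returns a str instead of bytes.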

longhotsummer commented 4 years ago

To be namespace-safe, we should always use explicit XML namespaces when working with nodes. For example, calls like self.root.iterfind('./{*}components/{*}component/{*}doc') should use the actual namespace rather than the {*} wildcard.
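The risk with {*} is that it matches an element of that name in any namespace. The sketch below uses the standard library's ElementTree ({*} needs Python 3.8+) and a made-up document containing one AKN <doc> and one <doc> from an unrelated namespace (http://example.com/other is a placeholder); the wildcard path picks up both, while the explicit path only finds the AKN one.

```python
import xml.etree.ElementTree as ET

AKN3_NS = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"
NS = {"a": AKN3_NS}

# Hypothetical document: one AKN <doc> and one <doc> from another namespace.
XML = f"""<akomaNtoso xmlns="{AKN3_NS}" xmlns:x="http://example.com/other">
  <act>
    <components>
      <component><doc name="schedule"/></component>
      <component><x:doc name="foreign"/></component>
    </components>
  </act>
</akomaNtoso>"""

act = ET.fromstring(XML).find("a:act", NS)

# Wildcard: matches <doc> in any namespace.
wildcard = list(act.iterfind("./{*}components/{*}component/{*}doc"))

# Explicit: matches only <doc> in the AKN namespace.
explicit = list(act.iterfind("./a:components/a:component/a:doc", NS))

print(len(wildcard), len(explicit))  # 2 1
```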

goose-life commented 4 years ago

The helper methods such as year, number etc. that simply proxy to the FRBR URI should be removed. Callers can use .frbr_uri.year etc. instead.

year does a little more: it gives the year from frbr_uri.date. Shall we just nuke number and nature (and have people use .frbr_uri.doctype instead)?

longhotsummer commented 4 years ago

Since year comes from the FRBR URI, I still think it should be moved there. Anything that's simply a proxy to the frbr_uri should be removed; otherwise it clutters the API.
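Moving year onto the URI object could look something like this. FrbrUri here is a hypothetical, minimal stand-in (not Cobalt's actual class) that only shows year being derived from date on the URI, so callers write doc.frbr_uri.year rather than doc.year.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrbrUri:
    """Hypothetical stand-in for a parsed work-level FRBR URI."""

    country: str
    doctype: str
    date: str  # e.g. "2009" or "2009-03-01"
    number: str
    subtype: Optional[str] = None

    @property
    def year(self) -> str:
        """The year part of the work date, derived rather than stored."""
        return self.date.split("-", 1)[0]
```

Keeping year as a derived property means there is only one source of truth (date), and the document class no longer needs a proxy for it.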

goose-life commented 4 years ago

closed by #23