lewismc / leonto

Leonto presents a framework for regulatory ontology construction from complex legislative documentation within the AEC Domain.

AKN XML #1

Open onestarhalfpoint opened 3 years ago

onestarhalfpoint commented 3 years ago

The original paper (McGibbney & Kumar, 2015, "A framework for Regulatory Ontology Construction within AEC Domain") states that the input files for the regulations should be in AKN XML format. However, it appears that the files provided in the subfolder were fed into ConstructJob.java as HTML, and the outputs also differ from those presented in the paper. Please advise. Thanks,

lewismc commented 3 years ago

Hi @onestarhalfpoint thanks for your interest in the project. This is a pretty niche area so I would be really interested to hear about your work.

...states that the input files for the regulations should be in AKN XML format.

I went back to read the chapter that Prof. Kumar and I co-authored, and you are correct. The emphasis is on "should". Very rarely are government bodies actually working with AKN XML, a.k.a. OASIS LegalDocML. In reality, their document publishing pipelines are archaic; that is to say, they have not embraced the structured data engineering principles that would assist the kind of problem space Leonto targets. This is why the simple ConstructJob example is not currently configured to process AKN XML. That said, this could be facilitated by extending the underlying Apache Any23 library. Specifically, one could use the bungenix/akomantoso-lib Java library to implement an Any23 extractor. I did some of this work previously, and those are the examples we presented in the paper. It was not feasible to produce a complete implementation at that time. If you are interested in this work, I could possibly revisit the topic... please let me know. Thank you lewismc
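To give a rough idea of what such an extractor would produce, here is a minimal, JDK-only sketch. It is not the Any23 extractor API and does not use akomantoso-lib; it simply parses an Akoma Ntoso 3.0 document with the standard namespace-aware DOM parser and emits N-Triples for each section that carries an eId, linking the section to its document and recording its heading. The class name AknTripleSketch, the vocabulary IRI http://example.org/leonto#, the predicates partOf and heading, and the document IRI are all hypothetical placeholders. The sample input is also trimmed for brevity (a schema-valid AKN act would need a meta block, among other things).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/**
 * Hypothetical, JDK-only sketch of what an AKN XML extractor might emit as RDF.
 * It does NOT use the Apache Any23 extractor API or bungenix/akomantoso-lib.
 */
final class AknTripleSketch {

    // Akoma Ntoso 3.0 (OASIS LegalDocML) namespace.
    static final String AKN_NS = "http://docs.oasis-open.org/legaldocml/ns/akn/3.0";
    // Hypothetical vocabulary, used only for illustration.
    static final String VOCAB = "http://example.org/leonto#";

    /** Returns N-Triples lines for every section (nested ones included) that has an eId. */
    static List<String> extract(String xml, String docIri) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getElementsByTagNameNS / getLocalName
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse AKN XML", e);
        }

        List<String> triples = new ArrayList<>();
        NodeList sections = doc.getElementsByTagNameNS(AKN_NS, "section");
        for (int i = 0; i < sections.getLength(); i++) {
            Element section = (Element) sections.item(i);
            String eId = section.getAttribute("eId"); // "" when the attribute is absent
            if (eId.isEmpty()) {
                continue; // no stable identifier to mint an IRI from
            }
            String subject = "<" + docIri + "#" + eId + ">";
            triples.add(subject + " <" + VOCAB + "partOf> <" + docIri + "> .");
            String heading = childText(section, "heading");
            if (heading != null) {
                triples.add(subject + " <" + VOCAB + "heading> " + literal(heading) + " .");
            }
        }
        return triples;
    }

    /** Trimmed text of the first direct AKN child with the given local name, or null. */
    private static String childText(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && AKN_NS.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    /** Minimal N-Triples literal escaping: backslash, double quote, CR and LF. */
    private static String literal(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"")
                .replace("\n", "\\n").replace("\r", "\\r") + "\"";
    }

    public static void main(String[] args) {
        String xml = "<akomaNtoso xmlns=\"" + AKN_NS + "\"><act><body>"
                + "<section eId=\"sec_1\"><num>1</num><heading>Fire safety</heading></section>"
                + "</body></act></akomaNtoso>";
        extract(xml, "http://example.org/regs/part-b").forEach(System.out::println);
    }
}
```

Running main prints one partOf triple and one heading triple for sec_1. Keying IRIs on eId is the design choice worth noting: AKN assigns those identifiers precisely so that fragments stay stable and addressable, which is much harder to achieve when starting from loosely structured HTML. A real Any23 extractor would instead emit statements through Any23's own extractor interfaces and would typically map AKN's FRBR metadata as well; that part is left out of this sketch.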

onestarhalfpoint commented 3 years ago

Hi @lewismc, thanks for confirming and clarifying. I am working on a project that entails extracting knowledge from AEC regulation documents, and I ran into the issue that most published documents are not well structured (they are in HTML or even just PDF). I found your chapter interesting, but not sure if you still think AKN XML is the answer? The format seems not to have been embraced... and the pipeline you proposed back then seems limited on the metadata only. On the other hand, technologies like NLP and ontologies have advanced considerably over the past few years. I was hoping to revisit the UK regulation dataset and give it another try.

lewismc commented 3 years ago

Hi @onestarhalfpoint thanks for the info... interesting.

...but not sure if you still think AKN XML is the answer?

Let me answer this question this way. Do I see a logical reason to create a new data model other than AKN XML at this stage? No, not really. One of the major issues with the uptake of AKN XML is that it is complex. Like, really quite complex. Adopting it properly takes months (if not years) and a thorough understanding of the legislative subject matter, e.g. AEC regulations.

... the format seems not to have been embraced

Well, that's not exactly true. If you look at the Technical Work Produced by the Committee, specifically the acknowledgements of various work items, you will see numerous individuals representing numerous organizations: loads of parliaments, libraries, law schools, military organizations, etc. In fact, the uptake would appear to be significant. AKN XML is also used pervasively across the European Commission. I think what you may mean is that AKN XML's use is not widely advertised or is not public knowledge.

the pipeline you proposed back then seems limited on the metadata only

That may be true for that chapter. Within the final thesis we augmented the pipeline with various NLP stages which facilitated innovative use cases. A simple example was the integration of AEC regulatory models with linked building models. You mention this in the last part of your comment... and you are right.

BTW, if you are interested, we also performed "A Comparative Study to Determine a Suitable Representational Data Model for UK Building Regulations". Our discussion and summary are provided in Section 6, which I encourage you to read. At that time, AKN XML and the OASIS LegalDocML Technical Committee ultimately offered a better product than the alternatives we evaluated. I did participate in that TC, but it looks like I was never recognized in the acknowledgements. Oh well!

Let me know what you think about the above.