Closed ronaldtse closed 6 years ago
@ronaldtse OK, I will do it in a few days. I have to finish some work first.
As a consequence, the from_xml routine needs to be robust and tested for each biblio class.
If we need to recreate an item from XML, then we should store the class name of the item in the cache. I see two ways to do this:
When a processor registers in Relaton, it should store the class name:
module Relaton
  module Isobib
    class Processor < Relaton::Processor
      def initialize
        @short = :isobib
        @class_name = 'IsoBibItem' # proposed: class used to rebuild items from XML
        @prefix = "ISO"
        @defaultprefix = %r{^(ISO)[ /]|^IEV($| )|^IEC 60050}
        @idtype = "ISO"
      end
    end
  end
end
So in Relaton we can associate a class with each item and save the class name in the cache.
Return objects instead of XML strings from IsoBib, GbBib, etc. Then we could get the class name from the objects and save it with the serialized objects in XML.
Any thoughts?
@andrew2net we do not want to store the "class name" because those are subject to change.
The prefix (or author information) will already tell us what class we need to instantiate, right?
@ronaldtse Not right. Caching is in the Relaton gem. Other gems like IsoBib and GbBib register themselves in Relaton and provide a prefix and a method to get an item. The prefix allows recognizing the related reference. So we can get an item in XML format and store it in the cache. But when we get the item from the cache, we don't know which class to use to create the item's object from XML. If we hard-code the prefix-to-class association in Relaton, then we can't register new gems in Relaton without changing the code. So a gem which registers in Relaton should provide a class name.
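To make the registration idea concrete, here is a minimal sketch of a registry in which each gem supplies its own item class, so nothing has to be hard-coded centrally. All names here (SketchProcessor, SketchRegistry, SketchIsoItem, SketchGbItem, class_for) are hypothetical and are not Relaton's API, and the GB pattern is an assumption made up for illustration.

```ruby
# Hypothetical stand-ins for the item classes each gem would provide.
SketchIsoItem = Struct.new(:xml)
SketchGbItem = Struct.new(:xml)

# What a gem hands over when it registers: a short name, a pattern for
# recognising its references, and the class to instantiate from XML.
SketchProcessor = Struct.new(:short, :defaultprefix, :item_class)

class SketchRegistry
  def initialize
    @processors = {}
  end

  # Called by each gem; the registry itself knows no prefixes or classes.
  def register(processor)
    @processors[processor.short] = processor
  end

  # Return the item class for a reference, or nil if no processor matches.
  def class_for(ref)
    processor = @processors.values.find { |p| ref.match?(p.defaultprefix) }
    processor && processor.item_class
  end
end

registry = SketchRegistry.new
registry.register(
  SketchProcessor.new(:isobib, %r{^(ISO)[ /]|^IEV($| )|^IEC 60050}, SketchIsoItem)
)
# The GB pattern below is an assumption, not taken from GbBib.
registry.register(SketchProcessor.new(:gbbib, %r{^GB[ /]}, SketchGbItem))

item_class = registry.class_for("ISO 9001:2015")
item = item_class.new("<bibitem/>")
puts item.class
```

Adding a new gem then only means one more register call from that gem, with no change to the registry code.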
@andrew2net this must be a misunderstanding. There should not be a class name inside this cache. This is not a binary cache, it is a bibliographic entry that will be stored in XML format.
In order to know which class to instantiate from the XML, the "type" of the object is stored in the XML itself. For example, if the entry is:
<bibdata type="uri:calconnect.org:documents:standard"> <==== the type that tells you
  <title language="en" format="plain">Guidelines to thwart calendar abuse for calendaring and mail system operators</title>
  <docidentifier>CD 18XX</docidentifier>
  <contributor>
    <role type="author"/>
    <organization>
      <name>CalConnect</name>
    </organization>
  </contributor>
  <contributor>
    <role type="publisher"/>
    <organization>
      <name>CalConnect</name>
    </organization>
  </contributor>
  <language>en</language>
  <script>Latn</script>
  <status format="plain">working-draft</status>
  <copyright>
    <from>2018</from>
    <owner>
      <organization>
        <name>CalConnect</name>
      </organization>
    </owner>
  </copyright>
  <editorialgroup>
    <technical-committee>CALSPAM</technical-committee>
  </editorialgroup>
</bibdata>
We can infer the type from the "type" attribute.
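As a rough illustration of inferring the handler from the XML itself, the sketch below reads the "type" attribute from the root element and looks it up in a table. It uses REXML only because it ships with Ruby; the gems may well use a different parser. TYPE_HANDLERS, entry_type, and handler_for are hypothetical names, and the handler symbols are placeholders.

```ruby
require "rexml/document"

# Hypothetical mapping from the "type" attribute to a handler name.
TYPE_HANDLERS = {
  "uri:calconnect.org:documents:standard" => :calconnect,
  "standard" => :generic_standard
}.freeze

# Read the "type" attribute from the root element of an entry.
def entry_type(xml)
  REXML::Document.new(xml).root.attributes["type"]
end

# Pick a handler for the entry, failing loudly on an unknown type.
def handler_for(xml)
  TYPE_HANDLERS.fetch(entry_type(xml)) do |type|
    raise ArgumentError, "no handler for type #{type.inspect}"
  end
end

xml = <<~XML
  <bibdata type="uri:calconnect.org:documents:standard">
    <docidentifier>CD 18XX</docidentifier>
  </bibdata>
XML

puts handler_for(xml)
```

This only works if the type value is specific enough to identify the source, which is exactly the point under discussion.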
cc: @opoudjis (see how the type incorporates the namespace)
Eh.... the type currently does not incorporate a namespace; it is just a token like "standard". And I don't think it should have a namespace.
The class is being inferred from the document identifier prefix by default, but that will not generalise in all cases (particularly GB). The author or publisher contributor could be used as well, but that leads to a complex and fallible rules engine.
Hate to say this, but the cleanest way to address this is to add a new top-level attribute to all bibdata retrieved from relaton, such as "source", naming the class it was derived through. @ronaldtse, is that OK?
@ronaldtse we also have to store the fetched date in the cache, so I use YAML files:
---
fetched: 2018-10-01
bib: |-
  <bibitem type="standard" id="GB/T20223">
    <title format="text/plain" language="zh" script="Hans">棉短绒</title>
    <title format="text/plain" language="en" script="Latn">Cotton linter</title>
    ...
It's not a problem to add a class_name attribute.
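For reference, a cache entry of this shape can be written and read back with Ruby's standard YAML library. This is only a sketch: dump_entry and load_entry are hypothetical helpers, and the 60-day staleness threshold is an arbitrary assumption, not something decided in this thread.

```ruby
require "yaml"
require "date"

# Serialise an entry as YAML with a fetch date and the XML as a string.
def dump_entry(xml, fetched: Date.today)
  { "fetched" => fetched, "bib" => xml }.to_yaml
end

# Parse an entry and flag it as stale when older than max_age_days.
# The 60-day default is an assumption chosen for illustration.
def load_entry(yaml, today: Date.today, max_age_days: 60)
  # Unquoted dates such as 2018-10-01 load as Date, so Date must be permitted.
  data = YAML.safe_load(yaml, permitted_classes: [Date])
  fetched = data["fetched"]
  fetched = Date.parse(fetched) if fetched.is_a?(String)
  { fetched: fetched, bib: data["bib"], stale: (today - fetched) > max_age_days }
end

yaml = dump_entry('<bibitem type="standard" id="GB/T20223"/>',
                  fetched: Date.new(2018, 10, 1))
entry = load_entry(yaml, today: Date.new(2018, 10, 15))
puts entry[:fetched]
puts entry[:stale]
```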
@opoudjis I think the easiest way would be adding a to_xml method to the processor interface. What do you think?
I strongly urge the usage of some type or namespace. For example, <bibdata type="uri:calconnect.org:documents:standard"> totally makes sense and is not an xmlns (which everyone hates).
We can store the fetched date in the <bibitem> as well, because it is valid (in citations we often have a "last accessed" date too).
And we want the files to be usable as standalone files as well.
So we need 2 new attributes in bibdata (class_name and fetched), do we?
I think fetched should be an element, and class-name should really be 'type' for the “uri:...”
Done it.
@opoudjis I added a from_xml method to the processor and made some fixes in IsoBibItem and the *bib gems. So we need to republish the Relaton, IsoBibItem, IsoBib, GbBib, and IETFBib gems.
@opoudjis @andrew2net
We wish to replace the current pstore caches used in all Relaton data sources (ISO, IEC, IETF, etc.) with Relaton XML storage.
The current pstore cache is simply not portable and is hard to maintain in source code repositories. I want to maintain each Relaton entry as a separate file.
Each Relaton entry should be stored in a separate file, i.e. if we found "ISO 9001:2015" then within ~/.relaton/iso/ there should be ~/.relaton/iso/iso-9001-2015.xml (whether the iso "type" is a subdirectory is subject to consideration).
Metadata about each entry (such as when it was last fetched) still needs to be stored somewhere else, which could be ~/.relaton/iso/index.xml.
Moving entries between the Global cache and the Local cache is easy, just a file replacement. If we disable the Global cache, all entries will be stored in the Local cache. If the Local cache is disabled, we store all in the Global cache. If both caches are active, both caches should be updated to the latest entries.
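As a starting point for planning, here is a minimal sketch of the file-per-entry layout. The file-name normalisation is inferred from the single "ISO 9001:2015" example and is an assumption, as are the helper names entry_filename and store_entry; passing nil for a directory stands in for a disabled cache.

```ruby
require "fileutils"
require "tmpdir"

# Turn a reference such as "ISO 9001:2015" into a file name. The rule
# (lowercase, non-alphanumeric runs become "-") is an assumption.
def entry_filename(ref)
  ref.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-|-\z/, "") + ".xml"
end

# Write the entry into every enabled cache directory (nil means disabled)
# and return the paths written.
def store_entry(ref, xml, type:, global_dir:, local_dir:)
  [global_dir, local_dir].compact.map do |dir|
    path = File.join(dir, type, entry_filename(ref))
    FileUtils.mkdir_p(File.dirname(path))
    File.write(path, xml)
    path
  end
end

# Temporary directories stand in for the real global and local caches.
Dir.mktmpdir do |tmp|
  paths = store_entry("ISO 9001:2015", "<bibitem/>", type: "iso",
                      global_dir: File.join(tmp, "global"),
                      local_dir: File.join(tmp, "local"))
  puts paths
end
```

With plain files, moving an entry between caches is just a file copy, and the per-entry metadata could live in a separate index file as suggested above.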
Please help plan this out. Thanks.