Tag Complexity - Githubissues

may- commented 10 years ago

Hello,

I have a question about the tag complexity. The NLM tag library (http://dtd.nlm.nih.gov/book/tag-library/) defines tags with maximal flexibility. Is there any limitation in the meTypeset output? In other words, should all possible occurrences of tags be taken into account?

For example, there are (at least) 5 possibilities for the publication-date tag (http://dtd.nlm.nih.gov/book/tag-library/n-mm70.html):

1. <pub-date> tag never occurs in the whole xml.

2. with a content text only

--- xml ---
<pub-date>2014</pub-date>
--- json ---
{ pub-date : '2014' }

3. with attributes

--- xml ---
<pub-date pub-type='epub'>2014</pub-date>
--- json ---
{ pub-date : { @pub-type : 'epub', #text : '2014'} }

4. tag occurs more than once

--- xml ---
<pub-date pub-type='epub'>2014</pub-date>
<pub-date pub-type='pdf'>2013</pub-date>
--- json ---
{ pub-date : [
   { @pub-type : 'epub', #text : '2014' },
   { @pub-type : 'pdf', #text : '2013' }
] }

5. with children tags

--- xml ---
<pub-date>
   <year>2014</year>
   <month>Oct.</month>
</pub-date>
--- json ---
{ pub-date : 
   { year : '2014', month : 'Oct.' }
}

If I call json['pub-date'], it returns (1) undefined (key error), (2) a string, (3) an object (attributes and text), (4) an array, or (5) an object (child nodes). Should I check all possibilities for every tag and implement every case? Is there an elegant technique that lets me handle all cases in the same way?

I thought that if there were any limitations in meTypeset (for example, if meTypeset always uses the <mixed-citation> tag and never <element-citation> for a reference item), then I could skip implementing those tags...
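One way to treat the five shapes uniformly is to normalise whatever the lookup returns into a list of records before using it. The following is only a minimal sketch in Python: the function name `normalize` and the record keys `attrs`, `text`, and `children` are illustrative assumptions, and it assumes the XML-to-JSON converter marks attributes with `@` and text with `#text`, as in the examples above.

```python
def normalize(value):
    """Turn any of the five shapes into a list of uniform records.

    Each record is a dict with the (hypothetical) keys 'attrs', 'text'
    and 'children'; children are normalised recursively.
    """
    if value is None:                       # (1) tag absent
        return []
    if isinstance(value, list):             # (4) tag occurs more than once
        return [record for item in value for record in normalize(item)]
    if isinstance(value, str):              # (2) text content only
        return [{"attrs": {}, "text": value, "children": {}}]
    # (3) attributes plus text, or (5) child tags: both arrive as a dict
    attrs = {k[1:]: v for k, v in value.items() if k.startswith("@")}
    children = {
        k: normalize(v)
        for k, v in value.items()
        if not k.startswith("@") and k != "#text"
    }
    return [{"attrs": attrs, "text": value.get("#text", ""), "children": children}]


meta = {"pub-date": [
    {"@pub-type": "epub", "#text": "2014"},
    {"@pub-type": "pdf", "#text": "2013"},
]}
# .get() avoids the key error when the tag is missing.
for record in normalize(meta.get("pub-date")):
    print(record["attrs"].get("pub-type"), record["text"])
```

With this, the calling code always loops over a list and reads the same three keys, regardless of which of the five cases the input happened to be.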

MartinPaulEve commented 10 years ago

Hi Mayu,

Thanks for this. The tag output that you've given there is article-level metadata, correct? If so, then this is not defined by meTypeset, which will simply merge in whatever front-matter it is provided with.

References are, indeed, constrained to the mixed-citation format, unless the user specifies the --zotero flag, in which case meTypeset will try to build element-citation tags using the specified Zotero database.

Best wishes,

Martin

Dr. Martin Paul Eve Lecturer in English Literature University of Lincoln

E: meve@lincoln.ac.uk W: https://www.martineve.com

Founder, Open Library of the Humanities (https://www.openlibhums.org) Chief Editor, Orbit: Writing Around Pynchon (https://www.pynchon.net) Web Editor, Alluvium, (http://www.alluvium-journal.org)
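Since references come out as mixed-citation by default and as element-citation only with --zotero, a reader only needs two branches for reference items. The sketch below uses Python's standard `xml.etree.ElementTree`; the sample `<ref-list>` is made up for illustration and is not actual meTypeset output, and the function name `read_refs` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; not produced by meTypeset.
SAMPLE = """<ref-list>
  <ref id="r1"><mixed-citation>Doe, J. <source>A Book</source>. 2014.</mixed-citation></ref>
  <ref id="r2"><element-citation><source>Another Book</source><year>2013</year></element-citation></ref>
</ref-list>"""


def read_refs(xml_text):
    """Return one dict per <ref>, whichever citation style it uses."""
    refs = []
    for ref in ET.fromstring(xml_text).iter("ref"):
        mixed = ref.find("mixed-citation")
        if mixed is not None:
            # Mixed content: join all text pieces and collapse whitespace.
            text = " ".join("".join(mixed.itertext()).split())
            refs.append({"id": ref.get("id"), "kind": "mixed", "text": text})
            continue
        element = ref.find("element-citation")
        if element is not None:
            # Element-only content: one field per child tag.
            fields = {child.tag: (child.text or "").strip() for child in element}
            refs.append({"id": ref.get("id"), "kind": "element", "fields": fields})
    return refs


for entry in read_refs(SAMPLE):
    print(entry)
```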

andreahacker commented 10 years ago

Hi everyone - I talked to Mayumi about this earlier today and I think it would make sense to create a selection of the book tag set for our immediate purposes, at least for now. I do not think we need the entire collection of tags at this point and should create a 1.0 common-sense version that we can operate with. I suggest the following: we create a selection, upload it here for discussion, and then proceed. Would that make sense? Best Andrea

may- commented 10 years ago

@Martin, Thank you for the prompt response!

I assumed book-level metadata (I use the book tag set instead of the journal article tag set), but it was just an example. I wanted to know not only the meTypeset limitations but also practical guidelines in general. I just thought that if there is a tag which meTypeset never generates (regardless of where the tag is defined), I could ignore that tag. If I give meTypeset metadata containing a pub-date tag, meTypeset just returns XML with that pub-date tag, right? So all possible occurrences of the pub-date tag should be taken into account, shouldn't they?

As Andrea suggested, it would help me to have a list of the selected tags and the specific usage of each tag. For example, we always provide the pub-date tag and use it with the attribute only, or something like that.
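A selected tag list with a fixed usage per tag can be written down as data and checked mechanically. This is a rough sketch only: the `PROFILE` rules below (exactly one pub-date, a required pub-type attribute, only year/month/day children) are assumptions made up for illustration, not a decision taken in this thread, and `check_profile` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical house profile: every rule here is an illustrative assumption,
# not something the NLM tag set or meTypeset prescribes.
PROFILE = {
    "pub-date": {
        "min_occurs": 1,
        "max_occurs": 1,
        "required_attrs": {"pub-type"},
        "allowed_children": {"year", "month", "day"},
    },
}


def check_profile(parent, profile=PROFILE):
    """Return a list of problems; an empty list means the metadata conforms."""
    problems = []
    for tag, rule in profile.items():
        found = parent.findall(tag)
        if not rule["min_occurs"] <= len(found) <= rule["max_occurs"]:
            problems.append(f"<{tag}> occurs {len(found)} time(s)")
        for element in found:
            missing = rule["required_attrs"] - set(element.attrib)
            if missing:
                problems.append(f"<{tag}> lacks attribute(s) {sorted(missing)}")
            extra = {child.tag for child in element} - rule["allowed_children"]
            if extra:
                problems.append(f"<{tag}> has unexpected child tag(s) {sorted(extra)}")
    return problems


good = ET.fromstring(
    "<book-meta><pub-date pub-type='epub'><year>2014</year></pub-date></book-meta>"
)
bad = ET.fromstring(
    "<book-meta><pub-date>2014</pub-date><pub-date>2013</pub-date></book-meta>"
)
print(check_profile(good))
print(check_profile(bad))
```

Input that passes such a check has one known structure, so the editor code only needs to implement that structure.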

may- commented 10 years ago

Perhaps I should ask differently:

So far, my implementation works only if the XML has one specific structure. For example:

<book>
    <book-meta>
        <pub-date pub-type='******'>
            <year>******</year>
            <month>*******</month>
            <day>*******</day>
        </pub-date>
        ...

It means, my implementation looks like:

....
root['book']['book-meta']['pub-date']['year'] = ...
....

My question is: Do I have to implement all possible cases for all tags? Such as:

....
if root['book']['book-meta'] has key 'pub-date':
    if root['book']['book-meta']['pub-date'] is string:
        <input ... /> ...
    else if root['book']['book-meta']['pub-date'] is array:
....

I started to check the definitions of the tags so that I can cover all possible occurrences, but it is soooo exhausting. That's why I asked this.

Is there a more elegant way to handle all possible cases? Or am I going in the wrong direction??
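Part of the branching above comes from the JSON conversion rather than from the XML itself. If the XML can be read directly, an element API gives every tag the same interface: `findall` always returns a list (empty when the tag is absent), attributes are always a mapping, and text and children are always available. The following is a sketch with Python's standard `xml.etree.ElementTree`; treating bare text content as the year is an assumption made for this example, and `pub_dates` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def pub_dates(book_xml):
    """Collect every <pub-date> under <book-meta> in one uniform shape."""
    root = ET.fromstring(book_xml)           # root is the <book> element
    dates = []
    for element in root.findall("book-meta/pub-date"):   # [] when absent
        dates.append({
            "pub-type": element.get("pub-type"),         # None when absent
            # Assumption: text-only content such as <pub-date>2014</pub-date>
            # is interpreted as the year.
            "year": element.findtext("year") or (element.text or "").strip(),
            "month": element.findtext("month"),          # None when absent
            "day": element.findtext("day"),
        })
    return dates


example = """<book><book-meta>
  <pub-date pub-type='epub'><year>2014</year><month>Oct.</month></pub-date>
  <pub-date pub-type='pdf'>2013</pub-date>
</book-meta></book>"""
for date in pub_dates(example):
    print(date)
```

The same loop covers the absent, single, repeated, text-only, attribute, and child-tag cases without a separate type check for each.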

MartinPaulEve commented 10 years ago

Hi Mayu,

meTypeset handles metadata by literally copying the structure of the file that you give into the head of the document, so it's totally up to you to constrain that portion as you want...

Best wishes,

Martin

may- commented 10 years ago

Hi @andreahacker, so I guess it is now an editorial question rather than a technical one. May I decide which tags we apply? Do you have a guideline for metadata and book structure in terms of the practical operation of end products?

andreahacker commented 10 years ago

M - yes it is an editorial decision. We will create a list together. A.

andreahacker commented 10 years ago

Ok everyone, food for thought:

Mayumi and I are sitting here deciding which book tags to incorporate. As we are doing this, I was struck by the following issue: where does all the information for the metadata come from? On the book level (i.e. title, year, publisher, etc.) it is perfectly reasonable to expect the publisher/editors to enter this information.

Our original idea was rather minimal (see step 2 of the HEIDIEditor workflow): there was little metadata that we intended to enter on the chapter level, for example. However, this may not be in our interest, since discoverability is such a crucial aspect of our undertaking.

It gets much trickier, though, when we want to input metadata about, say, contributors or even abstracts (see below). Both should be marked relatively clearly to ensure discoverability, but it cannot be the editor's task to input all the pertinent metadata. In other words: information for the following tags has to be provided by the authors themselves, either in OMP (we will investigate) or in the manuscript. How do we proceed?

abstract

withanage commented 10 years ago

Metadata at book level and chapter level can be provided by OMP or in the book itself. So, in the long run, providing the metadata should be handled by OMP or any other program that provides input to meTypeset. For discoverability, the basic metadata fields should be enough: person (author, contributor, etc.), title, and keywords. But when you add a person, there should be a way to define whether they are an author, contributor, etc. This can be achieved with a drop-down.

For the abstract, it really depends on what the editors need. If no complexity is needed, add it as text. Otherwise, there can be a separate editor for the abstract.

Best, Dulip
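The person-plus-role idea maps naturally onto a small record whose role comes from a fixed list, which is also what a drop-down would offer. The sketch below is an illustration under assumptions: the `Role` values beyond author and contributor, the `Person` field names, and the choice to serialise to `<contrib-group>` / `<contrib contrib-type="...">` are not decided anywhere in this thread.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """Choices a drop-down could offer (EDITOR is an assumed extra)."""
    AUTHOR = "author"
    CONTRIBUTOR = "contributor"
    EDITOR = "editor"


@dataclass
class Person:
    surname: str
    given_names: str
    role: Role


def contrib_group(people):
    """Build a <contrib-group> element with one <contrib> per person."""
    group = ET.Element("contrib-group")
    for person in people:
        contrib = ET.SubElement(group, "contrib", {"contrib-type": person.role.value})
        name = ET.SubElement(contrib, "name")
        ET.SubElement(name, "surname").text = person.surname
        ET.SubElement(name, "given-names").text = person.given_names
    return group


people = [Person("Doe", "Jane", Role.AUTHOR), Person("Roe", "Sam", Role.EDITOR)]
print(ET.tostring(contrib_group(people), encoding="unicode"))
```

Because the role is an enum, a value outside the list cannot be entered, which keeps the resulting attribute values consistent for discoverability.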

withanage / HEIDIEditor

Tag Complexity #41