ietf-tools / relaton-data-ids

Bibliographic data information for Internet-Drafts in Relaton format

Strip away wrapping tags and surrounding empty space #4

Closed ronaldtse closed 2 years ago

ronaldtse commented 2 years ago

Under abstract > content we need to strip away the wrapping (empty) tags and the surrounding empty space:

https://github.com/ietf-ribose/relaton-data-ids/blob/d7fd11beadea199a170cdccde011d93cf4fee1e9/data/DRAFT-3GPP-COLLABORATION-01.yaml#L82-L85

==>

abstract:
  content: "This document describes the standardization collaboration between
    3GPP and IETF.  This memo provides information for the Internet community."
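The transformation above could be sketched roughly as follows. This is only an illustration in Python, not Relaton's actual code; `strip_wrapper` is a hypothetical helper, and the assumption is that the raw `content` value is a single element such as `<p>…</p>` padded with newlines and indentation.

```python
import re

# Hypothetical helper, not part of Relaton. Matches one outer element that
# spans the whole string, allowing whitespace before and after it.
_WRAPPER = re.compile(r"^\s*<(\w+)[^>]*>(.*)</\1>\s*$", re.DOTALL)


def strip_wrapper(content: str) -> str:
    """Drop a single wrapping tag and the whitespace around the text."""
    match = _WRAPPER.match(content)
    if match is None:
        return content.strip()
    tag, inner = match.group(1), match.group(2)
    # If the same closing tag appears inside, the string holds several
    # sibling elements rather than one wrapper, so leave the markup alone.
    if f"</{tag}>" in inner:
        return content.strip()
    return inner.strip()


raw = "<p>\n   This memo provides information for the Internet community.\n</p>\n"
print(strip_wrapper(raw))
```

Content made of several sibling elements is only trimmed, so multi-paragraph abstracts keep their markup.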

ronaldtse commented 2 years ago

We probably want to strip out all \n and \t from the content that do not represent new paragraphs. They are formatting concerns that have no relevance to semantics.

https://github.com/ietf-ribose/relaton-data-ids/blob/d7fd11beadea199a170cdccde011d93cf4fee1e9/data/DRAFT-ANAND-SPRING-POI-SR-01.yaml#L74-L82
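One possible way to do this, sketched in Python rather than taken from Relaton: treat a blank line as a paragraph break (an assumption, since the thread does not define one) and replace every other run containing `\n` or `\t` with a single space. `normalize_whitespace` is a hypothetical name.

```python
import re

# Assumption: one or more blank lines (possibly holding spaces or tabs)
# separate paragraphs; any other newline or tab is layout only.
_PARAGRAPH_BREAK = re.compile(r"\n(?:[ \t]*\n)+")
# A run of whitespace that contains at least one tab or newline.
_LAYOUT_RUN = re.compile(r" *[\t\n][ \t\n]*")


def normalize_whitespace(text: str) -> str:
    """Remove layout-only newlines and tabs, keeping paragraph breaks."""
    paragraphs = _PARAGRAPH_BREAK.split(text.strip())
    cleaned = [_LAYOUT_RUN.sub(" ", p).strip() for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)


raw = "\n   This document describes\n\tthe collaboration.  This memo\n   is informational.\n"
print(normalize_whitespace(raw))
```

Runs made only of spaces are left untouched, so the double space after a sentence-ending period survives, as in the expected output earlier in this issue.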

andrew2net commented 2 years ago

@ronaldtse fixed, but we need to keep <p>. RFC 7991 has a <t> element. Nick asked me to replace <t> with <p> because <t> isn't allowed in Metanorma. So when Relaton parses BibXML, it replaces <t> with <p> in abstracts. When BibXML is rendered, it replaces <p> with <t> again.
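For illustration only, the round trip described here could look like the Python sketch below. It is not Relaton's implementation; `rename_tag` is a hypothetical helper that works on the markup as a string.

```python
import re


def rename_tag(markup: str, old: str, new: str) -> str:
    """Rename opening and closing tags called `old` to `new`."""
    # The lookahead requires whitespace, ">" or "/" after the name, so
    # renaming "t" does not touch longer names such as <table>.
    pattern = re.compile(rf"<(/?){re.escape(old)}(?=[\s>/])")
    return pattern.sub(rf"<\g<1>{new}", markup)


bibxml_abstract = "<t>First paragraph.</t><t>Second paragraph.</t>"
parsed = rename_tag(bibxml_abstract, "t", "p")   # on parse: <t> becomes <p>
rendered = rename_tag(parsed, "p", "t")          # on render: <p> becomes <t>
print(parsed)
print(rendered)
```

A real implementation would more likely rename nodes with an XML parser; the regex version is only meant to make the two directions of the mapping concrete.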

ronaldtse commented 2 years ago

@andrew2net I see, but is the data being updated? I don't see the data being updated daily yet: https://github.com/ietf-ribose/relaton-data-ids/blob/main/data/DRAFT-3K1N-6TISCH-ALICE0-00.yaml

I want to verify that the whitespace is stripped.

[Screenshot 2021-12-23 at 10.53.22 AM]

Regarding <p>: I see that RFC 7991 does support some rich text in the abstract:

2.1.  <abstract>

   Contains the Abstract of the document.  See [RFC7322] for more
   information on restrictions for the Abstract.

   This element appears as a child element of <front> (Section 2.26).

   Content model:

   In any order, but at least one of:

   o  <dl> elements (Section 2.20)

   o  <ol> elements (Section 2.34)

   o  <t> elements (Section 2.53)

   o  <ul> elements (Section 2.63)

And that RFC 7322 specifies that:

4.3.  Abstract Section

   Every RFC must have an Abstract that provides a concise and
   comprehensive overview of the purpose and contents of the entire
   document, to give a technically knowledgeable reader a general
   overview of the function of the document.

   Composing a useful Abstract generally requires thought and care.
   Usually, an Abstract should begin with a phrase like "This memo ..."
   or "This document ..."  A satisfactory Abstract can often be
   constructed in part from material within the Introduction section,
   but an effective Abstract may be shorter, less detailed, and perhaps
   broader in scope than the Introduction.  Simply copying and pasting
   the first few paragraphs of the Introduction is allowed, but it may
   result in an Abstract that is both incomplete and redundant.  Note
   also that an Abstract is not a substitute for an Introduction; the
   RFC should be self-contained as if there were no Abstract.

   Similarly, the Abstract should be complete in itself.  It will appear
   in isolation in publication announcements and in the online index of
   RFCs.  Therefore, the Abstract must not contain citations.

So multiple paragraphs are allowed and it is fine to use <p> for that.

> but we need to keep <p>. RFC 7991 has a <t> element. Nick asked me to replace <t> with <p> because <t> isn't allowed in Metanorma. So when Relaton parses BibXML, it replaces <t> with <p> in abstracts. When BibXML is rendered, it replaces <p> with <t> again.

How Metanorma deals with text is technically of no concern to Relaton. Metanorma is only a consumer of Relaton data here.

What I am trying to get at is that the "abstract" should be in a text format that is interoperable. No one uses RFC 7991 text formatting outside IETF, and therefore we shouldn't either.

However, since we do not yet have a clear spec of which rich-text format is to be used in the Relaton abstract, I'm fine with leaving this as is and defining that in a separate Relaton issue.