Closed strogonoff closed 1 year ago
There's not even a name inside this block:
name:
  given:
    formatted_initials:
      language:
      - en
  surname:
    content: None
    language:
    - en
  completename:
    content: None
    language:
    - en
The root cause is this:
In the original RFC XML, there is an "author" that is an organization, not a person: https://www.ietf.org/archive/id/draft-andersen-arc-01.xml
<front>
  <title abbrev="ARC">Authenticated Received Chain (ARC)</title>
  <author initials="." surname="OAR-DEV Group">
    <organization>OAR-DEV Group</organization>
    <address>
      <email>arc-discuss@dmarc.org</email>
    </address>
  </author>
  <author initials="K." surname="Andersen" fullname="Kurt Andersen">
    <organization>LinkedIn</organization>
    <address>
      <postal>
        <street>2029 Stierlin Ct.</street>
        <city>Mountain View</city>
        <region>California</region>
        <code>94043</code>
        <country>USA</country>
      </postal>
      <email>kurta@linkedin.com</email>
    </address>
  </author>
There is no way for us to tell this is not a person, because it has initials and surname.
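For illustration only, here is a minimal Python sketch of one possible heuristic for the original RFC XML shown above: treat an author as an organization when it has no fullname and its surname repeats the organization text. The function name looks_like_organization and the rule itself are assumptions made for this example, not something the converter is known to do, and the rule cannot help when the organization element is missing from the input.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the <front> excerpt quoted above (addresses omitted).
SAMPLE = """
<front>
  <title abbrev="ARC">Authenticated Received Chain (ARC)</title>
  <author initials="." surname="OAR-DEV Group">
    <organization>OAR-DEV Group</organization>
  </author>
  <author initials="K." surname="Andersen" fullname="Kurt Andersen">
    <organization>LinkedIn</organization>
  </author>
</front>
"""


def looks_like_organization(author):
    """Hypothetical heuristic: no fullname, and surname equals <organization>."""
    org = (author.findtext("organization") or "").strip()
    surname = (author.get("surname") or "").strip()
    has_fullname = bool((author.get("fullname") or "").strip())
    return bool(org) and surname == org and not has_fullname


authors = ET.fromstring(SAMPLE).findall("author")
flags = [looks_like_organization(a) for a in authors]
print(flags)
```

With this sample, the first author is flagged and the second is not. A person whose surname happens to match their organization and who has no fullname would be misclassified, so this is a starting point rather than a reliable test.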
Can I propose the correct outcome to be the following?
name:
  given:
    formatted_initials:
      content: .
      language:
      - en
  surname:
    content: OAR-DEV Group
    language:
    - en
Adjustments welcome.
@ronaldtse I deliberately raised the issue only about the schema. A schema mismatch may break data loaders/deserializers. It also raises the question of how this passed through the serialization mechanism without failures; apparently some code doesn’t validate that formatted_initials is a valid formatted string?
The issue with the data is separate from the schema issue: it doesn’t break anything and can be fixed at any time…
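As a rough illustration of the kind of validation mentioned here, the following Python sketch walks a deserialized record and reports every formatted-string mapping that has no usable content. It assumes, purely for this example, that a formatted string is any mapping with a language key and that it must carry a non-empty string content; the real schema may define this differently. The record dict is hand-written to mirror the YAML from this issue (where a bare None in YAML loads as the string "None"), and find_invalid_strings is a hypothetical helper.

```python
def find_invalid_strings(node, path="name"):
    """Return dotted paths of mappings that have `language` but no usable `content`.

    Assumption for this sketch: any mapping with a `language` key is a
    formatted string and needs a non-empty string under `content`.
    """
    problems = []
    if isinstance(node, dict):
        if "language" in node:
            content = node.get("content")
            if not isinstance(content, str) or not content.strip():
                problems.append(path)
        for key, value in node.items():
            problems.extend(find_invalid_strings(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            problems.extend(find_invalid_strings(value, f"{path}[{index}]"))
    return problems


# Hand-written mirror of the `name` block from draft-andersen-arc-01.yaml.
record = {
    "given": {"formatted_initials": {"language": ["en"]}},
    "surname": {"content": "None", "language": ["en"]},
    "completename": {"content": "None", "language": ["en"]},
}

problems = find_invalid_strings(record)
print(problems)
```

Only the formatted_initials entry is reported, because the literal string "None" still counts as content under this rule; catching that placeholder is a data check rather than a schema check.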
But if we are talking about data, why do you include given name and formatted_initials in your output, since it seems like you only put a full stop there as a placeholder? I think it can be omitted and we could just have surname and completename left.
given:
  formatted_initials:
    content: .
    language:
    - en
@strogonoff could you re-read my original message? There is some misunderstanding here. This question makes no sense. The full stop is from the original data.
@ronaldtse we use the command rsync -avcizxL rsync.ietf.org::bibxml-ids ./bibxml-ids to get source files. The content of the reference.I-D.draft-andersen-arc-01.xml source file is:
<?xml version="1.0" encoding="UTF-8"?>
<reference anchor="I-D.andersen-arc">
  <front>
    <title>Authenticated Received Chain (ARC)</title>
    <author initials="" surname="None" fullname="None">
    </author>
    <author initials="K." surname="Andersen" fullname="Kurt Andersen">
    </author>
    <author initials="J." surname="Rae-Grant" fullname="John Rae-Grant">
    </author>
    <author initials="B." surname="Long" fullname="Brandon Long">
    </author>
    <author initials="J. T." surname="Adams" fullname="J. Trent Adams">
    </author>
    <author initials="S. M." surname="Jones" fullname="Steven M Jones">
    </author>
    <date month="February" day="1" year="2016" />
    <abstract>
      <t> Authenticated Received Chain (ARC) permits an organization which is
      creating or handling email to indicate their involvement with the
      handling process by adding a cryptographically signed header (or
      headers) in a manner analagous to that of DomainKeys Identified Mail
      (DKIM). Assertion of responsibility is validated through a
      cryptographic signature and by querying the Signer's domain directly
      to retrieve the appropriate public key. Changes in the message which
      may break DKIM, may be tracked through the ARC set of headers.
      </t>
    </abstract>
  </front>
  <seriesInfo name="Internet-Draft" value="draft-andersen-arc-01" />
  <format type="TXT" target="https://www.ietf.org/archive/id/draft-andersen-arc-01.txt" />
</reference>
So far this has been noticed only in draft-andersen-arc-01.yaml.
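To check whether other source files carry the same placeholder author, something along these lines could be run over each file. This is only a sketch: is_placeholder_author and the PLACEHOLDERS set are hypothetical names, and treating an empty string or the literal "None" as a placeholder is an assumption based on the single file shown above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the bibxml-ids source quoted above.
SAMPLE = """<reference anchor="I-D.andersen-arc">
  <front>
    <title>Authenticated Received Chain (ARC)</title>
    <author initials="" surname="None" fullname="None"></author>
    <author initials="K." surname="Andersen" fullname="Kurt Andersen"></author>
  </front>
</reference>"""

# Assumption: these values mean "no real data" in the source files.
PLACEHOLDERS = {"", "None"}


def is_placeholder_author(author):
    """True when initials, surname and fullname are all empty or 'None'."""
    return all(
        (author.get(attr) or "").strip() in PLACEHOLDERS
        for attr in ("initials", "surname", "fullname")
    )


root = ET.fromstring(SAMPLE)
placeholder_positions = [
    index
    for index, author in enumerate(root.iter("author"))
    if is_placeholder_author(author)
]
print(placeholder_positions)
```

For the sample above, only the first author (position 0) is reported. Applying the same function to every file fetched by the rsync command would show whether draft-andersen-arc-01 is really the only affected document.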