Open ronaldtse opened 2 years ago
@ronaldtse there are documents with "r" in identifier, e.g. "IEEE P11073-10101/D3r7, September 2018" Is it a document revision or draft revision?
@mico I believe D3r7 means "3rd draft, revision 7". The concept of "revision" in the IEEE PubID does not seem to be in common use, but I guess there are 11 instances for 2 documents in the entire library!
IEEE P11073-10101/D3r7, September 2018
IEEE P11073-10101/D4r1, January 2019
IEEE P11073-10101/D5r4, February 2019
IEEE P11073-10101/D7r1, March 2019
IEEE P11073-10101/D9r1, April 2019
IEEE P11073-10471/D2r2, April 2020
IEEE P11073-10471/D3r2, November 2021
IEEE P11073-110101/D8r1, April 2019
IEEE P1242/D8r2, June 2016
IEEE P1242/D8r3, July 2016
I think these two patterns are identical in intention:
D{v}r{r}
D{v}.{r}
There are a lot more of the "dot notation" -- 1439 of them. Let's treat them as the same, and use the "dot notation" as the output format.
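A minimal sketch of that normalization, assuming the draft suffix always starts with "/D" and that a lowercase "r" is the only alternative separator. The names `DRAFT_REVISION` and `normalize_draft_revision` are hypothetical and not taken from the pubid-ieee code.

```python
import re

# Hypothetical pattern: "/D", a draft number, then "r" or "." and a revision.
DRAFT_REVISION = re.compile(r"/D(?P<draft>\d+)[r.](?P<revision>\d+)")


def normalize_draft_revision(pubid: str) -> str:
    """Rewrite "D{v}r{r}" as "D{v}.{r}"; other identifiers pass through."""
    return DRAFT_REVISION.sub(
        lambda m: f"/D{m.group('draft')}.{m.group('revision')}", pubid
    )


print(normalize_draft_revision("IEEE P11073-10101/D3r7, September 2018"))
print(normalize_draft_revision("IEEE P802-REV/D1.7"))
```

Identifiers already in dot notation come back unchanged, so the function can be applied to the whole library without a prior check.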
@ronaldtse "IEEE P1609.2.1/D12D14" What is D14 here? Is it another draft? Tried to find out by myself but didn't find this document with "D12D14".
Looking at the examples:
IEEE P1609.2.1/D10, February 2020
IEEE P1609.2.1/D12, June 2020
IEEE P1609.2.1/D12D14, June 2020
IEEE P1609.2.1/D15, August 2020
IEEE P1609.2.1/D4, November 2021
IEEE P1609.2.1/D6, January 2022
I think this is a typo for D14. Let's make this a single time replacement (we should have a set of special cases to replace these errors) so it is not in the parse rules.
There are 7 cases like this:
IEEE P11073-10420/D4D5, March 2020
IEEE P1609.2.1/D12D14, June 2020
IEEE P1653.5/D7d1 November, 2019
IEEE P3002.2/D6D7, April 2017
IEEE P515/D4D5, March 2017
IEEE PC57.143 /D24D25, October 2012
IEEE Unapproved Draft Std P1680/D4D6, Aug 2009
You're right. This is clearly intentional.
When I see these:
P3002.2/D6, Oct 2015
P3002.2/D6D7, Apr 2017
P3002.2/D7, Sept 2017
I think this means it is a "pre-D7 coming from D6".
Given that there are:
P1680/D4D6, Aug 2009
P1609.2.1/D12D14
P352/D4D6, Feb 2016
The pre-draft and intended target draft numbers are not consecutive.
So we have to parse these two numbers separately.
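A rough sketch of parsing the two draft numbers separately, assuming the second marker may be upper or lower case (as in "D7d1"). `DraftPair` and `parse_double_draft` are hypothetical names, and reading the first number as the starting draft and the second as the intended draft is only the interpretation suggested in this thread.

```python
import re
from typing import NamedTuple, Optional


class DraftPair(NamedTuple):
    first: int   # e.g. 6 in "D6D7"
    second: int  # e.g. 7 in "D6D7"


# Hypothetical pattern: "/D", a number, then "D" or "d" and a second number.
DOUBLE_DRAFT = re.compile(r"/D(\d+)[Dd](\d+)")


def parse_double_draft(pubid: str) -> Optional[DraftPair]:
    """Return both draft numbers, or None if there is only one draft number."""
    m = DOUBLE_DRAFT.search(pubid)
    if m is None:
        return None
    return DraftPair(int(m.group(1)), int(m.group(2)))


print(parse_double_draft("IEEE P1609.2.1/D12D14, June 2020"))
print(parse_double_draft("IEEE P1609.2.1/D12, June 2020"))
```

Keeping the numbers as two integers avoids any assumption that they are consecutive.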
@ronaldtse should I ignore "REV" here or just leave it as part of number?
IEEE P802-REV/D1.7
IEEE P802-REV/D1.9
@mico leave REV as part of the number.
Look at these two entries, the "REV" is part of the number (the first one in a superseding relationship, the second in a title):
"P802.11-REVma/D5.0 (Superseded by P802.11-REVma_D6.0)"
"P802.11ai/D11.0 Sept 2016 - IEEE Approved Draft Standard for Information technology-Telecommunications and information exchange between systems-Local and metropolitan area networks-Specific requirements Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications: Amendment to IEEE P802.11-REVmc(TM)/D8.0: Fast Initial Link Setup"
@ronaldtse What to do with the documents like:
IEEE Unapproved Draft Std 11073-10471/D02, Feb 2008
IEEE Unapproved Draft Std 11073-10472/D02, Apr 2009
IEEE Active Unapproved Draft Std PC37.59/D11, Jul 2007
IEEE Active Unapproved Draft Std PC57.129/D10, Jul 2007
IEEE Active Unapproved Draft Std PC62.21/D2, Jul 2007
IEEE Approved Draft Std C57.12.35/D7, 07
IEEE Approved Draft Std P1076.1/D3.3, Feb 6, 2007
IEEE Approved Draft Std P11073-10415/D11, Aug 2008
Do we need to keep "Approved Draft", "Active Unapproved Draft", "Unapproved Draft" in resulting PubID?
@mico I believe we should keep these statuses as part of the PubID.
@ronaldtse there are many identifiers with word "Unapproved" but without "Draft" after it. Should we add "Draft" to the output? Should we add "Draft" to every draft document? e.g.
IEEE Unapproved Std P277/D2,Mar 2007
IEEE Unapproved Std P487/D7 Feb 2007
IEEE Unapproved Std P495/D12 Mar 2007
IEEE Unapproved Std P802.16g/D8, Feb2007
IEEE Unapproved Std P802.1ag/D8, Feb 2007
I did a search for "IEEE Unapproved Std P277/D2,Mar 2007" and got this: https://ieeexplore.ieee.org/document/4152680
The full title is apparently: "P277/D2,Mar 2007 - Unapproved Draft IEEE Recommended Practice for Cement Plant Power Distribution"
In fact, the original raw XML data for this entry is this:
<publication>
<title><![CDATA[IEEE Unapproved Std P277/D2,Mar 2007]]></title>
<normtitle><![CDATA[IEEE Unapproved Std P277/D2,Mar 2007]]></normtitle>
<standardsfamilytitle>IEEE Recommended Practice for Cement Plant Power Distribution</standardsfamilytitle>
<publicationinfo>
<idamsid>0b000064807b4661</idamsid>
<stdnumber>P277/D2,Mar 2007</stdnumber>
<publicationtype>Standard</publicationtype>
<publicationsubtype>Standard Docs</publicationsubtype>
<standard_subtype>IEEE Standard</standard_subtype>
<ieeeabbrev>IEEESTD</ieeeabbrev>
<pubstatus>Active</pubstatus>
<publicationopenaccess>F</publicationopenaccess>
<standard_id>0</standard_id>
<standard_status>Inactive</standard_status>
<standardmodifierset>
<standard_modifier>Draft</standard_modifier>
</standardmodifierset>
<packagememberset>
<packagemember>STDSELECT</packagemember>
</packagememberset>
<standard_family>277</standard_family>
<standardpackageset>
<standard_package>3000 Standards Collection for Industrial and Commercial Power Systems</standard_package>
</standardpackageset>
<icscodes>
<code_term codenum="91.100.10">Cement. Gypsum. Lime. Mortar</code_term>
</icscodes>
<pubtopicalbrowseset>
<pubtopicalbrowse>Power, Energy and Industry Applications</pubtopicalbrowse>
</pubtopicalbrowseset>
<copyrightgroup>
<copyright>
<year>2007</year>
<holder>IEEE</holder>
</copyright>
</copyrightgroup>
<publisher>
<publishername>IEEE</publishername>
<address>
<country>USA</country>
</address>
</publisher>
<holdstatus>Hold</holdstatus>
<confgroup>
<doi_permission>F</doi_permission>
</confgroup>
<amsid>4152678</amsid>
</publicationinfo>
<article>
<title><![CDATA[Unapproved Draft IEEE Recommended Practice for Cement Plant Power Distribution]]></title>
<articleinfo>
<articleseqnum>1</articleseqnum>
<idamsid>0b000064807b4665</idamsid>
<articlestatus>Active</articlestatus>
<articleopenaccess>F</articleopenaccess>
<articleshowflag>F</articleshowflag>
<articleplagiarizedflag>F</articleplagiarizedflag>
<articlenodoiflag>F</articlenodoiflag>
<articlecoverimageflag>F</articlecoverimageflag>
<articlereferenceflag>F</articlereferenceflag>
<articlepeerreviewflag>F</articlepeerreviewflag>
<holdstatus>Publish</holdstatus>
<articlecopyright holderisieee="Yes" year="0"/>
<date datetype="OriginalPub">
<year>2007</year>
</date>
<size>330403</size>
<filename docpartition="5" filetype="MainPDF">04152680.pdf</filename>
<artpagenums endpage="" startpage=""/>
<amsid>4152680</amsid>
</articleinfo>
</article>
</volume>
If you look at the <stdnumber>, the "Unapproved..." text is not present.
But look at the discrepancy between the title values of <publication> vs the <articleinfo>:
<publication>
<normtitle><![CDATA[IEEE Unapproved Std P277/D2,Mar 2007]]></normtitle>
</publication>
<!--vs-->
<volume>
<article>
<title><![CDATA[Unapproved Draft IEEE Recommended Practice for Cement Plant Power Distribution]]></title>
</article>
</volume>
Interestingly, the publication says "Unapproved Std" but the article says "Unapproved Draft".
The "Unapproved" part is not documented in the XML at all.
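The comparison can be reproduced with a short script. This is only a sketch: the sample is trimmed to the elements discussed here, and the `<record>` wrapper element is an assumption added so that `<publication>` and `<volume>` share one root; the real file's root element is not shown in this thread.

```python
import xml.etree.ElementTree as ET

# Trimmed sample; the <record> wrapper is hypothetical.
SAMPLE = """
<record>
  <publication>
    <normtitle><![CDATA[IEEE Unapproved Std P277/D2,Mar 2007]]></normtitle>
    <publicationinfo>
      <stdnumber>P277/D2,Mar 2007</stdnumber>
      <standardmodifierset>
        <standard_modifier>Draft</standard_modifier>
      </standardmodifierset>
    </publicationinfo>
  </publication>
  <volume>
    <article>
      <title><![CDATA[Unapproved Draft IEEE Recommended Practice for Cement Plant Power Distribution]]></title>
    </article>
  </volume>
</record>
"""


def extract_fields(xml_text: str) -> dict:
    """Collect the fields that carry (or omit) the draft status wording."""
    root = ET.fromstring(xml_text.strip())
    return {
        "normtitle": root.findtext("publication/normtitle"),
        "stdnumber": root.findtext("publication/publicationinfo/stdnumber"),
        "modifier": root.findtext(".//standard_modifier"),
        "article_title": root.findtext("volume/article/title"),
    }


for name, value in extract_fields(SAMPLE).items():
    print(f"{name}: {value}")
```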
I found the following two files:
04152686.xml: "IEEE Unapproved Std P1076.1/D3.3, Feb2007"
04278973.xml: "IEEE Approved Draft Std P1076.1/D3.3, Feb 6, 2007"
Notice this diff:
< <title><![CDATA[IEEE Unapproved Std P1076.1/D3.3, Feb2007]]></title>
< <normtitle><![CDATA[IEEE Unapproved Std P1076.1/D3.3, Feb2007]]></normtitle>
---
> <title><![CDATA[IEEE Approved Draft Std P1076.1/D3.3, Feb 6, 2007]]></title>
> <normtitle><![CDATA[IEEE Approved Draft Std P1076.1/D3.3, Feb 6, 2007]]></normtitle>
8,9c8,9
< <idamsid>0b000064807b466f</idamsid>
< <stdnumber>P1076.1/D3.3, Feb2007</stdnumber>
---
> <idamsid>0b000064808ffb04</idamsid>
> <stdnumber>P1076.1/D3.3, Feb 6, 2007</stdnumber>
24c24
< <isbn isbntype="New-2005" mediatype="Electronic">978-1-5044-2834-7</isbn>
---
> <isbn isbntype="New-2005" mediatype="Electronic">978-1-5044-2833-0</isbn>
52c52
< <amsid>4152684</amsid>
---
> <amsid>4278971</amsid>
57c57
< <idamsid>0b000064820da92c</idamsid>
---
> <idamsid>0b000064820daaee</idamsid>
59c59
< <amsid>4152685</amsid>
---
> <amsid>4278972</amsid>
64c64
< <title><![CDATA[Unapproved IEEE Draft Standard VHDL Analog and Mixed-Signal Extensions (Revision of IEEE Std 1076.1-1999)]]></title>
---
> <title><![CDATA[Approved IEEE Draft Standard VHDL Analog and Mixed-Signal Extensions (Revision of IEEE Std 1076.1-1999)]]></title>
67c67
< <idamsid>0b000064807b4673</idamsid>
---
> <idamsid>0b000064808ffb08</idamsid>
81,82c81,82
< <size>6226635</size>
< <filename docpartition="5" filetype="MainPDF">04152686.pdf</filename>
---
> <size>6215274</size>
> <filename docpartition="5" filetype="MainPDF">04278973.pdf</filename>
84c84
< <amsid>4152686</amsid>
---
> <amsid>4278973</amsid>
There is no difference in any of the statuses.
This tells me that the "Approved vs Unapproved" status is not encoded in the XML data, and is only available in the PubID. Maybe we should store the parsed status of "Approved" and "Unapproved" and only display it in the "full PubID style".
What about word "Draft"? Should we display it only for "full PubID style" as well? And for every draft document? (documents with /D suffix)
@mico I think you are right:
Should we display it only for "full PubID style" as well?
I think so.
And for every draft document? (documents with /D suffix)
Yes.
For PubIDs that do not have these statements:
"Unapproved Draft"
"Unapproved Draft Std"
"Active Unapproved Draft Std"
"Approved Draft"
"Approved Draft Std"
We only know that it is a "Draft"; we do not know whether it is "Approved" or "Unapproved", or whether it is "Active".
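A sketch of how the parsed status could be stored, assuming a "/D" suffix or the word "Draft" marks a draft and that unknown values are kept as `None` rather than guessed. `DraftStatus` and `parse_status` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class DraftStatus(NamedTuple):
    is_draft: bool
    approval: Optional[str]  # "Approved", "Unapproved", or None when unknown
    active: Optional[bool]   # True when "Active" is stated, None when unknown


# Hypothetical patterns for the status wording and the draft suffix.
STATUS = re.compile(r"\b(?P<active>Active )?(?P<approval>Approved|Unapproved)\b")
DRAFT_SUFFIX = re.compile(r"/D\d")


def parse_status(pubid: str) -> DraftStatus:
    """Extract only what the identifier states; leave the rest unknown."""
    is_draft = bool(DRAFT_SUFFIX.search(pubid)) or "Draft" in pubid
    m = STATUS.search(pubid)
    if m is None:
        return DraftStatus(is_draft, None, None)
    active = True if m.group("active") else None
    return DraftStatus(is_draft, m.group("approval"), active)


print(parse_status("IEEE Active Unapproved Draft Std PC37.59/D11, Jul 2007"))
print(parse_status("IEEE P1031/D1+1, August 2010"))
```

A renderer for the "full PubID style" could then print the status words only when the stored values are not `None`.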
Found a new pattern to parse: "/D{\d+}+{\d+}"
IEEE 1647/D8+3, December 2010
IEEE P1031/D1+1, August 2010
IEEE P463/D1+1, May 2013
IEEE P751/D2+1, May 2018
P1857/D1+1, July 2012
P2745.1/D4+1, April 2019
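As written, "/D{\d+}+{\d+}" is shorthand rather than a valid regular expression. A working equivalent might look like the sketch below; `PLUS_DRAFT` and `parse_plus_draft` are hypothetical names, and the function only captures the two numbers without assigning any meaning to the "+".

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: "/D", a number, a literal "+", and a second number.
PLUS_DRAFT = re.compile(r"/D(\d+)\+(\d+)")


def parse_plus_draft(pubid: str) -> Optional[Tuple[int, int]]:
    """Return (number before "+", number after "+"), or None if absent."""
    m = PLUS_DRAFT.search(pubid)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


print(parse_plus_draft("IEEE 1647/D8+3, December 2010"))
print(parse_plus_draft("IEEE P802-REV/D1.7"))
```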
@mico I checked online but don't know what these mean...
There is one instance of CEI/IEC 61000-4-15:1997+A1:2003, which means it is CEI/IEC 61000-4-15:1997 with Amendment 1 (A1:2003); the + here means "combined with Amendment 1". This instance is IEC practice.
@ronaldtse another identifier I don't know how to parse: "PC37.30.2/D043 Rev 18, May 2015" Is it a revision of Draft? Any ideas how I should represent it?
@ronaldtse "IEEE P1680.4_D1 and NSF/ANSI 426, August 2016" should I represent it as "IEEE P1680.4-2016/D1 (NSF/ANSI 426)"? Better ideas?
Upd.: it should be "IEEE P1680.4/D1 (NSF/ANSI 426), August 2016" or "IEEE P1680.4/D1, August 2016 (NSF/ANSI 426)"
Probably "IEEE P1680.4/D1, August 2016 (NSF/ANSI 426)"?
NSF is the "National Sanitation Foundation", which issues food safety and hygiene standards. They use PubIDs like "NSF/ANSI/CAN 61", "NSF/ANSI 61-2021", "NSF/ANSI 336-2011".
I think PC37.30.2/D043 Rev 18, May 2015 is a revision of a draft, yes. Maybe PC37.30.2/D43R18, May 2015. This is similar to the other patterns like:
IEEE P11073-10101/D3r7, September 2018
IEEE P11073-10101/D4r1, January 2019
IEEE P11073-10101/D5r4, February 2019
IEEE P11073-10101/D7r1, March 2019
IEEE P11073-10101/D9r1, April 2019
@ronaldtse IEEE Unapproved Std PC37.101/D13, Jun 2006: as we can see, this is a draft, so is "Draft" missing from the identifier? Should we add "Draft" to the output, so the result will be IEEE Unapproved Draft Std PC37.101/D13, Jun 2006?
Update: I think something is wrong with the source data: https://github.com/metanorma/pubid-ieee/blob/main/spec/fixtures/pubid-parsed.txt#L5705
@mico we just received clarification from IEEE:
If it's an "unapproved draft" it is not a standard yet, so none of them should have "Std" included. I'm sure many do though -- the working groups work on revisions based on the published version and are not necessarily aware of these subtleties.
i.e. "Unapproved Std" or "Unapproved Draft Std" should not have "Std".
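A sketch of applying that clarification to the status wording. It rests on two assumptions: that "Draft" should be inserted where it is missing (following the earlier discussion about draft documents), and that approved drafts are left alone because the clarification only covers unapproved ones. `UNAPPROVED_STD` and `drop_std_from_unapproved` are hypothetical names.

```python
import re

# Hypothetical pattern: "Unapproved Std" or "Unapproved Draft Std".
UNAPPROVED_STD = re.compile(r"\bUnapproved (?:Draft )?Std\b")


def drop_std_from_unapproved(pubid: str) -> str:
    """Render unapproved drafts as "Unapproved Draft", without "Std"."""
    return UNAPPROVED_STD.sub("Unapproved Draft", pubid)


print(drop_std_from_unapproved("IEEE Unapproved Std P277/D2,Mar 2007"))
print(drop_std_from_unapproved("IEEE Approved Draft Std P1076.1/D3.3, Feb 6, 2007"))
```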
@ronaldtse are you saying here that IEEE 1647/D8+3 is a Draft 8 + Amendment 3?
@ronaldtse I'm struggling with the identifier "ISO/IEC/IEEE P26513_D2, January 2017". It seems to be an ISO identifier, but it is in IEEE format and has an IEEE draft part.
There is no way to reformat it to "ISO" format without losing the "draft" part. So it seems the output should be "ISO/IEC/IEEE Draft Std P26513/D2, January 2017". Am I right?
Upd.: I found another challenging identifier: "P82079-1_D4_FDIS". It is definitely an "ISO" identifier because of the stage, but with an IEEE draft. Should we also have a mixed ISO/IEEE format (where we render the identifier in ISO format but allow the draft to be rendered in IEEE format)?
IEEE draft documents can have the following patterns: