Closed by ronaldtse 2 years ago
@ronaldtse do you have any references to standards or IEEE PubID parsing implementations that could help?
@ronaldtse what do we want to do with parsed PubIDs? Do we need to convert them back to PubIDs, or to other formats?
Code that is now used for PubID parsing is here: https://github.com/relaton/relaton-ieee/blob/main/lib/relaton_ieee/rawbib_id_parser.rb
The source files for these entries are at https://github.com/relaton/ieee-rawbib.
There are a few problems:
- `normtitle` entries even though they have different filenames. We need to find out how to distinguish these bibliographic entries and then report back to IEEE.
- `<standard_id>` value is 0.
Regarding pubid, notice that there are multiple types of IEEE PubIDs, and also some jointly-published ones with ISO PubIDs. Since we now have an ISO PubID implementation, it will help us here.
@ronaldtse could you tell me what problem we are trying to solve here? Do we want to convert to another format, to distinguish these bibliographic entries from "ieee-rawbib", to build a relations graph, or something else?
Right now, Relaton-IEEE is unable to parse all IEEE PubID entries because it parses them with regular expressions. This has the following consequences:
- We are unable to convert all ieee-rawbib data into https://github.com/ietf-ribose/relaton-data-ieee. Around 10% of entries are now missing, so people cannot cite from the full library. (See "Mapping for IEEE references in `bibxml6` to IEEE dataset", ietf-ribose/bibxml-service#136 (comment), and "Missing bibliographic items for these identifiers (from ieee-rawbib)", relaton/relaton-ieee#16.)
- Some entries in Relaton-IEEE are parsed wrongly. This means that people end up citing the wrong document. See for example "Data mismatch when retrieving IEEE standards by xml2rfc paths", ietf-ribose/bibxml-service#31 (comment).
i.e. we must properly parse IEEE PubIDs in order to make the full IEEE dataset available for citation.
Will we use it (pubid-ieee) to replace https://github.com/relaton/relaton-ieee/blob/main/lib/relaton_ieee/rawbib_id_parser.rb ?
Yes.
@ronaldtse should we use pubid-iso to parse identifiers like:
IEC/IEEE 62704-1:2017
IEC/IEEE 62704-2:2017
IEC/IEEE 62704-3:2017
IEC/IEEE 62704-4:2020
IEC/IEEE 63113:2021
IEC/IEEE 63260:2020
IEC/ISO/IEEE 80005-1:2012
ISO/IEC FDIS P15289, April 2014(E)
?
The ones that start with ISO, yes. But the rest are IEC identifiers. IEC PubIDs are similar to ISO’s, but they have different stages and allow a sub-part (e.g. IEC 1000-1-2). We need to have a pubid-iec.
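The routing rule described here (identifiers starting with ISO go to the ISO parser, those starting with IEC need an IEC parser) could be sketched roughly as below. `parser_family` and the returned symbols are hypothetical names for illustration only; they are not part of pubid-iso, pubid-ieee, or any existing gem.

```ruby
# Hypothetical helper: choose a parser family from the first publisher
# named in the identifier. Nothing here calls a real pubid-* library.
def parser_family(identifier)
  # Take everything before the first "/" or whitespace as the first publisher.
  first_publisher = identifier.strip.split(%r{[/\s]}, 2).first.to_s.upcase
  case first_publisher
  when "ISO" then :iso   # would be handled by an ISO PubID parser
  when "IEC" then :iec   # would need an IEC PubID parser
  else :ieee             # everything else is treated as an IEEE identifier
  end
end
```

Under this assumption, "IEC/ISO/IEEE 80005-1:2012" is routed to the IEC family because IEC comes first, and "ISO/IEC FDIS P15289, April 2014(E)" is routed to the ISO family.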
IEEE Std 1073.1.1.1-2004 (https://standards.ieee.org/ieee/1073.1.1.1/1571/)
"Replaced by ISO/IEEE 11073-10101-2004"
Example of similar identifier: P11073-10101c (https://standards.ieee.org/ieee/11073-10101c/10476/) Title: "Standard for Health informatics--Point-of-care medical device communication - Part 10101: Nomenclature Amendment 3: Additional definitions".
@ronaldtse I believe IEEE Std 1073.1.1.1-2004 should be "IEEE 1073-10101-2004" or "IEEE 11073-10101-2004", what do you think?
No, we have to keep the original identifier. Its replacement "ISO/IEEE 11073-10101-2004" probably intentionally selected the 10101 part to keep identity with 1.1.1. Notice that 1073 became 11073 because ISO 1073 is already taken by another standard. This is causality in reverse.
"P11073-10101c" means it is the "provisional" (i.e. draft) version of "11073-10101c". The "c" character means it is the 3rd Amendment to "11073-10101". According to the website, "P11073-10101c" is done in 2020 so it is a "draft amendment".
i.e. historically:
@ronaldtse "IEEE 802.15.22.3-2020" - how can I know what is 22 and 3 here?
I'm trying to work out how I should treat these numbers. I had an idea to parse it as {number}.{part}.{subpart}, but some identifiers have more than 3 numbers. Maybe I can parse the extra numbers as extra subparts.
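A minimal sketch of that idea, assuming a purely positional split where the first dotted component is the number, the second is the part, and anything further is collected as subparts. The regular expression, the method name `split_dotted_number`, and the hash keys are all assumptions made for illustration, not the pubid-ieee implementation.

```ruby
# Hypothetical pattern: optional "IEEE" and "Std" prefixes, a dotted number
# (optionally starting with one letter, as in "C37.60"), and an optional year.
DOTTED_NUMBER = /\A(?:IEEE\s+)?(?:Std\s+)?(?<digits>[A-Z]?\d+(?:\.\d+)*)(?:-(?<year>\d{4}))?\z/

# Split the dotted number positionally: number, part, then any extra subparts.
# Returns nil when the identifier does not fit the simple pattern above.
def split_dotted_number(identifier)
  match = DOTTED_NUMBER.match(identifier.strip)
  return nil unless match

  number, part, *subparts = match[:digits].split(".")
  { number: number, part: part, subparts: subparts, year: match[:year] }
end
```

For "IEEE 802.15.22.3-2020" this yields number "802", part "15", and subparts "22" and "3". That is only one possible reading of the identifier; whether the positions carry that meaning is exactly the open question here.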
"IEEE 802.15.22.3-2020" "IEEE Standard for Spectrum Characterization and Occupancy Sensing":
You can see that "22.3" is called the "Part" in the draft.
I had an idea to parse it as {number}.{part}.{subpart} but there are over 3 numbers.
I am not sure whether there is a proper structure in IEEE identifiers. Some patterns are somewhat arbitrary (e.g. there exists 802.15.22.3 but not 802.15.22.1 or 802.15.22.2).
This is a topic we will need to investigate and analyse.
@ronaldtse I believe we finished with this issue
@mico we have 886 identifiers that are not yet being parsed, but I will make that into a new issue.
Full PubIDs from IEEE: pubid-sorted.txt.zip
Please also look at #2 and #3 for resolved details.
Method of generating this list:
Some observed rules (https://github.com/metanorma/pubid-ieee/issues/2#issuecomment-951128062):
- `/D{N}` or `/D.{N}` or `_D{N}` means draft N
- Joint publications:
  - `P844.3/C22.2 293.3/D0, Aug 2018` is an "IEEE P844.3" joint standard with "CSA C22.2 No. 293.3", Draft 0.
  - `309/N42.3-1999` is "IEEE 309" joint with "ANSI N42.3"
- `529-1980/Cor 1-2017` means it is a correction of "IEEE 529-1980" issued in 2017, the first corrigendum for the standard
- `P1062/D.19, March 2015` means it is Draft 19 of "IEEE P1062"
- `D`:
- `C37.60/62271-111-2018` is joint IEC 62271-111 and IEEE Std C37.60-2018
- `IEC/IEEE P60079-30-2/D4A, Jul 2013`
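The draft and corrigendum rules observed above could be expressed as patterns along these lines. This is a rough sketch, not the pubid-ieee grammar: the constant and method names are hypothetical, and allowing a trailing letter after the draft number (as in `D4A`) is an assumption, since the rules above only describe `D{N}`.

```ruby
# Hypothetical patterns based on the observed rules; names are illustrative.
# Draft: "/D{N}", "/D.{N}" or "_D{N}". A trailing letter such as the "A" in
# "D4A" is accepted as an assumption and kept as part of the draft value.
DRAFT = %r{[/_]D\.?(?<draft>\d+[A-Z]?)}
# Corrigendum: "/Cor {N}-{year}", e.g. "/Cor 1-2017".
CORRIGENDUM = %r{/Cor\s*(?<number>\d+)-(?<year>\d{4})}

# Returns the draft designation as a string, or nil if there is none.
def draft_number(identifier)
  match = DRAFT.match(identifier)
  match && match[:draft]
end

# Returns the corrigendum number and year, or nil if there is none.
def corrigendum(identifier)
  match = CORRIGENDUM.match(identifier)
  match && { number: match[:number].to_i, year: match[:year].to_i }
end
```

With these patterns, `draft_number` picks up the draft segment of "P844.3/C22.2 293.3/D0, Aug 2018" while skipping the "/C22.2" joint-publication segment, because the slash must be followed by `D` and a digit. Joint publications themselves are not handled in this sketch.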