metanorma / pubid-ieee

PubID spec and implementation for IEEE deliverables
BSD 2-Clause "Simplified" License

Parse co-published PubIDs #25

Open ronaldtse opened 2 years ago

ronaldtse commented 2 years ago

There are many PubIDs that are co-published. This means that the different organizations share the same "standard number". This also means that the "standard number syntax" will either comply with the IEEE syntax, or the syntax of the other publisher.

ASTM format:

IEEE/ ASTM SI 10-2010 (Revision of IEEE/ASTM SI 10-2002)

ANSI format:

IEEE/ANSI Std 24-1984 (Revision of 24-1977)
IEEE/ANSI Std 484-1987

ISO, IEC or ISO/IEC format but with the IEEE P applied:

IEEE/IEC P60076-57-1202, July 2014
IEEE/IEC P60076-57-129_FDIS, July2017
IEEE/IEC P60079-30-1 D5, January 2022
IEEE/IEC P60079-30-2 D4, January 2022
IEEE/IEC/ISO P80005-1/FDIS_March 2012
IEEE/ISO/IEC 8802-1Q-2020/Amd31-2021
IEEE/ISO/IEC 8802-3:2021/Amd 4-2021
IEEE/ISO/IEC 8802-3:2021/Amd7-2021
IEEE/ISO/IEC 8802-3:2021/Amd8-2021

Purely ISO, IEC or ISO/IEC format:

IEC/IEEE 60076-16 Edition 2.0 2018-09
IEC/IEEE 60076-57-1202:2017 Edition 1.0 2017-05
IEC/IEEE 60076-57-129:2017
ISO/IEC/IEEE 24748-5:2017(E)
ISO/IEC/IEEE 24765:2010(E)
ISO/IEC/IEEE 24765:2017(E)
ISO/IEC/IEEE 24774:2021(E)
IEEE/ISO/IEC 8802-3:2021/Amd 4-2021
IEEE/ISO/IEC 8802-3:2021/Amd7-2021
IEEE/ISO/IEC 8802-3:2021/Amd8-2021

mico commented 2 years ago

@ronaldtse we agreed before to use "pubid-iso" to parse ISO identifiers. Should it be on the "pubid-ieee" level or "relaton" level? I believe "relaton" should decide what parser to use... Also, there is a difference between what "pubid-iso" can parse now and what we have in our PubIDs list. "pubid-iso" will not be able to parse most of the "ISO" identifiers there without fixes, for example:

ISO/IEC 10861 : 1994 [ANSI/IEEE Std 1296, 1994 Edition]
ISO/IEC 13213 : 1994 [ANSI/IEEE Std 1212, 1994 Edition]
ISO/IEC 14515-1:2000 IEEE Std 2003.1-2000

ronaldtse commented 2 years ago

we agreed before to use "pubid-iso" to parse ISO identifiers. Should it be on the "pubid-ieee" level or "relaton" level? I believe "relaton" should decide what parser to use...

I believe pubid-ieee should implement the link to re-use pubid-iso. Relaton technically does not do anything with the PubID except that it needs a PubID to identify a bibliographic item.

In the examples:

We could update pubid-iso to handle these, or make pubid-ieee handle them?

mico commented 2 years ago

We could update pubid-iso to handle these, or make pubid-ieee handle them?

To parse them in pubid-ieee I would need to duplicate parsing code from pubid-iso. It is better to update pubid-iso for these identifiers.

Also, I have an idea: maybe we should introduce something like https://github.com/relaton/relaton/blob/main/spec/relaton/registry_spec.rb for PubIDs? For example:

expect(PubID::Registry.parse("ISO/IEC 13213")).to be_instance_of PubId::Iso::Identifier
expect(PubID::Registry.parse("IEEE/ANSI Std 484-1987")).to be_instance_of PubId::Ieee::Identifier
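
A minimal sketch of what such a registry could look like, assuming it dispatches only on the first publisher token. Every name here (`SketchRegistry`, `IsoId`, `IeeeId`, `FLAVORS`) is hypothetical and not part of any pubid gem; the stand-in structs take the place of the real identifier classes.

```ruby
# Hypothetical sketch: none of these names come from the pubid gems.
module SketchRegistry
  # Stand-in result types; a real registry would return the gems' identifier objects.
  IsoId  = Struct.new(:raw)
  IeeeId = Struct.new(:raw)

  # Assumption: the first publisher token alone decides the flavor.
  FLAVORS = { "ISO" => IsoId, "IEC" => IsoId, "IEEE" => IeeeId }.freeze

  def self.parse(text)
    # "ISO/IEC 13213" gives "ISO"; "IEEE/ANSI Std 484-1987" gives "IEEE".
    first_publisher = text.strip.split(%r{[/\s]}, 2).first
    klass = FLAVORS.fetch(first_publisher) do
      raise ArgumentError, "no parser registered for #{first_publisher.inspect}"
    end
    klass.new(text)
  end
end

SketchRegistry.parse("ISO/IEC 13213")          # an IsoId
SketchRegistry.parse("IEEE/ANSI Std 484-1987") # an IeeeId
```

Dispatching on the first publisher alone is a deliberate simplification; it says nothing about whether the chosen parser can actually handle the rest of the identifier.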

ronaldtse commented 2 years ago

I fully agree with this. The challenge is deciding what counts as an ISO versus an IEEE identifier. For example,

expect(PubID::Registry.parse("ISO/IEC 13213")).to be_instance_of PubId::Iso::Identifier
expect(PubID::Registry.parse("IEEE/ANSI Std 484-1987")).to be_instance_of PubId::Ieee::Identifier

There are however some challenges. Look at these identifiers:

ISO/IEEE DIS P11073-10418 D13, January 2011
ISO/IEEE DIS P11073-10418/D15, June 2011
ISO/IEEE DIS P11073-10418_D8, July 2010
ISO/IEEE P11073-20601a/D29, July 2010
ISO/IEEE P11073-20601a/D31, August 2010

The numbers are in IEEE format (Pnnnn-{part}/D{draft}), but the identifiers start with ISO. The first 3 even use the ISO stage code DIS. From the ISO perspective, the P and D parts do not make any sense. These identifiers can only make sense from the IEEE perspective.
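
The IEEE-style shape described here can be sketched as a regular expression. The pattern below is an assumption inferred only from the five identifiers listed above (draft separator is a space, "/" or "_"); `IEEE_PROJECT` and `ieee_project_fields` are hypothetical names, not part of pubid-ieee.

```ruby
# Hypothetical pattern, inferred only from the five identifiers quoted above:
# "P" + project number, optional "-part", optional draft "D<n>" after " ", "/" or "_".
IEEE_PROJECT = %r{\bP(?<number>\d+)(?:-(?<part>[0-9A-Za-z.-]+))?(?:[ /_]D(?<draft>\d+))?}

# Returns the IEEE project fields, or nil when no "P<number>" marker is present.
def ieee_project_fields(text)
  match = IEEE_PROJECT.match(text)
  return nil unless match

  { number: match[:number], part: match[:part], draft: match[:draft] }
end

ieee_project_fields("ISO/IEEE DIS P11073-10418 D13, January 2011")
ieee_project_fields("ISO/IEC 13213")
```

A check like this only detects the IEEE markers; it does not interpret the ISO stage code or the trailing date.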

mico commented 2 years ago

The numbers are in IEEE format (Pnnnn-{part}/D{draft}), but the identifiers start with ISO. The first 3 even use the ISO stage code DIS. From the ISO perspective, the P and D parts do not make any sense. These identifiers can only make sense from the IEEE perspective.

So we should check not only the prefix but also whether the identifier is parsable by the matched module. For "ISO/IEEE P11073-20601a/D31, August 2010", by splitting the publishers into "ISO" and "IEEE", we can check which module parses it first and return the correct instance.
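
A rough sketch of that fallback, assuming each candidate parser returns nil when it cannot handle the input. The toy regular expressions stand in for the real pubid-iso and pubid-ieee parsers, and all names (`TOY_ISO`, `TOY_IEEE`, `TOY_PARSERS`, `parse_copublished`) are hypothetical.

```ruby
# Hypothetical stand-ins: a real version would call the pubid-iso and
# pubid-ieee parsers instead of these toy regular expressions.
TOY_ISO  = %r{\A(?:ISO|IEC|IEEE)(?:/(?:ISO|IEC|IEEE))*\s\d+(?:-\d+)*(?::\d{4})?\z}
TOY_IEEE = /\b(?:P\d+|Std\s+\d)/

# Each toy parser returns a result hash on success and nil on failure.
TOY_PARSERS = {
  "ISO"  => ->(text) { TOY_ISO.match?(text)  ? { flavor: :iso,  raw: text } : nil },
  "IEC"  => ->(text) { TOY_ISO.match?(text)  ? { flavor: :iso,  raw: text } : nil },
  "IEEE" => ->(text) { TOY_IEEE.match?(text) ? { flavor: :ieee, raw: text } : nil }
}.freeze

def parse_copublished(text)
  # Leading publisher list, e.g. "ISO/IEEE" gives ["ISO", "IEEE"].
  publishers = text[%r{\A[A-Z]+(?:/[A-Z]+)*}].to_s.split("/")

  # Try the parser of each publisher in order; the first success wins.
  publishers.each do |publisher|
    result = TOY_PARSERS[publisher]&.call(text)
    return result if result
  end

  raise ArgumentError, "no candidate parser accepted #{text.inspect}"
end

parse_copublished("ISO/IEEE P11073-20601a/D31, August 2010")
```

Here the toy ISO parser rejects the identifier at "P11073", so the IEEE parser gets the next attempt. Publishers without a registered parser (for example "ANSI") are simply skipped.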

ronaldtse commented 2 years ago

For "ISO/IEEE P11073-20601a/D31, August 2010", by splitting the publishers into "ISO" and "IEEE", we can check which module parses it first and return the correct instance.

Perhaps. I do not know whether we can know which parse operation is correct by simply comparing the outcomes though. Maybe the ISO version will fail once it encounters "Pnnnn"?

I'm not sure if we should handle this via parse rules where we can integrate multiple Parslet sub-rules...

mico commented 2 years ago

Perhaps. I do not know whether we can know which parse operation is correct by simply comparing the outcomes though. Maybe the ISO version will fail once it encounters "Pnnnn"?

Yes, one parser will fail. If both parsers succeed, both should return correct results.

I'm not sure if we should handle this via parse rules where we can integrate multiple Parslet sub-rules...

I also like the idea of including parse rules from another gem. For example, we can use the "pubid-iso" Parslet rules inside "pubid-ieee" to parse the ISO part of identifiers like "ISO/IEC13210: 1994 (E) ANSI/IEEE Std 1003.3-1991".

Another idea: we can define which parsers to use per dataset. For example, we know that "ieee-rawbib2" includes both ISO and IEEE formats.
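
For combined identifiers such as "ISO/IEC13210: 1994 (E) ANSI/IEEE Std 1003.3-1991", one way to hand the ISO portion to a separate parser is to split the string first. This is a plain-Ruby sketch, not Parslet rule reuse; it assumes the ISO part comes first and the IEEE part starts at "IEEE Std" or "ANSI/IEEE Std". `IEEE_MARKER` and `split_dual_identifier` are hypothetical names.

```ruby
# Hypothetical helper: splits at the first "IEEE Std" or "ANSI/IEEE Std" marker.
IEEE_MARKER = %r{(?:ANSI/)?IEEE\s+Std\b}

def split_dual_identifier(text)
  index = text.index(IEEE_MARKER)
  # No IEEE marker: treat the whole string as the ISO part.
  return { iso: text.strip, ieee: nil } if index.nil?

  # Drop trailing whitespace or an opening bracket from the ISO part,
  # and trailing whitespace or a closing bracket from the IEEE part.
  iso_part  = text[0...index].sub(/[\s\[(]+\z/, "")
  ieee_part = text[index..-1].sub(/[\s\])]+\z/, "")
  { iso: iso_part, ieee: ieee_part }
end

split_dual_identifier("ISO/IEC13210: 1994 (E) ANSI/IEEE Std 1003.3-1991")
split_dual_identifier("ISO/IEC 10861 : 1994 [ANSI/IEEE Std 1296, 1994 Edition]")
```

Each half could then be passed to the matching parser, which keeps the ISO grammar in one place.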

ronaldtse commented 2 years ago

Yeah let's try out these ideas!

mico commented 2 years ago

@ronaldtse I'm trying to parse this identifier. To me it looks like an identifier with 2 dual-PubIDs, but I could be wrong: "IEEE Std 802.5r and IEEE 802.5j, 1998 Edition (ISO/IEC 8802-5:1998/Amd.1)". What is strange is that it represents 2 dual-PubIDs in different formats (the first uses " and ", and the second puts the identifier inside brackets). What is "IEEE Std 802.5r and IEEE 802.5j" here? (Maybe this could help: https://ieeexplore.ieee.org/document/827772)

ronaldtse commented 2 years ago

Topic of "IEEE Std 802.5r and IEEE 802.5j" moved to #55