relaton / relaton-iso

RelatonIso: ISO Standards metadata using the BibliographicItem model

Multi-part standards handling interfering with year reference and instance link #72

Closed opoudjis closed 4 years ago

opoudjis commented 4 years ago

Hitherto, when we have fetched a date-specific reference by screenscraping and converted it into an undated reference, we have preserved the original dated reference as an instance relation:

So ISO 123 is fetched as ISO 123:2001, which is converted to {ISO 123, instance: ISO 123:2001}
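
To make the shape of that conversion concrete, here is a minimal sketch. It models a bibliographic item as a plain Hash; the `to_undated` helper and the Hash keys are hypothetical and are not the relaton-iso API. It assumes the year is the trailing `:YYYY` part of the identifier.

```ruby
# Hypothetical helper, NOT the relaton-iso implementation: it only
# illustrates the shape {ISO 123, instance: ISO 123:2001}.
def to_undated(item)
  undated = item.dup
  # Strip a trailing year from the identifier: "ISO 123:2001" -> "ISO 123"
  undated[:docidentifier] = item[:docidentifier].sub(/:\d{4}\z/, "")
  # An undated reference carries no publication date
  undated.delete(:date)
  # Keep the dated original reachable through an instance relation
  undated[:relations] =
    item.fetch(:relations, []) + [{ type: "instance", bibitem: item }]
  undated
end

dated = { docidentifier: "ISO 123:2001", date: "2001", relations: [] }
undated = to_undated(dated)
puts undated[:docidentifier]
puts undated[:relations].last[:bibitem][:docidentifier]
```

The dated Hash is left untouched, so the instance relation still carries the year and the publication date.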

Until now we have suppressed instances for all_parts references, since we use a partOf relation instead.

#44 means that single-part references are now being fetched as all_parts by default, so the instance link never appears. This breaks the test cases in metanorma-standoc. Moreover, the document identifier is not being corrected from ISO 123:2001 to ISO 123, and the publication date is not stripped.

@andrew2net, in future, please test relaton against the test cases in metanorma-standoc. (In fact, it would be preferable to incorporate some of those tests into the relaton stack.)

opoudjis commented 4 years ago

I was going to fix this, but no, I won't. I will just suppress my test case temporarily.

So:

The fetch of "ISO 123" is meant to look like:

 <bibitem id="iso123" type="standard">
         <fetched>2019-09-27</fetched>
         <title type="title-intro" format="text/plain" language="en" script="Latn">Rubber latex</title>
         <title type="title-main" format="text/plain" language="en" script="Latn">Sampling</title>
         <title type="main" format="text/plain" language="en" script="Latn">Rubber latex – Sampling</title>
         <title type="title-intro" format="text/plain" language="fr" script="Latn">Latex de caoutchouc</title>
         <title type="title-main" format="text/plain" language="fr" script="Latn">Échantillonnage</title>
         <title type="main" format="text/plain" language="fr" script="Latn">Latex de caoutchouc – Échantillonnage</title>  
         <uri type="src">https://www.iso.org/standard/23281.html</uri>
         <uri type="obp">https://www.iso.org/obp/ui/#!iso:std:23281:en</uri>
         <uri type="rss">https://www.iso.org/contents/data/standard/02/32/23281.detail.rss</uri>
         <docidentifier type="ISO">ISO 123</docidentifier>
         <docnumber>123</docnumber>
         <contributor>
           <role type="publisher"/>
           <organization>
             <name>International Organization for Standardization</name>
             <abbreviation>ISO</abbreviation>
             <uri>www.iso.org</uri>
           </organization>
         </contributor>
         <edition>3</edition>
         <language>en</language>
         <language>fr</language>
         <script>Latn</script>
         <status>
           <stage>90</stage>
           <substage>93</substage>
         </status>
         <copyright>
           <from>2001</from>
           <owner>
             <organization>
               <name>ISO</name>
             </organization>
           </owner>
         </copyright>
         <relation type="obsoletes">
           <bibitem type="standard">
             <formattedref format="text/plain">ISO 123:1985</formattedref>
           </bibitem>
         </relation>
         <relation type="instance">
           <bibitem type="standard">
             <fetched>2019-09-27</fetched>
             <title type="title-main" format="text/plain" language="en" script="Latn">Rubber latex – Sampling</title>
             <title type="main" format="text/plain" language="en" script="Latn">Rubber latex – Sampling</title>
             <title type="title-main" format="text/plain" language="fr" script="Latn">Latex de caoutchouc – Échantillonnage</title>
             <title type="main" format="text/plain" language="fr" script="Latn">Latex de caoutchouc – Échantillonnage</title>
             <uri type="src">https://www.iso.org/standard/23281.html</uri>
             <uri type="obp">https://www.iso.org/obp/ui/#!iso:std:23281:en</uri>
             <uri type="rss">https://www.iso.org/contents/data/standard/02/32/23281.detail.rss</uri>
             <docidentifier type="ISO">ISO 123:2001</docidentifier>
             <docnumber>123</docnumber>
             <date type="published">
               <on>2001</on>
             </date>
             <contributor>
               <role type="publisher"/>
               <organization>
                 <name>International Organization for Standardization</name>
                 <abbreviation>ISO</abbreviation>
                 <uri>www.iso.org</uri>
               </organization>
             </contributor>
             <edition>3</edition>
             <language>en</language>
             <language>fr</language>
             <script>Latn</script>
             <status>
               <stage>90</stage>
               <substage>93</substage>
             </status>
             <copyright>
               <from>2001</from>
               <owner>
                 <organization>
                   <name>ISO</name>
                 </organization>
               </owner>
             </copyright>
             <relation type="obsoletes">
               <bibitem type="standard">
                 <formattedref format="text/plain">ISO 123:1985</formattedref>
               </bibitem>
             </relation>
           </bibitem>
         </relation>
       </bibitem> </references></bibliography>
       </standard-document>

Because of #44, the whole <relation type="instance"> element is missing, and the docidentifier is wrongly given as <docidentifier type="ISO">ISO 123:2001</docidentifier>.

The fix is in: lib/relaton_iso/iso_bibliography.rb

ret.to_most_recent_reference unless year || opts[:keep_year] || opts[:all_parts]

It should be

ret.to_most_recent_reference unless year || opts[:keep_year]
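
The effect of dropping `opts[:all_parts]` from the guard can be checked with a small sketch. The two predicate methods below are hypothetical stand-ins for the two conditions quoted above, not code from relaton-iso; each returns true when `to_most_recent_reference` would be called.

```ruby
# Hypothetical stand-ins for the two guard conditions, for comparison only.
def converts_before?(year, opts)
  !(year || opts[:keep_year] || opts[:all_parts])
end

def converts_after?(year, opts)
  !(year || opts[:keep_year])
end

# Each case is [year, opts] as they might be passed to a fetch.
cases = [
  [nil, {}],
  [nil, { all_parts: true }],
  [nil, { keep_year: true }],
  ["2001", {}]
]

results = cases.map do |year, opts|
  [year, opts, converts_before?(year, opts), converts_after?(year, opts)]
end
results.each { |row| puts row.inspect }
```

Only the undated all_parts case differs between the two guards, which is the case that #44 turned into the default.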

I was going to suggest you only preserve the instance link if the document is not actually multipart. I was also going to suggest that this is not an urgent fix.

Unfortunately, the incorrect document identifier incorporating the year and the presence of the publication date are bugs, and they need to be fixed and released. The instance link, strictly speaking, is misleading as a relation, but keep it; it's not that big a deal. The year reference is a big deal.

I am going to suppress the test case temporarily to get my release happening, but I expect to restore it.

opoudjis commented 4 years ago

I was sorely tempted to fix this, but no. We're following process from now on, and I have too much to address with this metanorma release. But this is a bug, so please fix as soon as possible.

andrew2net commented 4 years ago

The fix is in: lib/relaton_iso/iso_bibliography.rb

ret.to_most_recent_reference unless year || opts[:keep_year] || opts[:all_parts]

It should be

ret.to_most_recent_reference unless year || opts[:keep_year]

The ret.to_most_recent_reference method makes a copy of the document and adds the original document as a relation, but the original document already has all the parts as relations. So the result is a document whose relation holds the most recent document, which in turn holds the parts in its own relations. That is why I added || opts[:all_parts] to the condition. We can't simply remove it. I'll find another way.
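
The nesting described here can be illustrated with a rough sketch. `Item`, `Relation`, and `wrap_as_undated` are hypothetical names, not relaton-iso classes, and the identifiers are placeholders. It assumes the copy keeps the original's relations and appends the instance relation, and that parts are attached with the partOf relation type mentioned earlier in the thread.

```ruby
# Hypothetical model, NOT relaton-iso code.
Item = Struct.new(:id, :relations)
Relation = Struct.new(:type, :item)

# Copy the item without its year and attach the original as an instance.
def wrap_as_undated(original)
  undated_id = original.id.sub(/:\d{4}\z/, "")
  Item.new(undated_id, original.relations + [Relation.new("instance", original)])
end

# Placeholder identifiers for an all_parts fetch with two parts.
parts = [Item.new("ISO 0000-1:2001", []), Item.new("ISO 0000-2:2001", [])]
all_parts = Item.new("ISO 0000:2001", parts.map { |p| Relation.new("partOf", p) })

wrapped = wrap_as_undated(all_parts)
instance = wrapped.relations.find { |r| r.type == "instance" }
# The part relations now also sit one level down, inside the instance.
puts instance.item.relations.map { |r| r.item.id }.inspect
```

Under these assumptions every part is reachable a second time through the instance relation, which is the duplication the all_parts guard was added to avoid.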