relaton / relaton-iso

RelatonIso: ISO Standards metadata using the BibliographicItem model
BSD 2-Clause "Simplified" License

title-intro instead of title-main #156

Closed by opoudjis 10 months ago

opoudjis commented 10 months ago

The specs in metanorma-standoc have all been addressed by your update to 1.16.2, thank you!

Except:

If you fetch ISO/IEC TR 12382:1992 with the new relaton-iso, it is now populating title-intro instead of title-main.

<bibdata type="standard" schema-version="v1.2.4">
  <fetched>2023-10-21</fetched>
  <title type="title-intro" format="text/plain" language="en" script="Latn">Permuted index of the vocabulary of information technology</title>
  <title type="main" format="text/plain" language="en" script="Latn">Permuted index of the vocabulary of information technology</title>
  <title type="title-intro" format="text/plain" language="fr" script="Latn">Index permuté du vocabulaire des technologies de l'information</title>
  <title type="main" format="text/plain" language="fr" script="Latn">Index permuté du vocabulaire des technologies de l'information</title>

That is incorrect. The hierarchy is always title-main > title-intro, title-part. If only one title-* is present, it needs to be treated as title-main.
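The rule above can be sketched in Ruby as follows. This is a minimal illustration, not the relaton-iso implementation: `typed_titles` and `SEPARATOR` are hypothetical names, and the assumptions that title segments are separated by a spaced dash and that part titles start with "Part N" or "Partie N" are inferred only from the examples in this issue.

```ruby
# Hypothetical helper, not part of relaton-iso.
# Assumption: segments are separated by a dash (hyphen, en dash or em dash)
# with whitespace on both sides, so "free-cutting" is not split.
SEPARATOR = /\s+[-\u2013\u2014]\s+/

def typed_titles(full_title)
  parts = full_title.split(SEPARATOR).map(&:strip).reject(&:empty?)

  # Assumption: a trailing "Part N ..." / "Partie N ..." segment is the part title.
  part = parts.pop if parts.size > 1 && parts.last.match?(/\APart(ie)?\s+\d+/)

  # The last remaining segment is always title-main, so a lone segment
  # can never end up as title-intro.
  main = parts.pop

  # Anything left in front of title-main is the intro.
  intro = parts.empty? ? nil : parts.join(" \u2014 ")

  titles = {}
  titles["title-intro"] = intro if intro
  titles["title-main"] = main
  titles["title-part"] = part if part
  titles["main"] = full_title
  titles
end
```

With this ordering, a single-segment title yields only title-main and main, and title-intro appears only when something precedes the main segment.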

Not urgent for this release, I'll deal with it, but it's still an error.

opoudjis commented 10 months ago

In addition,

ISO 31-0 was fetching as:

  <title type="title-intro" format="text/plain" language="en" script="Latn">Title missing</title>
  <title type="title-main" format="text/plain" language="en" script="Latn">Legacy paper document</title>
  <title type="main" format="text/plain" language="en" script="Latn">Title missing — Legacy paper document</title>

and is now fetching as:

  <title type="title-intro" format="text/plain" language="en" script="Latn">Title missing — Legacy paper document</title>
  <title type="main" format="text/plain" language="en" script="Latn">Title missing — Legacy paper document</title>

So it is not splitting intro and main any more.

And ISO 683-3 was fetching as:

  <title type="title-main" format="text/plain" language="en" script="Latn">Heat-treatable steels, alloy steels and free-cutting steels</title>
  <title type="title-part" format="text/plain" language="en" script="Latn">Part 3: Case-hardening steels</title>
  <title type="main" format="text/plain" language="en" script="Latn">Heat-treatable steels, alloy steels and free-cutting steels — Part 3: Case-hardening steels</title>

and is now fetching as:

  <title type="title-intro" format="text/plain" language="en" script="Latn">Heat-treatable steels, alloy steels and free-cutting steels</title>
  <title type="title-main" format="text/plain" language="en" script="Latn">Part 3: Case-hardening steels</title>
  <title type="main" format="text/plain" language="en" script="Latn">Heat-treatable steels, alloy steels and free-cutting steels — Part 3: Case-hardening steels</title>

So part titles are no longer being recognised.

opoudjis commented 10 months ago

Because ISO keeps changing its site, we really do need to test live screen scraping of ISO against a fixture as part of gem testing. Currently metanorma is catching these changes instead of relaton, and that is a result of poor testing technique in metanorma, which keeps the fetched records fixed in its specs.

ronaldtse commented 10 months ago

@andrew2net can we fix this ASAP?

We also need to keep specs that involve ISO's live site and have them run daily so we can catch these issues before our users do.
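As a rough sketch of what such a daily check could compare, the Ruby below reduces a `bibdata` record to its title signature (language, type, text) and reports differences between a stored fixture and a freshly fetched record. `title_signature` and `title_drift` are hypothetical names, not relaton APIs. How the live XML is obtained (for example from a scheduled job that runs the fetch) is left out, and the regex extraction is a shortcut that assumes the flat `<title ...>text</title>` shape shown in this issue. A real spec would more likely use an XML parser.

```ruby
# Hypothetical helpers, not part of relaton or relaton-iso.
# Assumption: titles appear as flat <title ...>text</title> elements.
TITLE_RE = %r{<title\s+([^>]*)>(.*?)</title>}m

# Reduce a bibdata XML string to an array of [language, type, text] triples.
def title_signature(xml)
  xml.scan(TITLE_RE).map do |attrs, text|
    type = attrs[/\btype="([^"]*)"/, 1]
    lang = attrs[/\blanguage="([^"]*)"/, 1]
    [lang, type, text.strip]
  end
end

# Compare a stored fixture with a freshly fetched record.
# :missing    lists titles the fixture has but the live record lacks.
# :unexpected lists titles the live record has but the fixture lacks.
def title_drift(fixture_xml, live_xml)
  expected = title_signature(fixture_xml)
  actual = title_signature(live_xml)
  { missing: expected - actual, unexpected: actual - expected }
end
```

A scheduled spec could fail whenever either list is non-empty, which would have flagged the title-main to title-intro change reported above on the relaton side.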

andrew2net commented 10 months ago

Fixed in v1.16.3

$ relaton fetch 'ISO/IEC TR 12382:1992'
Resolving dependencies...
[relaton-iso] (ISO/IEC TR 12382:1992) Fetching from iso.org ...
[relaton-iso] (ISO/IEC TR 12382:1992) Found: `ISO/IEC TR 12382:1992`
<bibdata type="standard" schema-version="v1.2.4">
  <fetched>2023-10-21</fetched>
  <title type="title-main" format="text/plain" language="en" script="Latn">Permuted index of the vocabulary of information technology</title>
  <title type="main" format="text/plain" language="en" script="Latn">Permuted index of the vocabulary of information technology</title>
  <title type="title-main" format="text/plain" language="fr" script="Latn">Index permuté du vocabulaire des technologies de l'information</title>
  <title type="main" format="text/plain" language="fr" script="Latn">Index permuté du vocabulaire des technologies de l'information</title>
  ...

$ relaton fetch 'ISO 31-0'
Resolving dependencies...
[relaton-iso] (ISO 31-0) Fetching from iso.org ...
[relaton-iso] (ISO 31-0) Found: `ISO 31-0:1974`
<bibdata type="standard" schema-version="v1.2.4">
  <fetched>2023-10-21</fetched>
  <title type="title-intro" format="text/plain" language="en" script="Latn">Title missing</title>
  <title type="title-main" format="text/plain" language="en" script="Latn">Legacy paper document</title>
  <title type="main" format="text/plain" language="en" script="Latn">Title missing - Legacy paper document</title>
  <title type="title-intro" format="text/plain" language="fr" script="Latn">Title missing</title>
  <title type="title-main" format="text/plain" language="fr" script="Latn">Legacy paper document</title>
  <title type="main" format="text/plain" language="fr" script="Latn">Title missing - Legacy paper document</title>
  ...

$ relaton fetch 'ISO 683-3'
Resolving dependencies...
[relaton-iso] (ISO 683-3) Fetching from iso.org ...
[relaton-iso] (ISO 683-3) Found: `ISO 683-3:2022`
<bibdata type="standard" schema-version="v1.2.4">
  <fetched>2023-10-21</fetched>
  <title type="title-main" format="text/plain" language="en" script="Latn">Heat-treatable steels, alloy steels and free-cutting steels</title>
  <title type="title-part" format="text/plain" language="en" script="Latn">Part 3: Case-hardening steels</title>
  <title type="main" format="text/plain" language="en" script="Latn">Heat-treatable steels, alloy steels and free-cutting steels - Part 3: Case-hardening steels</title>
  <title type="title-main" format="text/plain" language="fr" script="Latn">Aciers pour traitement thermique, aciers alliés et aciers pour décolletage</title>
  <title type="title-part" format="text/plain" language="fr" script="Latn">Partie 3: Aciers pour cémentation</title>
  <title type="main" format="text/plain" language="fr" script="Latn">Aciers pour traitement thermique, aciers alliés et aciers pour décolletage - Partie 3: Aciers pour cémentation</title>
  ...