metadata101 / iso19139.ca.HNAP

ISO Harmonized North American Profile (HNAP)
GNU General Public License v2.0

Remove duplicate gmd:locale #224

Closed jodygarnett closed 3 years ago

jodygarnett commented 3 years ago

We have run into some records that validate successfully but produce strange behaviour when published to FGP.

The records list the main language as a gmd:locale, resulting in two gmd:locale elements. Although such a record is valid, it trips up the FGP validation and user interface.

Updating the record to remove the gmd:locale for the document's primary language resolves the issue.

  <gmd:locale>
      <gmd:PT_Locale id="fra">
         <gmd:languageCode>
            <gmd:LanguageCode codeList="http://nap.geogratis.gc.ca/metadata/register/napMetadataRegister.xml#IC_116"
                              codeListValue="fra">French; Français</gmd:LanguageCode>
         </gmd:languageCode>
         <gmd:country>
            <gmd:Country codeList="http://nap.geogratis.gc.ca/metadata/register/napMetadataRegister.xml#IC_117"
                         codeListValue="CAN">Canada; Canada</gmd:Country>
         </gmd:country>
         <gmd:characterEncoding>
            <gmd:MD_CharacterSetCode codeList="http://nap.geogratis.gc.ca/metadata/register/napMetadataRegister.xml#IC_95"
                                     codeListValue="RI_458">utf8; utf8</gmd:MD_CharacterSetCode>
         </gmd:characterEncoding>
      </gmd:PT_Locale>
  </gmd:locale>

  <gmd:locale>
      <gmd:PT_Locale id="eng">
         <gmd:languageCode>
            <gmd:LanguageCode codeList="http://nap.geogratis.gc.ca/metadata/register/napMetadataRegister.xml#IC_116"
                              codeListValue="eng">English; Anglais</gmd:LanguageCode>
         </gmd:languageCode>
         <gmd:country>
            <gmd:Country codeList="http://nap.geogratis.gc.ca/metadata/register/napMetadataRegister.xml#IC_117"
                         codeListValue="CAN">Canada; Canada</gmd:Country>
         </gmd:country>
         <gmd:characterEncoding>
            <gmd:MD_CharacterSetCode codeList="http://nap.geogratis.gc.ca/metadata/register/napMetadataRegister.xml#IC_95"
                                     codeListValue="RI_458">utf8; utf8</gmd:MD_CharacterSetCode>
         </gmd:characterEncoding>
      </gmd:PT_Locale>
  </gmd:locale>
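
As a rough illustration of the manual fix described above, here is a minimal Python sketch that drops any gmd:locale repeating the record's main language. It is not the plugin's or FGP's actual logic. The function names are hypothetical, and it assumes the main language is stored either as a gco:CharacterString such as "eng; CAN" or as a gmd:LanguageCode with a codeListValue attribute.

```python
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespaces used by the record.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def main_language(root):
    """Return the record's main language code (e.g. "eng"), or None."""
    # Assumption: gmd:language holds either a gco:CharacterString like
    # "eng; CAN" or a gmd:LanguageCode with a codeListValue attribute.
    text = root.findtext("gmd:language/gco:CharacterString", namespaces=NS)
    if text:
        return text.split(";")[0].strip().lower()
    code = root.find("gmd:language/gmd:LanguageCode", NS)
    if code is not None:
        return (code.get("codeListValue") or "").strip().lower() or None
    return None


def remove_main_language_locale(root):
    """Remove gmd:locale children of root that repeat the main language.

    root is expected to be the gmd:MD_Metadata element.
    Returns the number of gmd:locale elements removed.
    """
    lang = main_language(root)
    if lang is None:
        return 0
    removed = 0
    # Copy the list first, since we remove children while iterating.
    for locale in list(root.findall("gmd:locale", NS)):
        code = locale.find(
            "gmd:PT_Locale/gmd:languageCode/gmd:LanguageCode", NS
        )
        if code is not None and (code.get("codeListValue") or "").lower() == lang:
            root.remove(locale)
            removed += 1
    return removed
```

When writing the record back out, calling `ET.register_namespace("gmd", NS["gmd"])` (and likewise for `gco`) beforehand keeps the usual prefixes in the serialized XML.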

The HNAP documentation at https://metadata101.github.io/schemas/fgp/hnap/HNAP.html offers only limited guidance on gmd:locale:

Guideline: NAP registered code lists based on ISO 639.2 alpha-3 terminology codes for language (eng=English, fra=French) and ISO 3166-1 alpha-3 codes (CAN=Canada) for countries shall be used to describe locale and character encoding shall be set to "utf8; utf8", as shown in the following example:

<gmd:locale>
  <gmd:PT_Locale id="fra">
    <gmd:languageCode>
      <gmd:LanguageCode codeListValue="fra" codeList="http://nap.geogratis.gc.ca/metadata/register/napMetadataRegister.xml#IC_116">
        French; Français
      </gmd:LanguageCode>
    </gmd:languageCode>
    <gmd:country>
      <gmd:Country codeListValue="CAN" codeList="http://nap.geogratis.gc.ca/metadata/register/napMetadataRegister.xml#IC_117">
        Canada; Canada
      </gmd:Country>
    </gmd:country>
    <gmd:characterEncoding>
      <gmd:MD_CharacterSetCode codeListValue="RI_458" codeList="http://nap.geogratis.gc.ca/metadata/register/napMetadataRegister.xml#IC_95">
        utf8; utf8
      </gmd:MD_CharacterSetCode>
    </gmd:characterEncoding>
  </gmd:PT_Locale>
</gmd:locale>

Discussion:

jodygarnett commented 3 years ago

@ianwallen do you have any experience with the above? I seem to recall you made an XSLT script to process records for publication to FGP.

ianwallen commented 3 years ago

You may want to look at the following file:

https://github.com/metadata101/iso19139.ca.HNAP/blob/3.12.x/src/main/config/conversion/import/iso19139.ca.HNAP.FGP-to-iso19139.ca.HNAP.xsl

It shows all the changes that had to be made during the import from FGP to HNAP.

We apply this import template when importing data from FGP.

In our case, we also needed to push data to FGP. Since GN does not do push publishing (it does pull publishing via the harvester), we had to create a separate application for publishing our metadata to FGP. That application contains the XSLT that converts the file back to FGP (it is mostly the reverse of the import process). I can share this file with you if you need it.

If you correct this issue, the next issue you will run into is related to https://github.com/metadata101/iso19139.ca.HNAP/issues/157

Since GN uses pull publishing, I believe the server doing the harvesting is responsible for any conversion. Newer changes or corrections to the way GN formats the metadata can break the ability of an older GN to harvest records from a newer version of GN. How has GN handled this in the past?

In your case, I'm guessing that FGP is harvesting from your application, so I believe (as a personal preference) that it is up to FGP to do the conversion during the import process. Otherwise it constrains HNAP: the plugin cannot change its current output, even when it makes sense to do so, because the change may break FGP import when FGP is running an older version of GN.

So I guess we need to ask ourselves: does this HNAP plugin need to produce HNAP records that can be imported by FGP, or does it simply need to create records that are HNAP compliant?

ianwallen commented 3 years ago

This problem also touches on issues that were discussed in https://github.com/metadata101/iso19139.ca.HNAP/issues/12

jodygarnett commented 3 years ago

Thanks Ian, that is very helpful. Sharing the xsl to reverse the process may be helpful to bulk-fix records.

I need to start up the HNAP documentation guide again to capture procedures like this.

josegar74 commented 3 years ago

@ianwallen On page 20 of this document https://jira.ucar.edu/secure/attachment/15227/MD-Metadata.pdf (although it is not clear how official it is) I found the following information about the locale element:

locale - Other languages used in metadata free text descriptions.

So it seems the old way of handling this in previous versions of GeoNetwork is more "correct". The changes in https://github.com/metadata101/iso19139.ca.HNAP/pull/225 allow the metadata editor to keep working with multilingual records while storing only the alternative metadata languages in the locale elements.
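
To make the rule quoted above concrete (locale holds only the other languages), here is a small Python sketch that audits a record's gmd:locale elements. It is only an illustration, not how GeoNetwork or FGP validates records. The function names are hypothetical, and the caller is assumed to already know the main language code (for example "eng").

```python
import xml.etree.ElementTree as ET

# Standard ISO 19139 namespace for the gmd elements.
NS = {"gmd": "http://www.isotc211.org/2005/gmd"}


def locale_codes(root):
    """Return the language code of every gmd:locale, in document order."""
    path = "gmd:locale/gmd:PT_Locale/gmd:languageCode/gmd:LanguageCode"
    return [
        (el.get("codeListValue") or "").lower()
        for el in root.findall(path, NS)
    ]


def locale_problems(root, main_lang):
    """List readable problems; an empty list means the rule holds.

    root is expected to be the gmd:MD_Metadata element and main_lang
    the record's main language code, supplied by the caller.
    """
    problems = []
    seen = set()
    for code in locale_codes(root):
        if code == main_lang.lower():
            problems.append(f"locale '{code}' repeats the main language")
        if code in seen:
            problems.append(f"locale '{code}' is listed more than once")
        seen.add(code)
    return problems
```

Running `locale_problems` over a set of exported records would be one way to find candidates for a bulk fix before publishing.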

ianwallen commented 3 years ago

@josegar74 My concern is simply that the core GeoNetwork iso19139 schema assumes the main language will be in the Locale, while we are trying to specify that HNAP should not have the main language in the Locale. As we are extending iso19139, I would expect us to make only the changes required to follow the HNAP specification. The more we diverge from the base iso19139, the greater the chance that we introduce conflicts and other errors.

So I think it comes down to - is the main language in the Locale element valid HNAP metadata?

If it is valid HNAP, then other systems may attempt to import data into FGP that contains the main language in the Locale. In that case, the bug seems to be in FGP's implementation of HNAP, which cannot handle this. So would it not make more sense to apply the locale removal logic in the update-fixed-info.xsl for the FGP implementation, or to have FGP update its logic to support the main language in the Locale?

If the HNAP specification says that the main language in the Locale is not acceptable, then the bug is in the HNAP schema plugin and I agree with your suggested fix.

If the HNAP specification says nothing on this point, then I would expect us to follow ISO19139, since HNAP is based on that specification. The issue would then belong to GN core. If ISO19139 does not support the main language in the Locale, then GN core should be updated to reflect that. And if it is valid ISO19139, then no changes are required.