This is a major revision of the dif to mmd translation, covering both versions 9 and 10.
The following changes have been implemented:
dataset_status: both Data_Set_Progress (dif9) and Dataset_Progress (dif10) are now parsed and mapped to the mmd vocabulary.
last_metadata_update: better handling of creation and revision dates.
i) It now handles the case, common in many records, where neither a creation nor a revision date is provided.
ii) If both revision and creation dates are provided, both are now translated. Previously the creation date was translated only if the revision date was missing (closes #221).
keywords (vocabulary="None"): this element is now created only if Keyword or Ancillary_Keyword is present. Ancillary_Keyword (dif10) was previously missing.
dataset_language: both Data_Set_Language (dif9) and Dataset_Language (dif10) are now added and mapped to the mmd vocabulary.
dataset_citation: now both Data_Set_Citation (dif9) and Dataset_Citation (dif10) are parsed
dataset_citation/doi: dif10 encodes the DOI differently, so both Dataset_DOI (dif9) and Persistent_Identifier/Identifier (dif10) are now handled.
iso_topic_category: improved translation using the altLabel entries in mmd_vocabulary.xml, based on the values found in harvested records. This can be extended to cover more records.
iso_topic_category: if the value is missing or cannot be translated, the valid value "Not available" is used instead of an empty element.
temporal_extent: better handling of the input. Since Paleo_DateTime is not supported, it is now skipped; it was producing invalid strings (--T12:00:00Z).
temporal_extent: Single_DateTime is now handled, with both start and end dates set to its value.
temporal_extent/end_date: no longer created if missing or empty; it was producing invalid strings (--T12:00:00Z).
geographic_extent: now also handles dif:Geometry/dif:Point, in addition to the already supported dif:Geometry/dif:Polygon/dif:Boundary/dif:Point.
access_constraint: now mapped via mmd_vocabulary.xml. This maps the common value "unrestricted" to "Open" through prefLabel.
Related_URL / landing pages: both the 'DATA SET LANDING PAGE' (dif10) and 'VIEW DATASET LANDING PAGE' (dif9) types are handled. Since the landing page is already taken from the citation Online_Resource, this is now used only if that element is empty.
personnel: extended coverage to both Contact_Person and Contact_Group (dif10).
personnel/role: I've added a few more roles to the mapping list, and an empty role is now mapped to "Technical contact".
abstract lang: if only Summary was used instead of Summary/Abstract, xml:lang="en" was missing. It is now added.
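For reviewers, the temporal_extent rules above can be summarised as a small Python sketch. This is not the XSLT in this PR, only an illustration of the intended behaviour: Paleo_DateTime is skipped, Single_DateTime yields identical start and end dates, and end_date is omitted when empty. It assumes the dif10 element names Range_DateTime/Beginning_Date_Time/Ending_Date_Time, ignores namespaces, and (as a hypothetical choice) drops ranges that have no start date.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Strip any '{namespace}' prefix so tags compare by local name only."""
    return tag.rsplit("}", 1)[-1]

def child_text(elem, name):
    """Text of the first direct child named `name`, or None if missing or empty."""
    for child in elem:
        if local(child.tag) == name:
            text = (child.text or "").strip()
            return text or None
    return None

def temporal_extents(coverage):
    """Sketch of the temporal_extent rules for one Temporal_Coverage element."""
    extents = []
    for item in coverage:
        name = local(item.tag)
        if name == "Paleo_DateTime":
            continue  # not supported: skip instead of emitting '--T12:00:00Z'
        if name == "Single_DateTime":
            value = (item.text or "").strip()
            if value:
                extents.append({"start_date": value, "end_date": value})
        elif name == "Range_DateTime":
            start = child_text(item, "Beginning_Date_Time")
            end = child_text(item, "Ending_Date_Time")
            if start:  # assumption: a range without a start date is dropped
                extent = {"start_date": start}
                if end:  # end_date is only created when present and non-empty
                    extent["end_date"] = end
                extents.append(extent)
    return extents

doc = ET.fromstring("""
<Temporal_Coverage>
  <Range_DateTime>
    <Beginning_Date_Time>2010-01-01</Beginning_Date_Time>
    <Ending_Date_Time></Ending_Date_Time>
  </Range_DateTime>
  <Single_DateTime>2015-06-30</Single_DateTime>
  <Paleo_DateTime/>
</Temporal_Coverage>
""")
print(temporal_extents(doc))
# → [{'start_date': '2010-01-01'}, {'start_date': '2015-06-30', 'end_date': '2015-06-30'}]
```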
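The last_metadata_update change (both dates translated, and no failure when neither is present) can likewise be sketched in Python. The element names are assumed to be DIF_Creation_Date/Last_DIF_Revision_Date for dif9 and Metadata_Creation/Metadata_Last_Revision for dif10, and the update-type labels in the hypothetical UPDATE_TYPES table are illustrative assumptions rather than the exact values the XSLT writes.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup: DIF element local name -> mmd update type (labels assumed).
UPDATE_TYPES = {
    "DIF_Creation_Date": "Created",                    # dif9
    "Last_DIF_Revision_Date": "Minor modification",    # dif9
    "Metadata_Creation": "Created",                    # dif10
    "Metadata_Last_Revision": "Minor modification",    # dif10
}

def local(tag):
    """Strip any '{namespace}' prefix so dif9 and dif10 tags compare by local name."""
    return tag.rsplit("}", 1)[-1]

def metadata_updates(dif_root):
    """Collect every non-empty creation/revision date as (type, datetime) pairs.

    Both dates are kept when both exist; a record with neither yields an empty list.
    """
    updates = []
    for elem in dif_root.iter():
        update_type = UPDATE_TYPES.get(local(elem.tag))
        value = (elem.text or "").strip()
        if update_type and value:
            updates.append((update_type, value))
    return updates

dif10 = ET.fromstring(
    "<DIF><Metadata_Dates>"
    "<Metadata_Creation>2019-03-01</Metadata_Creation>"
    "<Metadata_Last_Revision>2021-11-15</Metadata_Last_Revision>"
    "</Metadata_Dates></DIF>"
)
print(metadata_updates(dif10))
# → [('Created', '2019-03-01'), ('Minor modification', '2021-11-15')]
print(metadata_updates(ET.fromstring("<DIF/>")))
# → []
```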