ufal / clarin-dspace

clarin-dspace digital repository based on DSpace and LINDAT/CLARIN DSpace
http://lindat.cz
BSD 3-Clause "New" or "Revised" License

local-types.xml files comparison 5 vs 7.2 #1048

Open milanmajchrak opened 1 year ago

milanmajchrak commented 1 year ago

I've compared the local-types.xml files from CLARIN-DSpace5 and CLARIN-DSpace7.2, because some local types are missing in CLARIN-DSpace7.2. The full comparison, with some notes, is attached below as an .xlsx file. I'd like to list the missing local-types here because I have some questions:

Forgotten local-types

I think I know what to do with these local-types

I cannot find where these local-types are used. @kosarko Is it necessary to have them in CLARIN-DSpace7.2?

Comparison file: local-types_Comparison.xlsx
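A comparison like this can also be scripted. The following is a minimal sketch, assuming both files use the usual DSpace registry layout (`dspace-dc-types` containing `dc-type` entries with `schema`, `element` and an optional `qualifier`). The two inline snippets are shortened, made-up stand-ins for the real files, and the helper names `field_names` and `missing_fields` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened, made-up stand-ins for the two local-types.xml files.
OLD_REGISTRY = """<dspace-dc-types>
  <dc-type><schema>local</schema><element>files</element><qualifier>count</qualifier></dc-type>
  <dc-type><schema>local</schema><element>files</element><qualifier>size</qualifier></dc-type>
  <dc-type><schema>local</schema><element>dataProvider</element></dc-type>
</dspace-dc-types>"""

NEW_REGISTRY = """<dspace-dc-types>
  <dc-type><schema>local</schema><element>dataProvider</element></dc-type>
</dspace-dc-types>"""


def field_names(registry_xml):
    """Return the set of schema.element[.qualifier] names found in a registry."""
    names = set()
    for dc_type in ET.fromstring(registry_xml).iter("dc-type"):
        parts = [
            (dc_type.findtext("schema") or "").strip(),
            (dc_type.findtext("element") or "").strip(),
        ]
        qualifier = (dc_type.findtext("qualifier") or "").strip()
        if qualifier:
            parts.append(qualifier)
        names.add(".".join(parts))
    return names


def missing_fields(old_xml, new_xml):
    """Fields present in the old registry but absent from the new one."""
    return sorted(field_names(old_xml) - field_names(new_xml))


print(missing_fields(OLD_REGISTRY, NEW_REGISTRY))
```

With the real files, the two strings would be read from disk instead; the set difference is what gives the list of forgotten local-types.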

kosarko commented 1 year ago

I'll add multiple comments regarding the missing fields.

local.files.*
https://github.com/ufal/clarin-dspace/blob/8cf3758e9f12c18a2b8a20c13d63757d8ee2d52e/dspace-api/src/main/java/org/dspace/content/Item.java#L1157-L1158
https://github.com/ufal/clarin-dspace/blob/8cf3758e9f12c18a2b8a20c13d63757d8ee2d52e/dspace-xmlui/src/main/webapp/themes/UFAL/lib/xsl/aspect/artifactbrowser/item-list.xsl#L157
https://github.com/ufal/clarin-dspace/blob/8cf3758e9f12c18a2b8a20c13d63757d8ee2d52e/dspace-xmlui/src/main/webapp/themes/UFAL/lib/xsl/aspect/artifactbrowser/item-list.xsl#L164
https://github.com/ufal/clarin-dspace/blob/8cf3758e9f12c18a2b8a20c13d63757d8ee2d52e/dspace-xmlui/src/main/webapp/themes/UFALHome/lib/xsl/page-structure.xsl#L415-L432

image

I thought this was also used in one of the OAI crosswalks, but that doesn't seem to be the case (metashare generates its own count) https://github.com/ufal/clarin-dspace/blob/8cf3758e9f12c18a2b8a20c13d63757d8ee2d52e/dspace/config/crosswalks/oai/metadataFormats/metasharev2.xsl#L469

I guess these might be dropped if it doesn't complicate migration of existing records...

kosarko commented 1 year ago

local.dataProvider https://github.com/ufal/clarin-dspace/blob/8cf3758e9f12c18a2b8a20c13d63757d8ee2d52e/dspace/config/input-forms.xml#L504

dspace=# select text_value from metadatavalue where metadata_field_id in (select metadata_field_id from metadatafieldregistry where element like '%dataProvider%');

image

the production system has these:

diff --git a/dspace/config/crosswalks/oai/xoai.xml b/dspace/config/crosswalks/oai/xoai.xml
index cb18055..a0b8173 100644
--- a/dspace/config/crosswalks/oai/xoai.xml
+++ b/dspace/config/crosswalks/oai/xoai.xml
@@ -186,7 +186,14 @@
         </Filter>
         <Filter id="ExcludeItemsInComOrCol">
             <Definition>
-                <Custom ref="cmdiExcludedComsOrCols"/>
+               <And>
+                   <LeftCondition>
+                       <Custom ref="excludeDhCom"/>
+                   </LeftCondition>
+                   <RightCondition>
+                       <Custom ref="excludeTeachingCom"/>
+                   </RightCondition>
+               </And>
             </Definition>
         </Filter>

@@ -480,10 +487,17 @@
             </Configuration>
         </CustomCondition>

-        <CustomCondition id="cmdiExcludedComsOrCols">
+        <CustomCondition id="excludeDhCom">
             <Class>cz.cuni.mff.ufal.dspace.xoai.filter.ColComFilter</Class>
             <Configuration>
-                <string name="handle">XXX</string>
+                <string name="handle">20.500.12800/1</string>
+            </Configuration>
+        </CustomCondition>
+
+        <CustomCondition id="excludeTeachingCom">
+            <Class>cz.cuni.mff.ufal.dspace.xoai.filter.ColComFilter</Class>
+            <Configuration>
+                <string name="handle">11234/5011</string>
             </Configuration>
         </CustomCondition>
     </Filters>
diff --git a/dspace/config/input-forms.xml b/dspace/config/input-forms.xml
index 5d72e5a..14c9ed2 100644
--- a/dspace/config/input-forms.xml
+++ b/dspace/config/input-forms.xml
@@ -14,7 +14,8 @@
     <form-map>
         <name-map collection-handle="default" form-name="traditional"/>
         <!-- TODO real handle -->
-        <name-map collection-handle="123456789/2" form-name="clariah_submissions"/>
+        <name-map collection-handle="20.500.12800/3" form-name="clariah_submissions"/>
+        <name-map collection-handle="11234/5012" form-name="teaching_submissions"/>
     </form-map>

diff --git a/dspace/config/item-submission.xml b/dspace/config/item-submission.xml
index 6391949..0958fbd 100644
--- a/dspace/config/item-submission.xml
+++ b/dspace/config/item-submission.xml
@@ -18,6 +18,8 @@
  <!-- for handle "default".                                                -->
  <submission-map>
    <name-map collection-handle="default" submission-name="traditional" />
+   <name-map collection-handle="20.500.12800/3" submission-name="clariah" />
+   <name-map collection-handle="11234/5012" submission-name="clariah" />
  </submission-map>

the diff is against clarin-dev, so https://github.com/ufal/clarin-dspace/compare/clarin...ufal:clarin-dspace:clarin-dev#diff-725d69ee4e9c64e6fdc5b9bf850fad524fe4ee66188eb525f8a2a88d9d866ed7 is deployed too (but dataProvider was added way before)
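To make the effect of the input-forms.xml change concrete, here is a small sketch of how a collection handle resolves to a form name. It assumes the usual DSpace behaviour that a collection without its own name-map entry falls back to the "default" entry; the function name `form_for_collection` is hypothetical and this is not the actual DSpace lookup code.

```python
# Mapping copied from the input-forms.xml part of the diff above.
FORM_MAP = {
    "default": "traditional",
    "20.500.12800/3": "clariah_submissions",
    "11234/5012": "teaching_submissions",
}


def form_for_collection(handle, form_map=FORM_MAP):
    """Return the form name for a collection, falling back to 'default'."""
    return form_map.get(handle, form_map["default"])


print(form_for_collection("20.500.12800/3"))
print(form_for_collection("11234/1"))  # made-up handle without its own mapping
```

Only collections mapped to a form that contains local.dataProvider will offer that field during submission, which is why the field is filled in for some items only.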

kosarko commented 1 year ago

local.genre is not used (by us); no clue where it comes from

edit: I guess we can drop it. But maybe some other installation uses it. What does the migration tool do exactly? Will it ignore the field; will it produce some sort of error?

kosarko commented 1 year ago

local.additional.metadata this is a catchall field for the metadata we got when importing LRT and which we couldn't fit anywhere else

kosarko commented 1 year ago

local.refbox.format https://github.com/ufal/clarin-dspace/blob/8cf3758e9f12c18a2b8a20c13d63757d8ee2d52e/dspace/config/crosswalks/oai/metadataFormats/html.xsl#L31 this is filled in on imported items (nfa collection) https://github.com/ufal/lindat-repository-imports/blob/ed68ed6cead14b1ac8697381afcf5c96dd7c8bba/NFA/transformations/transform.xslt#L192-L195

and changes what's displayed in the refbox (the yellow-blue box in item view). Compare http://hdl.handle.net/11372/LRT-5118 with http://hdl.handle.net/20.500.12801/3900058-05 (the latter starts with a title; the former with author names)

milanmajchrak commented 1 year ago

OK, local.files.count and local.files.size will be dropped, because the file count and the total file size are computed in the FE, and if those values are also computed in the crosswalks, I don't see a use for these fields.

kosarko commented 1 year ago

local.bitstream.file is used for the preview (archive, plaintext...) feature. See around https://github.com/ufal/clarin-dspace/blob/8cf3758e9f12c18a2b8a20c13d63757d8ee2d52e/dspace-api/src/main/java/cz/cuni/mff/ufal/curation/ProcessBitstreams.java#L218 https://github.com/ufal/clarin-dspace/blob/8cf3758e9f12c18a2b8a20c13d63757d8ee2d52e/dspace-xmlui/src/main/java/org/dspace/app/xmlui/objectmanager/ItemAdapter.java#L1155

Note the ItemAdapter spits out anything in local.bitstream.* (i.e. info too); this https://github.com/ufal/clarin-dspace/blob/8cf3758e9f12c18a2b8a20c13d63757d8ee2d52e/dspace-xmlui/src/main/webapp/themes/UFAL/lib/xsl/aspect/artifactbrowser/item-view.xsl#L976 displays the info in item-view, and https://github.com/ufal/clarin-dspace/blob/8cf3758e9f12c18a2b8a20c13d63757d8ee2d52e/dspace-xmlui/src/main/webapp/themes/UFAL/lib/xsl/aspect/artifactbrowser/item-view.xsl#L1057 (and around) draws the previews

local.bitstream.info: as noted above, the code is such that it should also display local.bitstream.info; but at the moment, no dspace object has that field in our database

dspace=# select count(*) from metadatavalue where metadata_field_id in (select metadata_field_id from metadatafieldregistry where element='bitstream' and qualifier='info');
 count
-------
     0
(1 row)
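
The same check can be run for every registered field at once. Below is a rough sketch using an in-memory SQLite database with a heavily simplified version of the two tables (the real DSpace tables have more columns, and production runs on PostgreSQL). All rows are made up, and `usage_counts` is a hypothetical helper.

```python
import sqlite3

# Simplified, made-up stand-in for the two DSpace tables used in the query above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metadatafieldregistry (
    metadata_field_id INTEGER PRIMARY KEY,
    element TEXT NOT NULL,
    qualifier TEXT
);
CREATE TABLE metadatavalue (
    metadata_value_id INTEGER PRIMARY KEY,
    metadata_field_id INTEGER REFERENCES metadatafieldregistry,
    text_value TEXT
);
INSERT INTO metadatafieldregistry VALUES
    (1, 'bitstream', 'file'), (2, 'bitstream', 'info'), (3, 'dataProvider', NULL);
INSERT INTO metadatavalue (metadata_field_id, text_value) VALUES
    (1, 'made-up preview data'), (1, 'more made-up preview data'),
    (3, 'made-up provider');
""")


def usage_counts(connection):
    """Map each registered field name to the number of stored values."""
    rows = connection.execute("""
        SELECT r.element, r.qualifier, COUNT(v.metadata_value_id)
        FROM metadatafieldregistry r
        LEFT JOIN metadatavalue v ON v.metadata_field_id = r.metadata_field_id
        GROUP BY r.metadata_field_id
        ORDER BY r.metadata_field_id
    """).fetchall()
    return {
        element + ("." + qualifier if qualifier else ""): count
        for element, qualifier, count in rows
    }


counts = usage_counts(conn)
unused = [name for name, count in counts.items() if count == 0]
print(counts)
print(unused)
```

The LEFT JOIN keeps fields that have no values at all, so candidates for dropping show up with a count of 0 instead of disappearing from the result.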

milanmajchrak commented 1 year ago

Re local.dataProvider: thank you for the answer, but I'm probably missing something, because I still don't understand where it is used. I can see it is used in discovery.xml, which means it should be possible to filter by dataProvider in /search, but I don't see such an option there.

milanmajchrak commented 1 year ago

Re local.genre: OK, we can drop it for now. The migration tool won't throw any error because of that local type.

kosarko commented 1 year ago

@milanmajchrak dataProvider is in the submission workflow for collection 20.500.12800/3 (see the input-forms mapping). You can check that the value is actually filled in for some items either with the search: https://lindat.mff.cuni.cz/repository/xmlui/discover?query=local.dataProvider%3A*&submit=Search&filtertype_1=title&filter_relational_operator_1=contains&filter_1=&query=local.dataProvider%3A* or in the control panel's metadata quality tab. We have it in the submission, as it might be useful for some future export...

Re discovery...good point...short version: at the moment, it feels like it's there for autocomplete (so that solr-dataProvider_ac works correctly). Long version: there are two beans, defaultConfiguration and searchConfiguration. searchConfiguration is used on the search page (advanced filters and facets). defaultConfiguration is used on the homepage (facets), but I don't remember the reason why we duplicate the searchFilters there. Anyway, I don't think these filters are listed anywhere, but you can still filter by hand with a URL such as https://lindat.mff.cuni.cz/repository/xmlui/discover?filtertype=dataProvider&filter_relational_operator=equals&filter=Denmarks%20SvampeAtlas to get the filtered results. In this case I believe it's there to trigger the generation of the dataProvider_ac solr field (see https://github.com/ufal/clarin-dspace/blob/8cf3758e9f12c18a2b8a20c13d63757d8ee2d52e/dspace-api/src/main/java/cz/cuni/mff/ufal/dspace/discovery/SolrServiceTweaksPlugin.java)
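
Since the filter is not offered in the UI, the URL has to be put together by hand. A minimal sketch of doing that with the three query parameters from the link above; the function name `discover_filter_url` is hypothetical, and the base URL is simply the one used in this thread.

```python
from urllib.parse import quote, urlencode

DISCOVER_URL = "https://lindat.mff.cuni.cz/repository/xmlui/discover"


def discover_filter_url(filter_type, value, operator="equals", base=DISCOVER_URL):
    """Build a discover URL that applies a single filter."""
    params = {
        "filtertype": filter_type,
        "filter_relational_operator": operator,
        "filter": value,
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return base + "?" + urlencode(params, quote_via=quote)


print(discover_filter_url("dataProvider", "Denmarks SvampeAtlas"))
```

Other operators, such as the `contains` seen in the search link earlier in the thread, can be passed through the `operator` argument.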