ufal / clarin-dspace

clarin-dspace digital repository based on DSpace and LINDAT/CLARIN DSpace
http://lindat.cz
BSD 3-Clause "New" or "Revised" License
complex/component fields #1083

Open kosarko opened 10 months ago

kosarko commented 10 months ago

view: Image

vs

Image

What about indexing? There used to be some special treatment in: https://github.com/ufal/clarin-dspace/blob/8a6ba5c98547942d7115b74cf4978e0b29ca50e4/dspace/solr/search/conf/schema.xml#L606

and

https://github.com/ufal/clarin-dspace/blame/8a6ba5c98547942d7115b74cf4978e0b29ca50e4/dspace-api/src/main/java/cz/cuni/mff/ufal/dspace/discovery/SolrServiceTweaksPlugin.java#L241

milanmajchrak commented 5 months ago

Where is a value from field + "_comp" actually used? I can see that it is indexed, but I cannot find any use of it other than this: https://github.com/ufal/clarin-dspace/blob/clarin/dspace/solr/search/conf/schema.xml#L679

Another question: I see you have changed the copyField values; should they also be updated in v7? The v7 schema looks like this: https://github.com/dataquest-dev/DSpace/blob/dtq-dev/dspace/solr/search/conf/schema.xml#L362

kosarko commented 5 months ago

> Where is a value from field + "_comp" actually used? I can see that it is indexed, but I cannot find any use of it other than this: https://github.com/ufal/clarin-dspace/blob/clarin/dspace/solr/search/conf/schema.xml#L679

@milanmajchrak Judging by https://github.com/ufal/lindat-repository-obsolete/pull/97/files and https://github.com/ufal/lindat-repository-obsolete/pull/99/files, I think it is there mostly for full-text search and autocomplete (those are the copy fields).

I'm not sure what the intention behind the separate analysis was. The field is used for autocomplete in https://github.com/ufal/clarin-dspace/blob/8a6ba5c98547942d7115b74cf4978e0b29ca50e4/dspace/config/input-forms.xml#L1653. You can also query the _comp fields directly in search, where they give slightly different results: https://lindat.mff.cuni.cz/repository/xmlui/discover?query=local.sponsor_comp%3AeuFunds vs https://lindat.mff.cuni.cz/repository/xmlui/discover?query=local.sponsor%3AeuFunds

> Another question: I see you have changed the copyField values; should they also be updated in v7? The v7 schema looks like this: https://github.com/dataquest-dev/DSpace/blob/dtq-dev/dspace/solr/search/conf/schema.xml#L362

I can see two possible reasons for this (but I'm really just guessing):

  1. Someone wanted to influence the relevance (ordering) of the results.
  2. We used to keep all the bitstream previews in local metadata (with no size limit), and a * source would copy all of that, so search_text would be huge.
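
To make the second guess concrete, here is a minimal Python simulation of what a catch-all copy does to the size of search_text. It is only a sketch and not Solr code: the helper build_search_text, the field name local.bitstream.preview, and the sample values are all hypothetical, and fnmatch globbing is more permissive than Solr's copyField source patterns.

```python
from fnmatch import fnmatch


def build_search_text(doc, copy_sources):
    """Concatenate the values of every field matching one of the
    copy_sources patterns, roughly like copyField into one catch-all field."""
    parts = []
    for field, values in doc.items():
        if any(fnmatch(field, pattern) for pattern in copy_sources):
            parts.extend(values)
    return " ".join(parts)


# Hypothetical item: small descriptive metadata plus one large preview
# stored as local metadata (field name and size are invented).
doc = {
    "dc.title": ["Some corpus"],
    "local.sponsor": ["euFunds example project"],
    "local.bitstream.preview": ["x" * 100_000],
}

# A "*" source copies everything, including the preview.
wildcard_text = build_search_text(doc, ["*"])

# An explicit list of sources leaves the preview out.
selective_text = build_search_text(doc, ["dc.*", "local.sponsor"])

print(len(wildcard_text), len(selective_text))
```

With the wildcard the catch-all field is dominated by the preview text; with explicit sources it stays a few dozen characters long.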

kosarko commented 5 months ago

On the topic of complex fields: I don't think the required flag behaves as it should. It seems that the requirements on individual inputs are not enforced unless the whole field is mandatory.

Funding is an optional field, but if someone decides to fill it in, they should succeed only if they fill in all the inputs marked as required. The same goes for size info: you need both the number and the unit, and that is independent of whether local.size.info is mandatory.
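
The expected behaviour could be sketched roughly like this in Python. This is an illustration of the rule only, not the actual DSpace validation code; the Component class, validate_complex_field, and the component names "size" and "unit" are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One input of a complex field (hypothetical model)."""
    name: str
    required: bool = False


def validate_complex_field(components, values, field_mandatory=False):
    """Return a list of error messages; an empty list means valid.

    Rule: once any input is filled in, every input marked as required
    must be filled in, whether or not the field as a whole is mandatory.
    """
    filled = {c.name for c in components
              if (values.get(c.name) or "").strip()}
    if not filled:
        # Nothing entered: only a problem if the whole field is mandatory.
        return ["field is mandatory"] if field_mandatory else []
    return [f"missing required input: {c.name}"
            for c in components
            if c.required and c.name not in filled]


# Size info needs both the number and the unit (names are assumptions).
SIZE_INFO = [Component("size", required=True),
             Component("unit", required=True)]

empty_result = validate_complex_field(SIZE_INFO, {})
partial_result = validate_complex_field(SIZE_INFO, {"size": "10"})
full_result = validate_complex_field(SIZE_INFO, {"size": "10", "unit": "tokens"})
```

Here an untouched optional field passes, a partially filled one is rejected because the unit is missing, and a fully filled one passes.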