kitodo / kitodo-production

Kitodo.Production is a workflow management tool for mass digitization and is part of the Kitodo Digital Library Suite.
http://www.kitodo.org/software/kitodoproduction/
GNU General Public License v3.0
63 stars, 63 forks

Import Configurator: Adding additional SRU parameters? #5542

Open pontus-osterdahl opened 1 year ago

pontus-osterdahl commented 1 year ago

Is your feature request related to a problem? Please describe. In the Import configuration editor, the URL parameters "SRU version" and "SRU record schema" can be set for an SRU interface. Other parameters cannot be added (at least I have not found a way to do so), which was possible in Kitodo <= 3.4.3 using kitodo_opac.xml. ~For one of our SRU/SRW import sources, the parameter "maximumRecords" is mandatory and imports are not possible with the current SRU interface.~ For one of our SRU/SRW import sources, import is currently not possible. Since a workaround using the CUSTOM interface was possible, we originally assumed that the problem was missing configuration of SRU parameters, but this does not appear to be the case.

Describe the solution you'd like. Unless allowing only "SRU version" and "SRU record schema" is a deliberate design decision, I would suggest either

Describe alternatives you've considered. As a workaround, it is possible to use a CUSTOM interface with the necessary SRU parameters as arbitrary parameters and to prefix "query=" to the search fields.


solth commented 1 year ago

@pontus-osterdahl thanks for your suggestion. I am a little confused, however. The parameter maximumRecords should already be added automatically to every query sent to an SRU interface. It is first set to "10" (hard-wired in hitlistDialog.xhtml, see https://github.com/kitodo/kitodo-production/blob/61affedc6c8f2750c0f35688896543da635ebe77/Kitodo/src/main/webapp/WEB-INF/templates/includes/processFromTemplate/dialogs/hitlistDialog.xhtml#L31) to create a hit list with 10 entries (and pagination), and then to "1" when a specific hit from the hit list is selected.

For example, when querying the SRU interface of Kalliope, the generated URLs look like this: http://kalliope-verbund.info/sru?recordSchema=mods37&operation=searchRetrieve&version=1.2&maximumRecords=10&query=ead.title%3DHamburg to query the SRU interface by title, and http://kalliope-verbund.info/sru?recordSchema=mods37&operation=searchRetrieve&version=1.2&maximumRecords=1&query=ead.id%3DDE-611-BF-73573 to fetch one specific record of the resulting hit list by ID.
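To make the two URLs above easier to compare, here is a minimal Java sketch that assembles them from their parts. The class and method names (`SruUrls`, `searchRetrieve`) are hypothetical and not part of Kitodo; the parameter order simply mirrors the Kalliope example URLs.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper that assembles SRU searchRetrieve URLs like the Kalliope examples. */
final class SruUrls {

    private SruUrls() {
    }

    /** Builds a searchRetrieve URL; the parameter order follows the example URLs above. */
    static String searchRetrieve(String baseUrl, String recordSchema, String version,
                                 int maximumRecords, String cqlQuery) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("recordSchema", recordSchema);
        params.put("operation", "searchRetrieve");
        params.put("version", version);
        params.put("maximumRecords", Integer.toString(maximumRecords));
        params.put("query", cqlQuery);
        return baseUrl + "?" + params.entrySet().stream()
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        String base = "http://kalliope-verbund.info/sru";
        // hit list query: up to 10 records per page
        System.out.println(searchRetrieve(base, "mods37", "1.2", 10, "ead.title=Hamburg"));
        // fetching one selected hit by ID: exactly 1 record
        System.out.println(searchRetrieve(base, "mods37", "1.2", 1, "ead.id=DE-611-BF-73573"));
    }
}
```

`URLEncoder` applies `application/x-www-form-urlencoded` rules (for example, `=` becomes `%3D` and a space becomes `+`); that is a choice made for this sketch.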

As you can see, both URLs already contain the parameter maximumRecords, as this is a mandatory parameter for SRU interfaces. Allowing the user to add another URL parameter with the same name would probably lead to unexpected behaviour (for example, if the URL then contained two parameters with the same name but contradictory values), and it shouldn't be necessary in the first place, because ImportConfigurations of interface type SRU add the parameter automatically, as described above.
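One way an import configuration editor could enforce this, sketched here as a hypothetical example rather than Kitodo's actual behaviour, is to reject user-defined parameters whose names clash with those added automatically. The reserved set below is an assumption based on the parameters mentioned in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical validation step for user-defined URL parameters of an SRU import configuration. */
final class SruParameterGuard {

    // assumed list: parameters this thread describes as being added automatically
    private static final Set<String> RESERVED =
            Set.of("operation", "version", "recordSchema", "maximumRecords", "startRecord", "query");

    private SruParameterGuard() {
    }

    /** Returns a copy of the custom parameters, rejecting any name that is already set automatically. */
    static Map<String, String> validate(Map<String, String> customParameters) {
        for (String name : customParameters.keySet()) {
            if (RESERVED.contains(name)) {
                throw new IllegalArgumentException(
                        "Parameter '" + name + "' is set automatically and must not be configured twice");
            }
        }
        return new LinkedHashMap<>(customParameters);
    }
}
```

With such a check, a configuration containing `maximumRecords=50` fails early with a clear message instead of silently producing a URL with two contradictory values.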

pontus-osterdahl commented 1 year ago

@solth Thanks for your thorough reply and for clarifying that maximumRecords is indeed hard-coded! Apparently I misunderstood the cause of the failed import. Perhaps it is rather a configuration problem on our side.

solth commented 1 year ago

@pontus-osterdahl is the SRU interface you are trying to configure publicly available? If so, would you mind posting an example URL to one record? I could then try to configure it in Kitodo correspondingly to check if it should work with the existing features or not.

pontus-osterdahl commented 1 year ago

@solth Thank you! The interface should be publicly available: https://dmmtp20.bib-bvb.de/SRW/search/?startRecord=1&recordSchema=marcxml&recordPacking=xml&query=dc.bvbid=BV019443023&maximumRecords=1

Unless this is simply a configuration issue, the problem could also have something to do with the startRecord parameter, which has to be set explicitly (I am not sure whether this is already hard-coded in Kitodo).

henning-gerhardt commented 1 year ago

According to the SRU query description, the startRecord parameter is optional and should be set to 1 if omitted (on the server side, I guess).
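The default rule mentioned above fits in a few lines. This is a sketch of the rule as the SRU specification describes it (startRecord is optional, record positions start at 1, and 1 is used when the parameter is omitted); the class and method names are hypothetical. A client that cannot rely on a server applying this default can resolve the value itself and always send it explicitly.

```java
import java.util.OptionalInt;

/** Hypothetical sketch of the SRU default for startRecord. */
final class SruStartRecord {

    private SruStartRecord() {
    }

    /** Returns the requested start position, or 1 if none was given. */
    static int effectiveStartRecord(OptionalInt requested) {
        int start = requested.orElse(1); // SRU positions are 1-based; 1 is the default
        if (start < 1) {
            throw new IllegalArgumentException("startRecord must be >= 1, got " + start);
        }
        return start;
    }
}
```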

solth commented 1 year ago

About the implementation in Kitodo: the parameter startRecord is also automatically set in SRU queries, but only when creating hitlists (where the combination of startRecord and maximumRecords is used to lazily load all records for the current page of the hitlist, when navigating from one page of the hitlist to another).

However, hit list creation is skipped when searching directly via a field that has been configured as "ID search field" in the ImportConfiguration: such a search is expected to return at most one hit (ID searches on partial IDs that yield multiple hits are currently not supported), so skipping the hit list avoids one redundant query to the interface.

So if dc.bvbid is configured as "ID search field" in your ImportConfiguration the generated URL will not contain the parameter startRecord.
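The paging behaviour described in the last three paragraphs can be sketched as follows. This is a hypothetical illustration, not Kitodo's code: it assumes a 0-based page index and uses the SRU convention that record positions start at 1; the page size of 10 in the example matches the value hard-wired in hitlistDialog.xhtml.

```java
/**
 * Hypothetical sketch (not Kitodo's implementation): hit-list queries page through
 * results with startRecord/maximumRecords, while searches on the configured
 * "ID search field" skip the hit list and request a single record.
 */
final class SruPaging {

    private SruPaging() {
    }

    /** SRU positions are 1-based: page 0 starts at record 1, page 1 at pageSize + 1, and so on. */
    static int startRecordForPage(int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize < 1) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize >= 1");
        }
        return pageIndex * pageSize + 1;
    }

    /** Returns the paging part of the query string for a search. */
    static String pagingParameters(boolean idSearchField, int pageIndex, int pageSize) {
        if (idSearchField) {
            // at most one hit is expected, so the record is fetched directly without startRecord
            return "maximumRecords=1";
        }
        return "startRecord=" + startRecordForPage(pageIndex, pageSize) + "&maximumRecords=" + pageSize;
    }
}
```

For example, `pagingParameters(false, 2, 10)` yields `startRecord=21&maximumRecords=10`, while an ID search yields only `maximumRecords=1`, which is why such a URL carries no startRecord.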

pontus-osterdahl commented 1 year ago

After some tests with the source code, it appears that the missing startRecord parameter is indeed the problem. Import via the SRU interface seems to work after making the changes below. But if startRecord should be set to 1 automatically on the server side, as suggested by the SRU query description, it is a server problem, and I am not sure whether Kitodo's SRU interface should be adapted to it.

I changed CatalogImportDialog.java from

https://github.com/kitodo/kitodo-production/blob/473a2434dd8005f11831f3f40df224a8e0f3c664/Kitodo/src/main/java/org/kitodo/production/forms/createprocess/CatalogImportDialog.java#L105-L110

into

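```java
public void search() {
    try {
        if (skipHitList(hitModel.getImportConfiguration(), hitModel.getSelectedField())) {
            getRecordById(hitModel.getSearchTerm());
        } else {
            // arguments: first record (1), page size (10), no sort field, ascending order, no filters
            List<?> hits = hitModel.load(1, 10, null, SortOrder.ASCENDING, Collections.EMPTY_MAP);
            // ... (remainder of search() unchanged)
```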

and QueryURLImport.java

from

https://github.com/kitodo/kitodo-production/blob/473a2434dd8005f11831f3f40df224a8e0f3c664/Kitodo-Query-URL-Import/src/main/java/org/kitodo/queryurlimport/QueryURLImport.java#L259-L267

into

```java
SearchInterfaceType interfaceType = dataImport.getSearchInterfaceType();
if (Objects.nonNull(interfaceType)) {
    // send startRecord explicitly, but only if both its name and a default value are configured
    if (Objects.nonNull(interfaceType.getStartRecordString())
            && Objects.nonNull(interfaceType.getDefaultStartValue())) {
        fullUrl = fullUrl + interfaceType.getStartRecordString() + EQUALS_OPERAND
                + interfaceType.getDefaultStartValue() + AND;
    }
    // request exactly one record
    if (Objects.nonNull(interfaceType.getMaxRecordsString())) {
        fullUrl = fullUrl + interfaceType.getMaxRecordsString() + EQUALS_OPERAND + "1&";
    }
    // end with the query parameter name and "=", so the search term can be appended next
    if (Objects.nonNull(interfaceType.getQueryString())) {
        fullUrl = fullUrl + interfaceType.getQueryString() + EQUALS_OPERAND;
    }
}
```

and from

https://github.com/kitodo/kitodo-production/blob/473a2434dd8005f11831f3f40df224a8e0f3c664/Kitodo-Query-URL-Import/src/main/java/org/kitodo/queryurlimport/QueryURLImport.java#L339-L346

into

        try {
            URI queryURL = createQueryURI(dataImport, queryParameters);
            String queryString = queryURL + AND;
            if (Objects.nonNull(interfaceType)) {
                if (start > 0 && Objects.nonNull(interfaceType.getStartRecordString())) {
                    queryString += interfaceType.getStartRecordString() + EQUALS_OPERAND + start + AND;
                } else if (Objects.nonNull(interfaceType.getStartRecordString())
                        && Objects.nonNull(interfaceType.getDefaultStartValue())) {
                    queryString += interfaceType.getStartRecordString() + EQUALS_OPERAND + interfaceType.getDefaultStartValue() + AND;
                }
                if (Objects.nonNull(interfaceType.getMaxRecordsString())) {