opacapp / opacclient

[unmaintained] A Java client library and an Android app to access more than 1,000 public and scientific libraries from all over the world.
https://opacapp.net
MIT License
156 stars 61 forks source link

SLUB API - pending items #595

Open StefRe opened 4 years ago

StefRe commented 4 years ago

The API works (catalog and account features) but there are still some TODOs to be completed from #553:

StefRe commented 3 years ago

Regarding item 1 (get status)

I implemented it with completeable futures and I guess it's done correctly (also verifed by this small python script), but the result is prohibitively slow.

    internal fun parseSearchResults(json: JSONObject): SearchRequestResult {
        val availFutures = arrayListOf<CompletableFuture<Response>>()
        val searchresults = json.getJSONArray("docs").map<JSONObject, SearchResult> {
            SearchResult().apply {
                innerhtml = "<b>${it.getString("title")}</b><br>${it.getJSONArray("author").optString(0)}"
                it.getString("creationDate").run {
                    if (this != "null") {
                        innerhtml += "<br>(${this})"
                    }
                }
                type = mediaTypes[it.getJSONArray("format").optString(0)]
                        ?: SearchResult.MediaType.NONE
                id = "id/${it.getString("id")}".also {
                    availFutures.add(
                            asyncGet("$baseurl/$it/?type=1369315139&tx_find_find[data-format]=json-holding-status&tx_find_find[format]=data",
                                    true).whenComplete { response: Response?, _: Throwable? ->
                                if (response != null) {
                                    status = when (try {
                                        JSONObject(response.body()?.string()).getInt("status")
                                    } catch (e: Exception) {
                                        0
                                    }) {
                                        1 -> SearchResult.Status.GREEN
                                        2 -> SearchResult.Status.YELLOW
                                        3 -> SearchResult.Status.RED
                                        else -> SearchResult.Status.UNKNOWN
                                    }
                                } else  {
                                    status = SearchResult.Status.UNKNOWN
                                }
                            })
                }
            }
        }
        CompletableFuture.allOf(*availFutures.toTypedArray()).join()
        return SearchRequestResult(searchresults, json.getInt("numFound"), 1)
    }

Even with 1000 Mbps link speed it takes almost 10 seconds for the search results to show up. It seems that there's nothing we can do to speed this up as the web catalog needs roughly the same time as the app:

Web catalog: (20 status queries) get_status_web_timings

App: (the search query itself followed by the 20 status queries) get_status_app_timings

The queries appear to be blocked on the server side. For the web catalog, this looks OK as you see the search results after 1-2 seconds and then the pending status icons get updated to , or within the next 8 seconds (see example). The app, however, waits for all the status queries be completed and only then shows the results including status markers.

As far as I understand it's not possible in the app to update the status markers after the results are shown. If so I don't want to include the availability status in the app as it simply takes way too long to get the search results.

StefRe commented 3 years ago

Regarding item 2 (references)

References are mostly links to other SLUB catalog entries, typically references to differently named preceding or following editions of periodical items or references to online/print editions of print/online items. Right now we just show the text without links as links would open the web catalog in the browser: https://github.com/opacapp/opacclient/blob/4b65dbaddd65b44e01518c395fc43e7b993edd37/opacclient/libopac/src/main/java/de/geeksfactory/opacclient/apis/SLUB.kt#L264-L267

Example: https://katalog.slub-dresden.de/id/0-130446319/#detail: Web catalog: grafik

JSON from API: https://github.com/opacapp/opacclient/blob/4b65dbaddd65b44e01518c395fc43e7b993edd37/opacclient/libopac/src/test/resources/slub/search/item-copies_in_multiple_arrays.json#L39-L45

Display in app: grafik

These libero links can be resolved to rsnlinks and these in turn to idlinks, i.e. we could get a DetailedItem from them. Is there any means in the app to show this DetailedItem in a new detailed item view in the app? If not I'd remove this TODO.

johan12345 commented 3 years ago

Your implementation for the status looks correct to me.

As far as I understand it's not possible in the app to update the status markers after the results are shown. If so I don't want to include the availability status in the app as it simply takes way too long to get the search results.

Yes, unfortunately this is not possible yet. We have also thought about that, as it would also be helpful to speed up some of the other APIs as well. I have put some ideas regarding this into issue #607.

Is there any means in the app to show this DetailedItem in a new detailed item view in the app?

Well, it might be possible to put these references into the DetailedItem as volumes, so that they can link to other SLUB catalog pages. Do you think that would be helpful? In that case, the links of course need to be resolved into rsn links first, or getResultById might be adapted to also support slubdd.de links.

StefRe commented 3 years ago

Well, it might be possible to put these references into the DetailedItem as volumes, so that they can link to other SLUB catalog pages.

Yes, this could be a workaround. The only drawback is that the references will appear under the subheading "Volumes" or "Enthaltene Werke" (from SearchResultDetailFragment.java I see that we can have both volumes and copies at the same time, which sounds a bit illogical in general but comes in handy in this case). On the other hand it seems relatively easy to add another subheading "References" or "Verweise" to SearchResultDetailFragment, set up a ReferencesAdapter and add a referencedItems list to DetailedItem.
Maybe SLUB is (currently) the only API to use referenced items but if you find this a good idea I could make a PR for it. If not I'll use the volumes workaround (maybe we could rename the volumes subheading to "Volumes / References" or "Enthaltene Werke / Verweise" in this case). So it's up to you, I'm happy with both solutions.

johan12345 commented 3 years ago

Hm, both would solutions be fine for me. @raphaelm, any preference?

raphaelm commented 3 years ago

I'm not a huge fan on adding another list to DetailedItem, since it complicates both the API surface as well as the user interface implementation significantly.

Given the examples @StefRe showed, I think using Detail instances for this is not that wrong, Vorgänger: August-Thyssen-Hütte: Thyssen-Forschung as a detail doesn't look like a mis-use of the details for me. The only problem is that the link opens in the browser, not in the app, so maybe we should approach the problem there?

One possible solution would be to extend the Detail class with a new property, either a boolean specifying that the links in the HTML content are "internal", or new property for the link target stored outside of HTML. In either case, getDetailById would need to be able to take an URL as a parameter instead of an ID, or we add a getDetailByUrl.

Thinking this further, we could even add a boolean OpacApi.isDetailUrl(url) method that we run every time a user clicks any link, and if it returns true, we open the link as a result detail in-app. Extending that, an boolean OpacApi.isSearchUrl(url) could then be implemented to make e.g. linked author names or category names work, as can be seen in many OPACs.

Instead of adding interface complexity for something that (so far) only SLUB requires, we'd add a similar amount of complexity but get a positive effect almost everywhere.

@StefRe @johan12345 What are your thoughts on this approach?

johan12345 commented 3 years ago

Thinking this further, we could even add a boolean OpacApi.isDetailUrl(url) method that we run every time a user clicks any link, and if it returns true, we open the link as a result detail in-app. Extending that, an boolean OpacApi.isSearchUrl(url) could then be implemented to make e.g. linked author names or category names work, as can be seen in many OPACs.

Hm, indeed this sounds like a good and more general solution. It would also be possible to simply add an item ID as an additional property of the Detail class, but the abovementhoed approach has the additional advantage that it would work even if there are multiple links within one detail.

StefRe commented 3 years ago

I think using Detail instances for this is not that wrong

yes, all (SLUB target) references are references to details, see 1st paragraph of this comment. In fact, SLUB terminology differentiates between links which refer to external resources and references which refer to SLUB catalog items.

One possible solution would be to extend the Detail class with a new property, either a boolean specifying that the links in the HTML content are "internal", or new property for the link target stored outside of HTML.... Thinking this further, we could even add a boolean OpacApi.isDetailUrl(url) method that we run every time a user clicks any link...

This sounds great but I guess it'll be a rather big change in the frontend code. So for the time being I'm going to use the volumes workaround. Maybe it makes sense to file a separate issue for this proposed new link handling to keep things clear.

Btw, I noticed that there's a boolean html property in Details that seems to be used nowhere.

StefRe commented 3 years ago

I implemented getResultById for libero links (see above) by getting the final url from a redirecting HEAD request like httpHead("$id",false).request().url().toString(). This throws the following error:

java.net.ProtocolException: Too many follow-up requests: 21
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:176)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
    at okhttp3.RealCall.execute(RealCall.java:93)
    at de.geeksfactory.opacclient.apis.OkHttpBaseApi.httpHead(OkHttpBaseApi.java:294)
    at de.geeksfactory.opacclient.apis.SLUB.getResultById(SLUB.kt:180)
 ... etc.

The reason for this is that cleanUrl assumes the query component to be exclusively in the form of key=value pairs and adds an extra = after the key if no value is present: https://github.com/opacapp/opacclient/blob/2040420f50dc579b81f012b9fbdb331f605a5ef4/opacclient/libopac/src/main/java/de/geeksfactory/opacclient/apis/BaseApi.java#L486-L489

According to RFC3986 this is by no means obligatory. An example of a SLUB reference is http://slubdd.de/katalog?libero_mab21364775. It gets redirected as follows:

http://slubdd.de/katalog/?libero_mab21364775
http://primoproxybeta.slub-dresden.de/cgi-bin/permalink.pl?libero_mab21364775
http://katalogbeta.slub-dresden.de/rsn/1364775
https://katalogbeta.slub-dresden.de/rsn/1364775
https://katalog.slub-dresden.de/rsn/1364775
https://katalog.slub-dresden.de/rsn/1364775/
https://katalog.slub-dresden.de/id/0-130446319/

The additional = at the end leads to endless redirects at the second url.

@raphaelm

(EDIT: the following is no longer valid, see next comment below... I think the best solution is to check between the following two lines whether the string pair contains any = and do the splitting into kv parts only if this is the case: Do you see any principal issues with this approach (I'm just asking because test coverage is rather sparse so maybe you can identify potential issues by your experience)?)

Further I noticed that the test for cleanUrl https://github.com/opacapp/opacclient/blob/2040420f50dc579b81f012b9fbdb331f605a5ef4/opacclient/libopac/src/test/java/de/geeksfactory/opacclient/apis/ApacheBaseApiTest.java#L15 is in ApacheBaseApiTest.java rather than in BaseApiTest.java (cleanUrl is a method of BaseApi and not ApacheBaseApi). I guess this is for historical reasons. Is it OK if I move this test to its proper class?

StefRe commented 3 years ago

@raphaelm Sorry to have bothered you with this - I found the bug: the second parameter must be null instead of "" here: https://github.com/opacapp/opacclient/blob/2040420f50dc579b81f012b9fbdb331f605a5ef4/opacclient/libopac/src/main/java/de/geeksfactory/opacclient/apis/BaseApi.java#L487-L488

I wrote a test for it and fixed it. So the only remaining question is if the tests for cleanUrl should be moved to a BaseApiTest class?

raphaelm commented 3 years ago

Ah, nice! Probably something we've just never encountered before.

I wrote a test for it and fixed it. So the only remaining question is if the tests for cleanUrl should be moved to a BaseApiTest class?

Sure, go ahead!