Closed mjlassila closed 7 years ago
Looks like the test instance I was working from doesn't have subcommunities. Thanks for the heads-up.
@mjlassila I'm having a hard time nailing down the details on this. In my new tests, the calls to get all the communities and their collections seem to work, though I have only checked a bunch of examples, not the entire set of data available. From my results, all the communities and collections get pulled in via the /collections and /communities endpoints.
Could you point me to an example of where this isn't working as desired?
Thanks for investigating this issue. I'll try to provide a bit more detail here.
For example, in our repository we have a top-level community, Historical Maps: https://jyx.jyu.fi/dspace/handle/123456789/6533
Inside this community, there is one subcommunity and three collections.
The majority of items reside in the subcommunity https://jyx.jyu.fi/dspace/handle/123456789/24994
Inside this subcommunity, there are four subcommunities and ten collections.
Each of these subcommunities might have collections inside. For example, the City maps subcommunity, https://jyx.jyu.fi/dspace/handle/123456789/20329, has two collections.
Currently, the call to the REST endpoint in IndexController.php retrieves only the top level of the community-collection structure. In many repositories, such as ours, the community-collection structure is deeply nested. Here is our repository's community-collection structure in full: https://jyx.jyu.fi/dspace/community-list.
If one modifies the call in IndexController.php to also include subcommunities (expand=collections,subCommunities), the call returns the topmost subcommunities and collections, but not the underlying hierarchy.
To get to the hierarchy, one must make expand=collections,subCommunities calls to every subcommunity individually. It is not sufficient to call expand=all at the top community level, as it only returns the topmost subcommunities.
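The per-subcommunity calls described above could be sketched roughly as follows. This is only an illustration in Python, not the connector's code: `fetch_community` is a hypothetical callable standing in for a request such as communities/{id}?expand=collections,subCommunities, and the response keys `collections`, `subcommunities` and `uuid` are assumptions about the JSON shape that should be checked against the actual instance.

```python
from typing import Callable, Dict, List


def collect_collections(
    community_id: str,
    fetch_community: Callable[[str], Dict],
) -> List[Dict]:
    """Return the collections of a community and of all nested subcommunities.

    fetch_community is a hypothetical helper: given a community id, it returns
    that community expanded with its direct collections and subcommunities.
    """
    found: List[Dict] = []
    seen = set()              # guards against visiting a community twice
    stack = [community_id]    # communities still to be requested

    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)

        community = fetch_community(current)
        # Assumed keys: "collections" and "subcommunities" (may be missing or null).
        found.extend(community.get("collections") or [])
        for sub in community.get("subcommunities") or []:
            stack.append(sub["uuid"])  # assumed identifier field

    return found
```

One request is made per community, so a deep tree means many round trips; that cost is the trade-off of walking the structure this way.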
I put some example data available at https://www.dropbox.com/s/auy2elpb3t75edk/example-dspace-rest-data.zip?dl=1
Thanks for the detailed info, and the data to dig through.
Unfortunately, this still leaves me confused about what's going on. According to the DSpace API documentation, communities should return all the communities, and top-communities would return only the top-level ones. That's also what I've been seeing in my latest looks at our local DSpace instance. But I'll keep exploring.
On the hierarchies, do you want to preserve the hierarchies of communities? It sounded earlier like you didn't, but just want to make sure what the desired outcome is.
It might also help to know what DSpace version you are using.
It seems that the culprit might be in our infrastructure. The DSpace instance (DSpace 6.2) I have been running my tests against returns the data in the form I described in my previous comment. This instance has the same data as our production instance. On our other test instance (DSpace 6.0), which has toy data with deep hierarchies, communities does indeed return all the communities!
I'll investigate whether there is a bug in DSpace 6.2 or in our data which causes this problem and report back.
The DSpace 6 REST documentation doesn't mention the limit parameter, which controls the number of items per response, but as in DSpace 5, this parameter is also in effect in DSpace 6. Adding the limit parameter with a high value to the collections call resolved the problem. It might be better to solve this by using a low initial limit together with offset, as is done in the importCollection function, but this quick and dirty solution was good enough in our case :)
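For illustration, the limit/offset approach mentioned above might look like the following Python sketch. It is not the importCollection code: `fetch_page` is a hypothetical callable standing in for a request such as collections?limit=N&offset=M, and the stop condition assumes the server returns a short (or empty) page once the data runs out.

```python
from typing import Callable, List


def fetch_all(fetch_page: Callable[[int, int], List], limit: int = 50) -> List:
    """Collect every item by requesting pages of `limit` items.

    fetch_page(limit, offset) is a hypothetical helper that returns at most
    `limit` items starting at position `offset`.
    """
    items: List = []
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        items.extend(page)
        # A page shorter than the limit is taken to mean nothing is left.
        if len(page) < limit:
            break
        offset += limit
    return items
```

Compared with a single request using a very high limit, this makes more requests but keeps each response small, which may help with timeouts.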
It occurred to me that if backward compatibility with the DSpace 5 REST API is not important, there is a hierarchy API endpoint available in DSpace 6 which returns a simplified representation of the whole community/collection structure. Compared to the communities?expand=collections call, it is much faster.
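As a rough sketch of how a nested response from the hierarchy endpoint could be turned into a flat list of collections, consider the Python below. The key names `community`, `collection` and `name` are assumptions about the response shape, not taken from this thread, so they should be verified against a real response first.

```python
from typing import Dict, List, Tuple


def flatten_hierarchy(node: Dict, path: Tuple[str, ...] = ()) -> List[Tuple[Tuple[str, ...], str]]:
    """Return (community path, collection name) pairs for a nested structure.

    Assumed shape: each node may hold a "collection" list and a "community"
    list of child nodes, and every entry has a "name".
    """
    rows: List[Tuple[Tuple[str, ...], str]] = []
    # Collections that sit directly under this node.
    for coll in node.get("collection") or []:
        rows.append((path, coll["name"]))
    # Recurse into child communities, extending the path with their names.
    for comm in node.get("community") or []:
        rows.extend(flatten_hierarchy(comm, path + (comm["name"],)))
    return rows
```

Keeping the path alongside each collection means the hierarchy can be either preserved or simply ignored by the caller.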
Thanks much for your digging around on this. I like your idea of using limit and offset, and will also try out the hierarchy approach to compare. That does sound faster. I'll just want to poke around in the results a bit.
@mjlassila Thanks again. I went with the offset/limit approach, and made it configurable. It makes it slower, but it sounds like it might be more generally helpful.
Thanks! I noticed that the limit setting was missing from the import form and therefore Omeka gave an error message:
Notice: Undefined index: limit in /var/www/html/modules/DspaceConnector/src/Controller/IndexController.php on line 32
The changes needed are in https://github.com/mjlassila/DspaceConnector/commit/c9eb91cd4cf7b36e0a83517750c82ff68e8698b2
I also increased the timeout for communities?expand=collections because the default 10-second timeout was too short even when the limit was set under 100 items. mjlassila/DspaceConnector/IndexController/L80
Those changes look good to me. Could you make them a pull request to help with the automatic checking and other management?
It is quite common for DSpace instances to have a deeply hierarchical community/collection structure, but currently the connector returns only the top-level collections and communities in the DSpace installation; subcommunities are not retrieved. The DSpace REST API supports getting subcommunity information, so it would be nice if the connector could also retrieve subcommunities. Preserving the community/collection hierarchy is probably not important.