omeka-s-modules / Osii

Import items from other Omeka S installations
GNU General Public License v3.0

Testing: Omeka S Item Importer #1

Closed. jimsafley closed this issue 2 years ago.

jimsafley commented 2 years ago

The Omeka S Item Importer (OSII) module allows administrators to import items and their related resources from other Omeka S installations.

Once the module is installed, click the "Omeka S Item Importer" link under the "Modules" section of the sidebar, then click the "Add import" button at the top right. This form is where you set the remote and local configurations. "Remote" here means the remote Omeka S installation, the one from which you want to import items. "Local" means the local Omeka S installation, the one you're working in.

You should test importing from multiple remote installations. Our Directory features many examples that contain richly described items.

Remote configuration

Root endpoint

To find the "Root endpoint", navigate to a site on the remote installation and replace the "/s/" in its URL, and everything after it, with "/api". For example, change this URL:

http://afamaidshist.fiu.edu/omeka-s/s/african-american-aids-history-project/page/home-page

To this URL:

http://afamaidshist.fiu.edu/omeka-s/api
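The same rewrite can be done mechanically. This is a minimal Python sketch, not part of the module; the function name root_endpoint is hypothetical, and it assumes the site URL contains a "/s/" path segment and that the installation itself does not live under a path containing "/s/".

```python
from urllib.parse import urlsplit, urlunsplit


def root_endpoint(site_url: str) -> str:
    """Derive an Omeka S root API endpoint from a public site URL.

    Replaces the first "/s/" path segment, and everything after it, with "/api".
    """
    parts = urlsplit(site_url)
    cut = parts.path.find("/s/")
    if cut == -1:
        raise ValueError(f"no '/s/' segment in {site_url!r}")
    # Also drop any query string or fragment from the site URL.
    return urlunsplit((parts.scheme, parts.netloc, parts.path[:cut] + "/api", "", ""))


print(root_endpoint(
    "http://afamaidshist.fiu.edu/omeka-s/s/african-american-aids-history-project/page/home-page"
))  # prints http://afamaidshist.fiu.edu/omeka-s/api
```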

When the page loads, you should see something like this:

{"errors":{"error":"The API request resource must be a string. Type \u0022NULL\u0022 given."}}

Even though it's an error, this URL is the root endpoint. Note that a few remote installations make it impossible to reach the root endpoint. If you can't reach the "/api" URL, just go on to the next installation.
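If you have many installations to try, a small script can tell you which ones expose a reachable root endpoint. This Python sketch assumes only what is described above: a reachable root endpoint answers with a JSON body containing an "errors" object. It does not assume a particular HTTP status code for that response, so it also reads the body of HTTP error responses. Both function names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def looks_like_omeka_root(body: str) -> bool:
    """True if body is JSON with an "errors" object, like the response shown above."""
    try:
        data = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    errors = data.get("errors") if isinstance(data, dict) else None
    return isinstance(errors, dict) and "error" in errors


def check_root_endpoint(url: str, timeout: float = 10.0) -> bool:
    """Fetch url and report whether it answers like an Omeka S root endpoint."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # The status code of the error response is not assumed; read its body anyway.
        body = exc.read().decode("utf-8", errors="replace")
    except OSError:
        # DNS failures, refused connections, timeouts: treat as unreachable.
        return False
    return looks_like_omeka_root(body)


# Usage (makes a network request):
#   check_root_endpoint("http://afamaidshist.fiu.edu/omeka-s/api")
```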

Query

You have the option to include a "Query" to filter the items to be imported. If you don't provide a query, the module will import all items from the remote installation. One way to get a query is by performing an advanced search in a site, like this one:

http://afamaidshist.fiu.edu/omeka-s/s/african-american-aids-history-project/item/search

For example, to get all items that have a title that contains "aids", the URL would be:

http://afamaidshist.fiu.edu/omeka-s/s/african-american-aids-history-project/item?property%5B0%5D%5Bjoiner%5D=and&property%5B0%5D%5Bproperty%5D=1&property%5B0%5D%5Btype%5D=in&property%5B0%5D%5Btext%5D=aids

And the query string would be everything after the question mark (?):

property%5B0%5D%5Bjoiner%5D=and&property%5B0%5D%5Bproperty%5D=1&property%5B0%5D%5Btype%5D=in&property%5B0%5D%5Btext%5D=aids

Note that this query string does not include the site_id parameter, because the site is implied by the context of the search page. So the total number of items in the import may be larger than the total number of items in the advanced search results. Nevertheless, in this example, every imported item should have a title that contains the string "aids".
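Copying everything after the question mark can also be done with Python's standard urllib.parse module. Decoding the string shows its structure: %5B and %5D are the URL-encoded brackets [ and ], and the single filter reads "property 1 (the title, on this installation) contains aids". The "Query" field itself takes the raw, still-encoded string.

```python
from urllib.parse import parse_qsl, urlsplit

search_url = (
    "http://afamaidshist.fiu.edu/omeka-s/s/african-american-aids-history-project/item"
    "?property%5B0%5D%5Bjoiner%5D=and&property%5B0%5D%5Bproperty%5D=1"
    "&property%5B0%5D%5Btype%5D=in&property%5B0%5D%5Btext%5D=aids"
)

# Everything after "?", still URL-encoded: this is what goes in the "Query" field.
query = urlsplit(search_url).query

# Decoded only to show the structure of the filter.
for key, value in parse_qsl(query):
    print(f"{key} = {value}")
# property[0][joiner] = and
# property[0][property] = 1
# property[0][type] = in
# property[0][text] = aids
```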

Key identity / Key credential

If you want to import resources that are marked as private, you'll need to provide a "Key identity" and "Key credential". The only way to obtain these strings is by having an account on the remote installation or by asking someone who has an account to provide them. Obviously, you only want to ask people who know and trust you.

A better alternative is to import directly from the installation that you're testing on. It's not useful in most cases, but for testing you can point the "Root endpoint" to the root endpoint of your local installation. This lets you have complete control of the resources you're importing.

To test this, create an item and mark it as private. Add a new import, set the "Root endpoint" to your local installation's root endpoint, and set the "Query" as id=<item_id>, replacing <item_id> with the ID of your newly created item. At first, do not set a "Key identity" and "Key credential". Running an import should result in zero items. Now set the identity and credential (which you can generate when editing your user account). The import should now result in one item, marked as private.
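You can also confirm the private item's visibility directly against your local API, outside the module. The Omeka S REST API accepts an API key as the key_identity and key_credential query parameters, so this Python sketch builds the two requests described above: anonymous, then authenticated. The root endpoint, item ID, and key values are placeholders, and items_url and count_items are hypothetical helper names. Expect 0 items for the anonymous request and 1 for the authenticated one.

```python
import json
import urllib.request
from urllib.parse import urlencode


def items_url(root_endpoint, query, key_identity=None, key_credential=None):
    """Build an /items search URL, optionally authenticated with an API key."""
    params = dict(query)
    if key_identity and key_credential:
        params["key_identity"] = key_identity
        params["key_credential"] = key_credential
    return f"{root_endpoint.rstrip('/')}/items?{urlencode(params)}"


def count_items(url, timeout=30.0):
    """Return how many items the search returns (the response body is a JSON array)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return len(json.load(response))


# Usage (placeholders; requires a running installation):
#   root = "http://localhost/omeka-s/api"
#   count_items(items_url(root, {"id": 42}))                            # expect 0
#   count_items(items_url(root, {"id": 42}, "IDENTITY", "CREDENTIAL"))  # expect 1
```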

Local configuration

Import label

You must label your import.

Item set

Select the item set to which imported items will be assigned. Be sure to test whether items are in fact assigned to the item set.

Exclude media

Check this if you want the import to exclude media. If not checked, media will be imported as normal. Note that after switching this option you must take a new snapshot and then import it; otherwise the change will not take effect.

Be sure to test whether media are imported or not imported depending on this configuration.
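Both the item set and the media checks can be scripted against the local installation's API instead of clicking through items. This Python sketch assumes the standard Omeka S item JSON, where o:item_set and o:media are lists of references that each carry an o:id, and it pages through /items with the API's page and per_page parameters. The helper names are hypothetical. Private items are only returned if you include key_identity and key_credential in params.

```python
import json
import urllib.request
from urllib.parse import urlencode


def fetch_all_items(root_endpoint, params=None, per_page=100, timeout=30.0):
    """Page through /items and return every item's JSON."""
    items, page = [], 1
    while True:
        query = dict(params or {}, page=page, per_page=per_page)
        url = f"{root_endpoint.rstrip('/')}/items?{urlencode(query)}"
        with urllib.request.urlopen(url, timeout=timeout) as response:
            batch = json.load(response)
        items.extend(batch)
        if len(batch) < per_page:  # a short page means there are no more
            return items
        page += 1


def not_in_item_set(items, item_set_id):
    """IDs of items that are not assigned to the given item set."""
    return [
        item.get("o:id")
        for item in items
        if item_set_id not in {ref.get("o:id") for ref in item.get("o:item_set", [])}
    ]


def still_have_media(items):
    """IDs of items with media attached (expect none when "Exclude media" is checked)."""
    return [item.get("o:id") for item in items if item.get("o:media")]
```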

Exclude item sets

Check this if you want the import to exclude item sets. If not checked, item sets will be imported as normal. Note that after switching this option you must take a new snapshot and then import it; otherwise the change will not take effect.

Be sure to test whether item sets are imported or not imported depending on this configuration.

Keep removed resources

Check this if, during import, you want to keep local resources that were removed from the remote snapshot. If checked, removed resources will remain locally but will no longer be managed by this import. If not checked, removed resources will be deleted locally as normal.

Be sure to test whether resources are removed or not removed depending on this configuration. You'll need to import directly from the installation that you're testing on, so you can remove media and item sets at will.
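The effect of this option can be pictured as a set difference between the resources this import manages and the resources in the new snapshot. The following Python sketch is a conceptual model only, not the module's code; the names and the remote-ID-to-local-ID mapping are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RemovalPlan:
    delete: set[int] = field(default_factory=set)    # local IDs to delete
    unmanage: set[int] = field(default_factory=set)  # local IDs to keep but stop managing


def plan_removed(managed: dict[int, int], snapshot_ids: set[int], keep_removed: bool) -> RemovalPlan:
    """managed maps remote ID -> local ID for resources this import created."""
    removed = {local for remote, local in managed.items() if remote not in snapshot_ids}
    if keep_removed:
        return RemovalPlan(unmanage=removed)
    return RemovalPlan(delete=removed)
```

For example, if remote resource 102 (local ID 2) is missing from the new snapshot, local resource 2 is deleted when the option is unchecked, and kept but no longer managed when it is checked.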

Add remote site URL

Enter the URL of the site from which the imported resources are derived. If entered, it will be added to every imported resource as a value of the osii:source_site property.

Be sure to test whether the remote site URL is added to each imported item.

Add remote resource URL

Check this if you want to add the remote resource's canonical URL to every imported resource as a value of the osii:source_resource property.

Be sure to test whether a remote resource URL is added to each imported item.
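To check many items at once, you can inspect each item's JSON from the local API, where values are keyed by property term (for example osii:source_site) and each term maps to a list of value objects. This Python sketch only checks that each term has at least one value; it does not assume which data type the module uses for the URLs. missing_source_values is a hypothetical helper, and the sample items are made up.

```python
def missing_source_values(items, terms=("osii:source_site", "osii:source_resource")):
    """Return (item ID, term) pairs for items with no value for a term."""
    missing = []
    for item in items:
        for term in terms:
            if not item.get(term):
                missing.append((item.get("o:id"), term))
    return missing


# Made-up sample items: the second lacks a source resource value.
items = [
    {"o:id": 1,
     "osii:source_site": [{"@id": "http://remote.example/s/site"}],
     "osii:source_resource": [{"@id": "http://remote.example/resource/10"}]},
    {"o:id": 2,
     "osii:source_site": [{"@id": "http://remote.example/s/site"}]},
]
print(missing_source_values(items))  # [(2, 'osii:source_resource')]
```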

Manage import

After configuring the import, click on the "Submit" button. This will take you to the "Manage import" page, where you should see "Prepare import is not available now" on the main panel, and a sidebar containing "Import actions" and "Import metadata". The "Import metadata" section contains the remote and local configurations you just saved. The "Import actions" section contains most of the actions you'll need to successfully run and troubleshoot an import. Note that actions will only appear when active, and that you must take a snapshot and prepare the import before you can import a snapshot. Here are the actions that you can take:

Take snapshot

Click this (and the subsequent confirmation button) to begin the snapshot process. This captures the remote resources in their current state, gathering the data needed to import them and to reconcile the local installation with the remote one.

After the status is "Completed" the main panel should contain quite a bit of metadata about the snapshot. Use this to reconcile differences between the remote and local installations, to the extent desired. The sections are:

After mapping the data types (and possibly the templates), click on the "Submit" button at the top right of the page. Once you've done this, you should be able to import the snapshot.

Stop snapshot

Click this (and the subsequent confirmation button) to stop the snapshot process. Once clicked, the status should be "Stopping". Click "Refresh status" until the status is "Stopped".

Import snapshot

Click this (and the subsequent confirmation button) to import the current snapshot. This will add new resources and update existing resources. Before you confirm, be sure to submit mapping changes and reconcile differences between the remote and local installations, to the extent desired.

Stop import

Click this (and the subsequent confirmation button) to stop the import process. Once clicked, the status should be "Stopping". Click "Refresh status" until the status is "Stopped".
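Clicking "Refresh status" until a process settles is a polling loop. If you ever script it, the loop looks like this Python sketch. get_status is a hypothetical callable standing in for however you read the status (the module exposes it through a button, not through an API described here); "Stopped" and "Completed" are statuses named in this issue, while "Error" is an assumption.

```python
import time
from typing import Callable, Iterable


def wait_for_status(
    get_status: Callable[[], str],
    done: Iterable[str] = ("Stopped", "Completed", "Error"),
    interval: float = 5.0,
    timeout: float = 3600.0,
) -> str:
    """Poll get_status() every `interval` seconds until it returns a status in `done`."""
    targets = set(done)
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in targets:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"status still {status!r} after {timeout} seconds")
        time.sleep(interval)
```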

Refresh status

Click this to refresh the status of the current process. This will update the status in the event that it has changed since the last refresh.

View job

Click this to view the job that is running the current process. Because of the potentially long duration of these processes, they have to be run on the server in the background. The job represents this background process. On the job page, you may click on "view log" to view the current state of the process. This is helpful when troubleshooting snapshots and imports that result in an error.

View items / media / item sets

Click these to view imported resources. This is the best way to verify that the import worked and that all data came over correctly.

katknow commented 2 years ago

Was able to successfully get up to the import snapshot step, then got this error:

ParseError syntax error, unexpected ')'

Details:

ParseError: syntax error, unexpected ')' in /var/www/html/katknow/OmekaSTesting/modules/Osii/src/Job/DoImport.php:229
Stack trace:
#0 /var/www/html/katknow/OmekaSTesting/vendor/laminas/laminas-loader/src/StandardAutoloader.php(215): Laminas\Loader\StandardAutoloader->loadClass('Osii\Job\DoImpo...', 'namespaces')
#1 [internal function]: Laminas\Loader\StandardAutoloader->autoload('Osii\Job\DoImpo...')
#2 [internal function]: spl_autoload_call('Osii\Job\DoImpo...')
#3 /var/www/html/katknow/OmekaSTesting/application/src/Job/Dispatcher.php(72): class_exists('Osii\Job\DoImpo...')
#4 /var/www/html/katknow/OmekaSTesting/modules/Osii/src/Controller/Admin/ImportController.php(167): Omeka\Job\Dispatcher->dispatch('Osii\Job\DoImpo...', Array)
#5 /var/www/html/katknow/OmekaSTesting/vendor/laminas/laminas-mvc/src/Controller/AbstractActionController.php(77): Osii\Controller\Admin\ImportController->doImportAction()
#6 /var/www/html/katknow/OmekaSTesting/vendor/laminas/laminas-eventmanager/src/EventManager.php(321): Laminas\Mvc\Controller\AbstractActionController->onDispatch(Object(Laminas\Mvc\MvcEvent))
#7 /var/www/html/katknow/OmekaSTesting/vendor/laminas/laminas-eventmanager/src/EventManager.php(178): Laminas\EventManager\EventManager->triggerListeners(Object(Laminas\Mvc\MvcEvent), Object(Closure))
#8 /var/www/html/katknow/OmekaSTesting/vendor/laminas/laminas-mvc/src/Controller/AbstractController.php(105): Laminas\EventManager\EventManager->triggerEventUntil(Object(Closure), Object(Laminas\Mvc\MvcEvent))
#9 /var/www/html/katknow/OmekaSTesting/vendor/laminas/laminas-mvc/src/DispatchListener.php(139): Laminas\Mvc\Controller\AbstractController->dispatch(Object(Laminas\Http\PhpEnvironment\Request), Object(Laminas\Http\PhpEnvironment\Response))
#10 /var/www/html/katknow/OmekaSTesting/vendor/laminas/laminas-eventmanager/src/EventManager.php(321): Laminas\Mvc\DispatchListener->onDispatch(Object(Laminas\Mvc\MvcEvent))
#11 /var/www/html/katknow/OmekaSTesting/vendor/laminas/laminas-eventmanager/src/EventManager.php(178): Laminas\EventManager\EventManager->triggerListeners(Object(Laminas\Mvc\MvcEvent), Object(Closure))
#12 /var/www/html/katknow/OmekaSTesting/vendor/laminas/laminas-mvc/src/Application.php(331): Laminas\EventManager\EventManager->triggerEventUntil(Object(Closure), Object(Laminas\Mvc\MvcEvent))
#13 /var/www/html/katknow/OmekaSTesting/index.php(21): Laminas\Mvc\Application->run()
#14 {main}

jimsafley commented 2 years ago

I can't reproduce the error. A parse error would present itself under any circumstance. So, if we're sharing the same codebase, I should be seeing the same thing you are. Are you sure you're on the most recent commit? Have you (or someone else) made any changes to the code?

katknow commented 2 years ago

I believe I'm on the most recent commit; I just re-pulled and am still getting it. I also uninstalled and reinstalled the module. I haven't made any changes to the code.

jimsafley commented 2 years ago

Could you open the modules/Osii/src/Job/DoImport.php and paste line 229 and the surrounding, say, 20 lines?

jimsafley commented 2 years ago

Oh, never mind. I know what's going on. I'll fix it promptly.

katknow commented 2 years ago

Okay, thank you!

katknow commented 2 years ago

I'll start experimenting with all the different nuances and use different sites, but the base level import definitely works!

katknow commented 2 years ago

So, I've tried this with a few different sites now, and it seems like I'm having trouble importing the metadata, so I want to be sure I'm understanding things correctly. As long as I am able to map all the different fields after taking the snapshot, the metadata should come over? I'm able to get the media to import, and the metadata includes the remote site and remote resource URLs and the media type, but nothing else.

jimsafley commented 2 years ago

Yes, after you map remote data types to local data types and submit the changes, the module should import the values—as long as the local data type recognizes the remote value structure. What root endpoint are you using? Take a screenshot of the "Data types" tab and post here.

katknow commented 2 years ago

This is what I'm inputting for data types:

[Screenshot, 2022-01-13: the "Manage import" page for the "Fighting Words Franklin" import in Katie's Omeka S Testing]

I will note that I was able to import the metadata from the example site you provided above; it just took a long while. So I'm wondering if maybe it's an issue with the other sites I chose to test?

This is what the items look like after completing import:

[Screenshot, 2022-01-13: an imported item shown as "Untitled" in Katie's Omeka S Testing Site]

jimsafley commented 2 years ago

Did you click on the "Submit" button on the top right of the page before clicking "Import snapshot"?

katknow commented 2 years ago

I did.

katknow commented 2 years ago

I switched over to testing on my Mac, and it works fine and much faster there, so I'm thinking it's a problem on my end with my PC.

jimsafley commented 2 years ago

I'm currently troubleshooting the problem. In the meantime, note that snapshot and import jobs may take a long time to complete; it depends on the number of resources you're importing, the size of the media you're importing, and connection speed and stability. So, for the larger installations, expect long waits and periodic connectivity issues. For instance, the "J. Willard Marriott Library Digital Exhibits" installation that you're working on contains 63,000 items and 3,700 media. It takes a long time to take a snapshot of every resource and download every media file.

katknow commented 2 years ago

Overall this seems to work really well, and I was able to import from quite a few websites. Stopping, refreshing the status, and the other actions all work, and I was able to add and remove the source resource and source site values with no problem. I started importing a snapshot to test the key identity/key credential, but it has been running since 3:30 PM and still had not completed as of 11 PM. Hopefully it will have finished by tomorrow so I can see whether it worked.

jimsafley commented 2 years ago

8 hours is a bit excessive. What root endpoint are you using?

katknow commented 2 years ago

I had just plugged in this: http://dev.omeka.org/katknow/OmekaSTesting/api, derived from this: http://dev.omeka.org/katknow/OmekaSTesting/s/test123/page/welcome, following the pattern. But I just visited it, and there's a menu that wasn't on the others. Is there another variation of the URL I should be using?

jimsafley commented 2 years ago

I successfully imported from that root endpoint. It took about an hour. Click "View job" then "view log". What are the last dozen or so lines in the log? Also, I made some changes to the module, so go ahead and pull them into your local installation.

katknow commented 2 years ago

Changes have been pulled, and here's the end of the log:

Errors: { "download": [ "Error downloading http:\/\/dev.omeka.org\/katknow\/OmekaSTesting\/files\/original\/52bf0511682919b744c84e2e0a366bfcafa6f3cd.jpeg: Unable to connect to dev.omeka.org:80 . Error #0: stream_socket_client(): unable to connect to dev.omeka.org:80 (php_network_getaddresses: getaddrinfo failed: System error)" ] }
Stack trace:
#0 /var/www/html/katknow/OmekaSTesting/application/src/Api/Adapter/AbstractEntityAdapter.php(318): Omeka\Api\Adapter\AbstractEntityAdapter->hydrateEntity(Object(Omeka\Api\Request), Object(Omeka\Entity\Media), Object(Omeka\Stdlib\ErrorStore))
#1 /var/www/html/katknow/OmekaSTesting/application/src/Api/Manager.php(224): Omeka\Api\Adapter\AbstractEntityAdapter->create(Object(Omeka\Api\Request))
#2 /var/www/html/katknow/OmekaSTesting/application/src/Api/Manager.php(78): Omeka\Api\Manager->execute(Object(Omeka\Api\Request))
#3 /var/www/html/katknow/OmekaSTesting/modules/Osii/src/Job/DoImport.php(248): Omeka\Api\Manager->create('media', Array, Array, Array)
#4 /var/www/html/katknow/OmekaSTesting/application/src/Job/DispatchStrategy/Synchronous.php(34): Osii\Job\DoImport->perform()
#5 /var/www/html/katknow/OmekaSTesting/application/src/Job/Dispatcher.php(105): Omeka\Job\DispatchStrategy\Synchronous->send(Object(Omeka\Entity\Job))
#6 /var/www/html/katknow/OmekaSTesting/application/data/scripts/perform-job.php(66): Omeka\Job\Dispatcher->send(Object(Omeka\Entity\Job), Object(Omeka\Job\DispatchStrategy\Synchronous))
#7 {main}
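The "getaddrinfo failed" message in this log means the server running the job could not resolve the host name dev.omeka.org while downloading a media file, which points to a name-resolution (DNS) problem on that server rather than to the import settings. A quick way to check resolution from the machine that runs the jobs is Python's standard socket.getaddrinfo; can_resolve is a hypothetical helper name.

```python
import socket


def can_resolve(host: str, port: int = 80) -> bool:
    """Return True if host resolves to at least one TCP address."""
    try:
        return len(socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)) > 0
    except socket.gaierror:  # raised when name resolution fails
        return False


# Run on the server that runs the jobs:
#   can_resolve("dev.omeka.org")
```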

katknow commented 2 years ago

Key identity & credential worked on most recent import attempt. Was also able to exclude item sets and media. I think it might be good to go!

jimsafley commented 2 years ago

Here are several more things to test, if you haven't already. Remember that you will have more control over the resources by importing directly from your local installation.

katknow commented 2 years ago

Checked everything on the list and it does seem to work! Primarily tested via importing from my own installation, particularly for the latter half.

sharonmleon commented 2 years ago

I've also done a thorough test of this with some fairly complex data. All looks good and I'm going to close the issue.