klokantech / embedr

Embedr.eu - Image Embedding Service (IES) with support for IIIF, OEmbed, zoomable viewer in an iFrame
http://embedr.eu/
European Union Public License 1.1

Amazon CloudSearch storage #31

Status: Closed (klokan closed 9 years ago)

klokan commented 9 years ago

We agreed at the meeting on Friday, June 5th, that HAWK will push the JSONs to Amazon CloudSearch after every Ingest task completes successfully.

This means all the metadata for images available via the embedding and image server will be searchable with full-text search on the defined fields.

The Ingest task code will push to CloudSearch via boto.
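A minimal sketch of what such a push could look like, for discussion only. It assumes the metadata is sent in the CloudSearch JSON document batch format (a list of `add` operations), and it uses boto3's `cloudsearchdomain` client rather than the original boto library, which is an assumption on my part. The `title` field, the `build_batch` and `push_batch` helpers, and the document endpoint argument are hypothetical; the real field names come from the internal model.

```python
import json


def build_batch(records):
    """Turn metadata records into a CloudSearch JSON document batch (bytes).

    Each record is a dict with an 'id' key; all other keys become fields.
    """
    batch = []
    for record in records:
        fields = {key: value for key, value in record.items() if key != "id"}
        batch.append({"type": "add", "id": record["id"], "fields": fields})
    return json.dumps(batch).encode("utf-8")


def push_batch(doc_endpoint_url, batch_bytes):
    """Upload a batch to the domain's document endpoint (needs AWS credentials).

    boto3 is imported here so the batch-building part runs without it.
    """
    import boto3  # assumption: boto3 instead of the boto library named above

    client = boto3.client("cloudsearchdomain", endpoint_url=doc_endpoint_url)
    return client.upload_documents(
        documents=batch_bytes, contentType="application/json"
    )


# Hypothetical record; 'title' is an illustrative field name only.
batch = build_batch([{"id": "000-test1", "title": "Test image"}])
```

An Ingest task would call `push_batch` with the domain's document endpoint once it has finished successfully.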

This is to be developed as the first task in the FINAL milestone (once we are finished with the BETA milestone tickets), ideally before June 22nd.

klokan commented 9 years ago

It makes sense to do this ASAP, before running the large ingest test for BETA.

mzeinstra commented 9 years ago

@tomkr is wondering how the data will be put in CloudSearch, which fields will be available and which are indexed. He can use a dummy service from AWS but I believe the two projects need to agree on how the data will be stored.

tomkr commented 9 years ago

Ideally a server with some data is available for me to develop against. Do you have an ETA for that yet?

klokan commented 9 years ago

As mentioned above, we should have a larger sample of test data in the CloudSearch domain on Monday, June 22nd, when ticket #35 is finished.

The Ingest API already puts the metadata into Amazon CloudSearch.

A sample direct JSON query to Amazon CloudSearch: http://search-hawk-36sqxzyajg5sxa5aru5naa22aa.eu-central-1.cloudsearch.amazonaws.com/2013-01-01/search?q=test

Each hit in the response contains an 'id', which you can use to get a thumbnail from IIIF with a call like http://iiifhawk.klokantech.com/000-test1/full/,100/0/native.jpg and to link to the main viewer (with oEmbed and DC tags, and support for sequences): http://embedhawk.klokantech.com/000-test1
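To make the relationship between the three URLs concrete, here is a rough sketch that builds them from a query and an 'id'. The base URLs are the ones quoted above; the helper names and the sample response are hypothetical, and the response shape (`hits` containing a `hit` list) follows the CloudSearch 2013-01-01 search API as I understand it.

```python
from urllib.parse import quote, urlencode

# Base URLs taken from the examples in this ticket.
SEARCH_ENDPOINT = (
    "http://search-hawk-36sqxzyajg5sxa5aru5naa22aa"
    ".eu-central-1.cloudsearch.amazonaws.com/2013-01-01/search"
)
IIIF_BASE = "http://iiifhawk.klokantech.com"
VIEWER_BASE = "http://embedhawk.klokantech.com"


def search_url(query):
    """Full-text search URL for the given query string."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": query})


def thumbnail_url(image_id, height=100):
    """IIIF URL for the full image scaled to the given height."""
    return f"{IIIF_BASE}/{quote(image_id)}/full/,{height}/0/native.jpg"


def viewer_url(image_id):
    """Link to the main viewer page for the item."""
    return f"{VIEWER_BASE}/{quote(image_id)}"


def hit_ids(response):
    """Collect the 'id' of every hit in a parsed search response."""
    return [hit["id"] for hit in response.get("hits", {}).get("hit", [])]


# Illustrative response only, not real data from the domain.
sample_response = {
    "hits": {"found": 1, "start": 0, "hit": [{"id": "000-test1", "fields": {}}]}
}
thumbnails = [thumbnail_url(image_id) for image_id in hit_ids(sample_response)]
```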

@tomkr do you plan to call the Amazon CloudSearch API from the AWS JavaScript SDK or from your own server?

Docs: http://docs.aws.amazon.com/cloudsearch/latest/developerguide/searching-text.html http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/CloudSearch.html

The structure of the data follows the definition of the internal model mentioned in #30. The fields available are:

[screenshot: screen shot 2015-06-18 at 09 40 09]

Faceting is possible on all the fields (as there are fewer than 10 fields).
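As a rough illustration of a faceted query, the sketch below adds one `facet.FIELD={}` parameter per field, which is how the 2013-01-01 search API requests facet counts with default options. The field name `title` and the `facet_search_url` helper are hypothetical; substitute the real field names from the model.

```python
from urllib.parse import urlencode

# Search endpoint as quoted earlier in this ticket.
SEARCH_ENDPOINT = (
    "http://search-hawk-36sqxzyajg5sxa5aru5naa22aa"
    ".eu-central-1.cloudsearch.amazonaws.com/2013-01-01/search"
)


def facet_search_url(query, facet_fields):
    """Search URL that also asks for facet counts on the given fields."""
    params = [("q", query)]
    for field in facet_fields:
        # An empty JSON object requests the facet with default options.
        params.append((f"facet.{field}", "{}"))
    return SEARCH_ENDPOINT + "?" + urlencode(params)


# 'title' is a placeholder field name for illustration.
url = facet_search_url("test", ["title"])
```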

tomkr commented 9 years ago

I plan to call the CloudSearch API directly from JavaScript. If that causes issues I may change approach, but it will be my starting point.