sul-dlss-deprecated / triannon

Rails engine for working with storage of OpenAnnotations stored in Fedora4

Triannon app allows multiple root annotation containers in single FCrepo4 instance #132

Closed ndushay closed 9 years ago

ndushay commented 9 years ago

We will likely have more than one "grouping" of annotations stored in a single FCrepo4 instance, managed by one or more Triannon apps.

The "root anno container" is currently created by a rake task (or in the triannon rails console), and its URL is hardcoded in that code. The rake task / code must NOT hardcode the URL, so that we can have multiple "root" containers for annos (e.g. a DMS grouping, a Mirador grouping, a SW grouping ...)

Note: in the rails app part of Triannon, the root anno is the "ldp_url" in config/triannon.yml. So if we are doing a single Triannon rails app per root anno container, then the rails app part is already appropriately configurable.

ndushay commented 9 years ago
Date: March 11, 2015 at 2:17:57 PM PDT
From: Rob Sanderson <azaroth@stanford.edu>
To: Erin Fahy <efahy@stanford.edu>, Naomi Dushay <ndushay@stanford.edu>, Joshua Greben <jgreben@stanford.edu>
Subject: Fedora4 / Triannon boxes

Hi Erin, Naomi, Josh,

<snip>

Also, Naomi would like to discuss an issue about naming and routes:

1.  Which is better: one box with many triannon apps, versus one box with one multi-route app, versus lots of very small boxes each with one triannon app.

2.  Whether naming in the hostname is preferred to naming in the path.  eg  dms.triannon.stanford.edu  vs triannon.stanford.edu/dms/
<snip>
Thanks!

Rob
ndushay commented 9 years ago

Erin's assessment is that we may ultimately want to "horizontally scale" the Triannon box and load balance it. So of these options:

  1. have multiple triannon rails apps running on single VM for diff buckets (DMS, Mirador, SearchWorks ...)
  2. have single triannon app manage multiple buckets with routes (e.g. triannon-stage/dms/annotations)
  3. have separate VMs with a single triannon app for each bucket

We are going with option 2.

ndushay commented 9 years ago

@azaroth42

Had discussions with @mejackreed, @jkeck and @darrenleeweber this morning about root containers and routes. The conversation had me questioning the motivation of multiple root containers. Which of the following are motivations for multiple root containers? Have I left out any reasons?

The more we talked, the less clear it was to me that we really want multiple root annotation containers, rather than keeping the info in the provenance. Please convince me otherwise. Probably by F2F conversation.

azaroth42 commented 9 years ago

Plus:

I don't think that recording information in the annotation is a viable way forward for authorization -- you don't want to go back to every annotation just to update which individuals or groups can interact with it, you'd want to do that on a macro container level. Nor (I think) would you want to return it to the user. Further, operations on the container (rather than the individual annotations) such as POST to create a new Annotation, would become unmanageable quickly as we get further groups.

Also, what would you filter on to group by DMS vs SW? To do it properly, you would annotate the annotation with a semantic tag and then filter based on those URIs. I'm not averse to that model ... but it's more work.

ndushay commented 9 years ago

easy to address filtering by container -- SW will use Solr to look up annos, and the owning group will be used as a search filter in Solr by SW, whether it originates in the anno root container or as part of the provenance info in the individual anno. So Rob's points:

new possible reason

so for me, perhaps the most compelling reason is:

So basically, I'm still willing to do this, but not convinced we "need" it. @jkeck @mejackreed (and possibly @cbeer ?) and I weren't convinced that making multiple root containers made the most sense. I always feel that adding semantic info to a URI should be carefully considered - we know the UUID for the individual anno is opaque; the dns at the beginning of the anno is semantic ... the path would be more semantic info, and is in between ...

Technically, I'm sure I can figure out a way for the Triannon routes to be configured for multiple root annos -- perhaps a .yml file with the root container information and the path to be used. I can even contemplate using the same .yml file to check for/create the root annotation container.
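
A minimal sketch of the .yml idea above, assuming a hypothetical config layout: the `root_containers` key, its `name` and `path` entries, the `/annotations/` route prefix and the example URLs are all invented for illustration and are not Triannon's actual schema. Only `ldp_url` comes from the existing config/triannon.yml. The same parsed structure could plausibly drive both the routes and the check-for/create step.

```ruby
require 'yaml'

# Hypothetical config text; in a real app this would live in a .yml file.
# Every key except ldp_url is an assumption made for this sketch.
EXAMPLE_CONFIG = <<~YAML
  ldp_url: http://localhost:8080/rest/anno
  root_containers:
    - name: dms
      path: dms
    - name: sw
      path: sw
YAML

# Returns a hash mapping each route prefix to the URL of the root
# annotation container that requests under that prefix should use.
def root_container_map(yaml_text)
  config = YAML.safe_load(yaml_text)
  base = config.fetch('ldp_url').chomp('/')
  config.fetch('root_containers').each_with_object({}) do |entry, map|
    map["/annotations/#{entry.fetch('path')}"] = "#{base}/#{entry.fetch('name')}"
  end
end

ROOT_CONTAINERS = root_container_map(EXAMPLE_CONFIG)
```

With this layout, adding a new grouping would be a config change rather than a code change.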

azaroth42 commented 9 years ago

Re FCrepo's performance, it was determined in the blank node skolemization work that a single container with many many objects would reduce the performance. In that case they split up the resources into a pair tree structure. This is the default identity management in 4.1.1. So yes, it would be valuable for performance reasons.
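
To illustrate the pair tree idea in general terms: the leading characters of an identifier become nested path segments, so no single container ends up holding every resource. This is only a sketch. The segment count and width below are assumptions and are not claimed to match Fedora's actual algorithm, and `pairtree_path` is a hypothetical helper.

```ruby
# Splits the leading characters of an identifier into fixed-width
# path segments and appends the full identifier as the last segment.
# segments and width are illustrative defaults, not Fedora's settings.
def pairtree_path(id, segments: 4, width: 2)
  compact = id.delete('-')
  raise ArgumentError, 'identifier too short' if compact.length < segments * width

  prefix = compact[0, segments * width].scan(/.{#{width}}/)
  (prefix + [id]).join('/')
end

pairtree_path('7e1d3f2a-9b4c-4d5e-8f60-123456789abc')
# => "7e/1d/3f/2a/7e1d3f2a-9b4c-4d5e-8f60-123456789abc"
```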

Re Validation, I guess that multiple routes in triannon could do different validations, and all post to the same root container. Resources in a container don't need to be consistent, so I'll give you that one.

Re representations, also true, and as per validation, they could be constructed based on the triannon route rather than a root container in Fedora.

Agree that batch reindexing is a good concern, which would be much easier with multiple root containers.

The best practice on opaque URIs is from the client's perspective, rather than the server's. The client should not be required to construct URIs (just follow them), nor infer required meaning from the URI's structure. However servers are free to manage information any way they want.

So I'm still in favor of multiple containers as a way to manage sets of annotations without needing to mess with the representation sent from the client to add in additional features, or to construct tag annotations for each "real" annotation to do the same thing in an offset rather than inline way.

ndushay commented 9 years ago

Performance is a very convincing argument. Batch reindexing as well. Let's do it. We can always revisit later. Our vast usage data will inform future design decisions, no doubt.

ndushay commented 9 years ago

So it seems we want this:

uber anno container:

anno root container:

individual annos:

azaroth42 commented 9 years ago

Yes. /anno (or /annotations/ or /triannon/) should be a BasicContainer. It would have further basic containers as its members, such as dms or sw, which Triannon would POST to.

So: http://triannon.somewhere.org/annotations/dms/ for example, which contains the individual annotations.
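
As a rough sketch of how a member container such as dms might be created under the parent: an LDP server can be asked for a BasicContainer through a `Link` header with `rel="type"`, and the `Slug` header suggests the new resource's name. The Fedora base URL and the helper name below are assumptions. The request is only built here, not sent.

```ruby
require 'net/http'
require 'uri'

LDP_BASIC_CONTAINER = 'http://www.w3.org/ns/ldp#BasicContainer'

# Builds (but does not send) a POST request asking the parent container
# to create a child BasicContainer whose name is suggested by slug.
def build_create_container_request(parent_url, slug)
  uri = URI.parse(parent_url)
  request = Net::HTTP::Post.new(uri)
  request['Slug'] = slug
  request['Link'] = "<#{LDP_BASIC_CONTAINER}>; rel=\"type\""
  request
end

# Hypothetical parent container URL, for illustration only.
request = build_create_container_request('http://localhost:8080/rest/anno', 'dms')
```

Sending it would be something like `Net::HTTP.start(host, port) { |http| http.request(request) }`. The server is free to ignore the slug, so the `Location` header of the response is the authoritative URL of the new container.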

ndushay commented 9 years ago

See #187 for more specifics on container types (when to use Basic vs. Direct)

ndushay commented 9 years ago

Fedora 4.1.1 does something diff by default vis-a-vis container creation. per @azaroth42 2015-05-08:

| container | type | correct? |
|---|---|---|
| über-root container | Basic | :+1: |
| each root container (e.g. “dms”, “lib-guides”, “sw”, etc.) | Basic | :+1: |
| anno base container | Basic | :+1: per #198 |
| /t and /b containers | Direct | :+1: |
| specific /t/xxx and /b/xxx containers | Basic | :+1: per #198 |

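
The container types listed above could be captured as data so that container-creation code looks up the LDP type instead of hardcoding it. This is a sketch only: the level names (`:uber_root`, `:t_and_b`, etc.) are hypothetical labels invented here, while the two type IRIs are the standard LDP vocabulary terms.

```ruby
LDP_NS = 'http://www.w3.org/ns/ldp#'

# Container level => LDP container type, following the list above.
# The keys are hypothetical names chosen for this sketch.
CONTAINER_TYPES = {
  uber_root: :BasicContainer,
  root: :BasicContainer,            # e.g. dms, lib-guides, sw
  anno_base: :BasicContainer,
  t_and_b: :DirectContainer,        # the /t and /b containers
  t_and_b_members: :BasicContainer  # specific /t/xxx and /b/xxx containers
}.freeze

# Returns the full LDP type IRI for a container level;
# raises KeyError for an unknown level.
def ldp_type_iri(level)
  "#{LDP_NS}#{CONTAINER_TYPES.fetch(level)}"
end
```
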
ndushay commented 9 years ago

remaining work:

ndushay commented 9 years ago
dazza-codes commented 9 years ago

specs on sul-dlss/triannon-client are failing, so the triannon server container and startup have changed and/or the RESTful API has changed. If we assume triannon is OK because all the triannon specs are OK, then the fixes must be made to the client.

azaroth42 commented 9 years ago

I don't think we should assume that the server is clean just because the tests are passing. It could be that the tests don't accurately reflect the requirements. If you can diagnose what's going wrong in the client interaction (I imagine more than just updating the endpoint URI?), that would be appreciated so we can investigate. Thanks D!

ndushay commented 9 years ago

I'm about to deploy this to stage now as I've not heard that there are issues on the server side.

ndushay commented 9 years ago

I am closing this ticket, as I have now deployed to stage and dev both, including some gem updates for security reasons.

dazza-codes commented 9 years ago

@ndushay - WRT "I've not heard that there are issues on the server side" and the deployment to 'stage'. I have not yet had a chance to run the DMS ETL on the -dev server and I did not know that you wanted feedback on -dev today.