Triannon app allows multiple root annotation containers in single FCrepo4 instance

ndushay commented 9 years ago

We will likely have more than one "grouping" of annotations stored in a single FCrepo4 instance, managed by one or more Triannon apps.

The "root anno container" is currently created by a rake task (or in the triannon rails console), and the URL used by that code is hardcoded. The rake task / code must not hardcode the URL, so that we can have multiple "root" containers for annos (e.g. a DMS grouping, a Mirador grouping, a SW grouping ...)

Note: in the rails app part of Triannon, the root anno is the "ldp_url" in config/triannon.yml. So if we are doing a single Triannon rails app per root anno container, then the rails app part is already appropriately configurable.

[x] way to specify multiple ldp root containers in triannon.yml and map them to paths in triannon urls/routes
[x] ~~generator creates routes??~~ (routes are dynamic path segments for anno_root)
[x] ~~Triannon::RootAnnotationCreator.create takes argument (path?)~~ (rake task calls LdpWriter.create_basic_container directly)
[x] rake task for each path in yml???

ndushay commented 9 years ago

Date: March 11, 2015 at 2:17:57 PM PDT
From: Rob Sanderson <azaroth@stanford.edu>
To: Erin Fahy <efahy@stanford.edu>, Naomi Dushay <ndushay@stanford.edu>, Joshua Greben <jgreben@stanford.edu>
Subject: Fedora4 / Triannon boxes

Hi Erin, Naomi, Josh,

<snip>

Also, Naomi would like to discuss an issue about naming and routes:

1.  Which is better: one box with many triannon apps, versus one box with one multi-route app, versus lots of very small boxes each with one triannon app.

2.  Whether naming in the hostname is preferred to naming in the path.  eg  dms.triannon.stanford.edu  vs triannon.stanford.edu/dms/
<snip>
Thanks!

Rob

ndushay commented 9 years ago

Erin's assessment is that we may ultimately want to "horizontally scale" the Triannon box and load balance it. So of these options:

1. have multiple triannon rails apps running on single VM for diff buckets (DMS, Mirador, SearchWorks ...)
2. have single triannon app manage multiple buckets with routes (e.g. triannon-stage/dms/annotations)
3. have separate VMs with a single triannon app for each bucket

We are going with option 2.

ndushay commented 9 years ago

@azaroth42

Had discussions with @mejackreed, @jkeck and @darrenleeweber this morning about root containers and routes. The conversation had me questioning the motivation of multiple root containers. Which of the following are motivations for multiple root containers? Have I left out any reasons?

allow retrieval of annos grouped by "dms" or "sw" or whatever (note: this could be accomplished with Solr filters)
allow write authorization of annos based on group membership and root container (note: this could potentially be accomplished with info stored in the provenance part of anno along with authenticated id and authorization mechanism?)
allow easy deletion of all annos in a group (delete the root container -- all children will be deleted) and then recreate it.

The more we talked, the less clear it was to me that we really want multiple root annotation containers, rather than keeping the info in the provenance. Please convince me otherwise. Probably by F2F conversation.

azaroth42 commented 9 years ago

Plus:

Reduce number of annotations in any single container
Allow different validation / representation based on container (e.g. SW might require information that other uses/containers do not)
Future annotation grouping requirements we don't know yet

I don't think that recording information in the annotation is a viable way forward for authorization -- you don't want to go back to every annotation just to update which individuals or groups can interact with it, you'd want to do that on a macro container level. Nor (I think) would you want to return it to the user. Further, operations on the container (rather than the individual annotations) such as POST to create a new Annotation, would become unmanageable quickly as we get further groups.

Also, what would you filter on to group by DMS vs SW? To do it properly, you would annotate the annotation with a semantic tag and then filter based on those URIs. I'm not averse to that model ... but it's more work.

ndushay commented 9 years ago

easy to address filtering by container -- SW will use Solr to look up annos, and the owning group will be used as a search filter in Solr by SW, whether it originates in the anno root container or as part of the provenance info in the individual anno. So Rob's points:

reduce number of annos in any single container
- true. Question: will it matter? Will it affect Fedora performance? It won't affect Solr perf at all.
container based validation
- if triannon is doing validation, I don't think it will matter, assuming we're validating on a POST or a PUT. If we're doing this validation as a batch process at the Fedora level, I can see it might matter ... unless we start with a list of anno ids determined by Solr.
container based representation
- I think representation will be the responsibility of triannon client consumer apps, who will be using Solr to find their annos, regardless of whether there are multiple root containers or not?

new possible reason

batch reindexing of annos in Fedora
- multiple containers might make it easier to get Fedora to tell us "all the annos in the dms group" ... but perhaps instead we could use a (:beer:) SPARQL end point to Fedora triple store to find "all annos w provenance showing dms group"

so for me, perhaps the most compelling reason is:

easy deletion of all annos in container
- easy to do this at the Fedora level
- Solr delete by query makes this easy to do in Solr
- workaround would be: get list of ids via Solr or (:beer:) SPARQL end point and delete them one-by-one.
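
As a rough illustration of the Solr delete-by-query point above, here is a minimal sketch that builds the JSON body for Solr's update handler. The Solr field name `root` and the helper name are assumptions made for this example; the thread does not say which field holds the anno root.

```ruby
require 'json'

# Hypothetical helper: builds the JSON body for a Solr delete-by-query
# that removes every anno indexed under one root container.
# The field name 'root' is an assumption, not taken from the Triannon schema.
def delete_by_root_payload(anno_root, field: 'root')
  # Only allow simple names so the value cannot alter the Solr query syntax.
  unless anno_root.match?(/\A[\w-]+\z/)
    raise ArgumentError, "unexpected anno root name: #{anno_root.inspect}"
  end
  JSON.generate('delete' => { 'query' => "#{field}:#{anno_root}" })
end

puts delete_by_root_payload('dms')
```

The resulting string would be POSTed to the Solr core's update handler, followed by a commit.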

So basically, I'm still willing to do this, but not convinced we "need" it. @jkeck @mejackreed (and possibly @cbeer ?) and I weren't convinced that making multiple root containers made the most sense. I always feel that adding semantic info to a URI should be carefully considered - we know the UUID for the individual anno is opaque; the dns at the beginning of the anno is semantic ... the path would be more semantic info, and is in between ...

Technically, I'm sure I can figure out a way for the Triannon routes to be configured for multiple root annos -- perhaps a .yml file with the root container information and the path to be used. I can even contemplate using the same .yml file to check for/create the root annotation container.
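
A minimal sketch of what such a .yml mapping might look like, and how both the routes and the container check/create code could read it. The key names (`ldp_url`, `anno_roots`) and the example values are assumptions for illustration, not the actual triannon.yml format.

```ruby
require 'yaml'

# Hypothetical config: each key under anno_roots is a path segment in
# Triannon urls, each value is the name of the root container in the LDP store.
CONFIG_YAML = <<~YML
  ldp_url: http://localhost:8080/rest
  anno_roots:
    dms: dms
    sw: sw
YML

# Returns a hash mapping each Triannon path segment to the full URL
# of its root annotation container.
def root_container_urls(config)
  base = config.fetch('ldp_url').chomp('/')
  config.fetch('anno_roots').each_with_object({}) do |(path, container), urls|
    urls[path] = "#{base}/#{container}"
  end
end

config = YAML.safe_load(CONFIG_YAML)
root_container_urls(config).each { |path, url| puts "/#{path} -> #{url}" }
```

The same hash could drive a rake task that checks for each container and creates any that are missing.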

azaroth42 commented 9 years ago

Re FCrepo's performance: it was determined in the blank node skolemization work that a single container with very many objects reduces performance. In that case they split up the resources into a pair tree structure. This is the default identity management in 4.1.1. So yes, it would be valuable for performance reasons.

Re Validation, I guess that multiple routes in triannon could do different validations, and all post to the same root container. Resources in a container don't need to be consistent, so I'll give you that one.

Re representations, also true, and as per validation, they could be constructed based on the triannon route rather than a root container in Fedora.

Agree that batch reindexing is a good concern, which would be much easier with multiple root containers.

The best practice on opaque URIs is from the client's perspective, rather than the server's. The client should not be required to construct URIs (just follow them), nor infer required meaning from the URI's structure. However servers are free to manage information any way they want.

So I'm still in favor of multiple containers as a way to manage sets of annotations without needing to mess with the representation sent from the client to add in additional features, or to construct tag annotations for each "real" annotation to do the same thing in an offset rather than inline way.

ndushay commented 9 years ago

Performance is a very convincing argument. Batch reindexing as well. Let's do it. We can always revisit later. Our vast usage data will inform future design decisions, no doubt.

ndushay commented 9 years ago

So it seems we want this:

uber anno container:

just one of these for triannon app
"anno" (e.g. http://yer-ldp-storage-url/anno or http://fedora.somewhere.org/rest/anno)
LDP BasicContainer? @azaroth42 pls clarify
has anno containers as members

anno root container:

member of uber anno container
one of these per "root"
- e.g. "dms" or "sw" or "foo"
- a given root container maps to a path in triannon, e.g. http://triannon.somewhere.org/dms, http://triannon.somewhere.org/sw
LDP BasicContainer? @azaroth42 please clarify
has individual annotations as members

individual annos:

member of a specific anno root container such as "dms" or "sw" <-- only diff from now
LDP BasicContainer as a "base"
- contains triples pertaining to anno (not to a particular target or body)
the base container may have /t and /b member LDP DirectContainers for target and body triples

azaroth42 commented 9 years ago

Yes. /anno (or /annotations/ or /triannon/) should be a BasicContainer. It would have further basic containers as its members, such as dms or sw, which Triannon would POST to.

So: http://triannon.somewhere.org/annotations/dms/ for example, which contains the individual annotations.
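
For illustration, a sketch of the kind of LDP request that would create a root container such as dms under the uber container. It only builds the request and does not send it. The helper name is hypothetical and the parent URL is the example URL from earlier in this thread; this is not the actual LdpWriter code.

```ruby
require 'net/http'
require 'uri'

LDP_BASIC_CONTAINER = 'http://www.w3.org/ns/ldp#BasicContainer'

# Hypothetical helper: builds (but does not send) a POST asking an LDP
# server to create a BasicContainer named `slug` under `parent_url`.
def basic_container_request(parent_url, slug)
  req = Net::HTTP::Post.new(URI(parent_url))
  req['Slug'] = slug                                      # suggested name for the new resource
  req['Link'] = "<#{LDP_BASIC_CONTAINER}>; rel=\"type\""  # requested LDP interaction model
  req['Content-Type'] = 'text/turtle'
  req.body = "<> a <#{LDP_BASIC_CONTAINER}> ."
  req
end

req = basic_container_request('http://fedora.somewhere.org/rest/anno', 'dms')
puts "#{req.method} #{req.path} Slug=#{req['Slug']}"
```

Note that Slug is only a hint under LDP; the server decides the final URL and returns it in the Location header of the response.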

ndushay commented 9 years ago

See #187 for more specifics on container types (when to use Basic vs. Direct)

ndushay commented 9 years ago

Fedora 4.1.1 does something diff by default vis-a-vis container creation. per @azaroth42 2015-05-08:

| container | type | correct? |
| --- | --- | --- |
| über-root container | Basic | :+1: |
| each root container (e.g. “dms”, “lib-guides”, “sw”, etc.) | Basic | :+1: |
| anno base container | Basic | :+1: per #198 |
| /t and /b containers | Direct | :+1: |
| specific /t/xxx and /b/xxx containers | Basic | :+1: per #198 |
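
To make the resulting layout concrete, here is a small sketch that derives the container URLs for one annotation from the hierarchy described above. The function name and the example id are hypothetical.

```ruby
# Hypothetical helper: returns the URLs of an annotation's base container
# and its /t (targets) and /b (bodies) child containers.
def anno_container_urls(ldp_url, uber_root, anno_root, anno_id)
  base = [ldp_url.chomp('/'), uber_root, anno_root, anno_id].join('/')
  { base: base, targets: "#{base}/t", bodies: "#{base}/b" }
end

urls = anno_container_urls('http://fedora.somewhere.org/rest', 'anno', 'dms', 'example-id')
urls.each { |kind, url| puts "#{kind}: #{url}" }
```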

ndushay commented 9 years ago

remaining work:

[x] ldp_writer copes with mult anno containers under uber root container (checks that anno root container exists; errors if it doesn't)
[x] ldp_loader copes with mult anno containers (checks that anno root container exists; errors if it doesn't)
[x] ldp_mapper copes with mult anno containers
[x] annotation_ldp model
[x] annotations model copes with mult anno containers
- [x] #find
- [x] solr_searcher
- [x] #create
- [x] solr_writer
[x] annotations controller copes with mult anno containers (checks for recognized anno_root containers from config; errors if unrecognized)
- [x] show
- [x] create
- [x] destroy
[x] search controller copes with mult anno containers (checks for param)
[x] all specs pass
- [x] integration tests
- [x] choice
- [x] content_as_text
- [x] external uris
- [x] no body
- [x] specific resource
- [x] Solr (add anno root, too!)
- [x] features
[x] README (gem and triannon-services app)
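
The "checks for recognized anno_root containers from config; errors if unrecognized" behavior in the list above could look roughly like the following plain-Ruby sketch. The error class and method names are hypothetical, and the real controller code may differ.

```ruby
# Hypothetical error raised when a request names an unconfigured anno root.
class UnrecognizedAnnoRootError < StandardError; end

# Returns anno_root when it is one of the configured roots; raises otherwise.
def check_anno_root!(anno_root, configured_roots)
  return anno_root if configured_roots.include?(anno_root)

  raise UnrecognizedAnnoRootError,
        "Annotation root '#{anno_root}' is not configured (known: #{configured_roots.join(', ')})"
end

configured = %w[dms sw]
puts check_anno_root!('dms', configured)
begin
  check_anno_root!('foo', configured)
rescue UnrecognizedAnnoRootError => e
  puts e.message
end
```

In a Rails controller, a check like this would typically run in a `before_action` and render an error response instead of raising.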

ndushay commented 9 years ago

[x] cut gem release 2.0.0
[x] update triannon-services app
- [x] update gem to new version
- [x] README
[x] deploy
- [x] to dev
- [x] to stage

dazza-codes commented 9 years ago

specs on sul-dlss/triannon-client are failing, so the triannon server container and startup have changed and/or the RESTful API has changed. If we assume triannon is OK because all the triannon specs are OK, then the fixes must be made to the client.

azaroth42 commented 9 years ago

I don't think we should assume that the server is clean just because the tests are passing. It could be that the tests don't accurately reflect the requirements. If you can diagnose what's going wrong in the client interaction (I imagine it's more than just updating the endpoint URI?), that would be appreciated so we can investigate. Thanks D!

ndushay commented 9 years ago

I'm about to deploy this to stage now as I've not heard that there are issues on the server side.

ndushay commented 9 years ago

I am closing this ticket, as I have now deployed to stage and dev both, including some gem updates for security reasons.

dazza-codes commented 9 years ago

@ndushay - WRT "I've not heard that there are issues on the server side" and the deployment to 'stage'. I have not yet had a chance to run the DMS ETL on the -dev server and I did not know that you wanted feedback on -dev today.

sul-dlss-deprecated / triannon

Triannon app allows multiple root annotation containers in single FCrepo4 instance #132