Date: March 11, 2015 at 2:17:57 PM PDT
From: Rob Sanderson <azaroth@stanford.edu>
To: Erin Fahy <efahy@stanford.edu>, Naomi Dushay <ndushay@stanford.edu>, Joshua Greben <jgreben@stanford.edu>
Subject: Fedora4 / Triannon boxes
Hi Erin, Naomi, Josh,
<snip>
Also, Naomi would like to discuss an issue about naming and routes:
1. Which is better: one box with many triannon apps, versus one box with one multi-route app, versus lots of very small boxes each with one triannon app.
2. Whether naming in the hostname is preferable to naming in the path, e.g. dms.triannon.stanford.edu vs. triannon.stanford.edu/dms/
<snip>
Thanks!
Rob
Erin's assessment is that we may ultimately want to "horizontally scale" the Triannon box and load balance it. So, of these options:
We are going with option 2.
@azaroth42
Had discussions with @mejackreed, @jkeck and @darrenleeweber this morning about root containers and routes. The conversation had me questioning the motivation of multiple root containers. Which of the following are motivations for multiple root containers? Have I left out any reasons?
The more we talked, the less clear it was to me that we really want multiple root annotation containers, rather than keeping the info in the provenance. Please convince me otherwise. Probably by F2F conversation.
Plus:
I don't think that recording information in the annotation is a viable way forward for authorization -- you don't want to go back to every annotation just to update which individuals or groups can interact with it, you'd want to do that on a macro container level. Nor (I think) would you want to return it to the user. Further, operations on the container (rather than the individual annotations) such as POST to create a new Annotation, would become unmanageable quickly as we get further groups.
Also, what would you filter on to group by DMS vs SW? To do it properly, you would annotate the annotation with a semantic tag and then filter based on those URIs. I'm not averse to that model ... but it's more work.
Filtering by container is easy to address: SW will use Solr to look up annos, and SW will use the owning group as a search filter in Solr, whether that group comes from the anno root container or from the provenance info in the individual anno. So, Rob's points:
A new possible reason:
So for me, perhaps the most compelling reason is:
So basically, I'm still willing to do this, but not convinced we "need" it. @jkeck, @mejackreed (and possibly @cbeer?) and I weren't convinced that making multiple root containers made the most sense. I always feel that adding semantic info to a URI should be carefully considered: we know the UUID for the individual anno is opaque, and the DNS hostname at the beginning of the anno URI is semantic; the path would add more semantic info, and sits in between.
Technically, I'm sure I can figure out a way for the Triannon routes to be configured for multiple root anno containers, perhaps with a .yml file holding the root container information and the path to be used for each. I can even contemplate using the same .yml file to check for (or create) each root annotation container.
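A minimal sketch of how such a .yml file might drive the routes, assuming a config that lists each root container and the path segment to use for it. Only `ldp_url` is known to exist in config/triannon.yml; the `root_containers` and `path` keys and the Fedora URL are hypothetical placeholders.

```ruby
require 'yaml'

# Hypothetical config fragment: ldp_url is the base Fedora container,
# root_containers/path are illustrative keys, not Triannon's actual schema.
CONFIG_TEXT = <<~CONFIG
  ldp_url: http://localhost:8080/fedora/rest/anno
  root_containers:
    dms:
      path: dms
    sw:
      path: sw
CONFIG

# Build a map of route name => full Fedora URL for each root container.
def root_container_urls(yaml_text)
  config = YAML.safe_load(yaml_text)
  base = config.fetch('ldp_url').chomp('/')
  config.fetch('root_containers').each_with_object({}) do |(name, opts), urls|
    urls[name] = "#{base}/#{opts.fetch('path')}"
  end
end

root_container_urls(CONFIG_TEXT).each { |name, url| puts "#{name} -> #{url}" }
```

The same map could be walked at startup to check that each root container exists and create any that are missing.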
Re FCrepo's performance: it was determined during the blank node skolemization work that a single container with a very large number of objects degrades performance. In that case they split the resources into a pairtree structure, which is the default identity management in 4.1.1. So yes, multiple containers would be valuable for performance reasons.
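To illustrate the pairtree idea: new resources are nested under short path segments derived from their identifier, so no single container accumulates a huge number of direct children. This is a minimal sketch only; the segment length and count (4 segments of 2 characters) are assumptions, not necessarily what FCrepo 4.1.1 uses.

```ruby
require 'securerandom'

# Split an identifier into a pairtree-style path: `count` segments of
# `length` characters (hyphens removed), followed by the full identifier.
def pairtree_path(id, length: 2, count: 4)
  compact = id.delete('-')
  segments = (0...count).map { |i| compact[i * length, length] }
  (segments + [id]).join('/')
end

puts pairtree_path(SecureRandom.uuid)
```

With random UUIDs, each level fans out into at most 256 children, which keeps every individual container small.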
Re Validation, I guess that multiple routes in triannon could do different validations, and all post to the same root container. Resources in a container don't need to be consistent, so I'll give you that one.
Re representations, also true, and as per validation, they could be constructed based on the triannon route rather than a root container in Fedora.
Agree that batch reindexing is a good concern, which would be much easier with multiple root containers.
The best practice on opaque URIs is from the client's perspective, rather than the server's. The client should not be required to construct URIs (just follow them), nor infer required meaning from the URI's structure. However servers are free to manage information any way they want.
So I'm still in favor of multiple containers as a way to manage sets of annotations without needing to mess with the representation sent from the client to add in additional features, or to construct tag annotations for each "real" annotation to do the same thing in an offset rather than inline way.
Performance is a very convincing argument. Batch reindexing as well. Let's do it. We can always revisit later. Our vast usage data will inform future design decisions, no doubt.
So it seems we want this:
uber anno container:
anno root container:
individual annos:
Yes. /anno (or /annotations/ or /triannon/) should be a BasicContainer. It would have further basic containers as its members, such as dms or sw, which Triannon would POST to.
So: http://triannon.somewhere.org/annotations/dms/ for example, which contains the individual annotations.
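Creating a root container such as dms under the über-root is an ordinary LDP POST. The sketch below builds (but does not send) that request, asking for a BasicContainer via the LDP `Link: <...#BasicContainer>; rel="type"` interaction-model header and suggesting the member name with `Slug`. The Fedora URL and the `basic_container_request` helper are hypothetical, not Triannon code.

```ruby
require 'net/http'
require 'uri'

# Hypothetical über-root container URL; substitute the real Fedora endpoint.
UBER_ROOT = URI('http://localhost:8080/fedora/rest/annotations/')

# Build an LDP POST asking the server to create a BasicContainer member
# whose name is suggested by the Slug header (e.g. "dms" or "sw").
def basic_container_request(parent_uri, slug)
  req = Net::HTTP::Post.new(parent_uri)
  req['Content-Type'] = 'text/turtle'
  req['Link'] = '<http://www.w3.org/ns/ldp#BasicContainer>; rel="type"'
  req['Slug'] = slug
  req.body = ''
  req
end

req = basic_container_request(UBER_ROOT, 'dms')
puts "#{req.method} #{req.path}"
# To actually send it:
# Net::HTTP.start(UBER_ROOT.host, UBER_ROOT.port) { |http| http.request(req) }
```

Slug is only a hint; the server decides the final URI and returns it in the Location header.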
See #187 for more specifics on container types (when to use Basic vs. Direct)
Fedora 4.1.1 does something different by default with respect to container creation. Per @azaroth42 (2015-05-08):
| container | type | correct? |
|---|---|---|
| über-root container | Basic | :+1: |
| each root container (e.g. “dms”, “lib-guides”, “sw”, etc.) | Basic | :+1: |
| anno base container | Basic | :+1: per #198 |
| /t and /b containers | Direct | :+1: |
| specific /t/xxx and /b/xxx containers | Basic | :+1: per #198 |
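The table above can be read as a rule mapping each container role to the LDP interaction model requested at creation time. A minimal sketch; the role symbols are hypothetical labels, not Triannon identifiers.

```ruby
LDP = 'http://www.w3.org/ns/ldp#'.freeze

# Interaction model per container role, transcribed from the table above.
CONTAINER_TYPES = {
  uber_root:         'BasicContainer',
  root:              'BasicContainer',  # e.g. dms, lib-guides, sw
  anno_base:         'BasicContainer',
  target_body:       'DirectContainer', # the /t and /b containers
  target_body_child: 'BasicContainer'   # specific /t/xxx and /b/xxx
}.freeze

# Link header value that requests the given role's interaction model.
def link_header_for(role)
  "<#{LDP}#{CONTAINER_TYPES.fetch(role)}>; rel=\"type\""
end

CONTAINER_TYPES.each_key { |role| puts "#{role}: #{link_header_for(role)}" }
```

Per the LDP spec, a DirectContainer also needs `ldp:membershipResource` and a membership predicate (e.g. `ldp:hasMemberRelation`) in its creation body; which predicates Triannon uses for /t and /b is not stated here.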
Remaining work:
Specs on sul-dlss/triannon-client are failing, which suggests that the triannon server's container setup and startup have changed and/or its RESTful API has changed. If we assume triannon is OK because all the triannon specs pass, then the fixes must be made in the client.
I don't think we should assume that the server is clean just because its tests are passing; the tests may not accurately reflect the requirements. If you can diagnose what's going wrong in the client interaction (I imagine it's more than just updating the endpoint URI?), that would be appreciated so we can investigate. Thanks D!
I'm about to deploy this to stage now as I've not heard that there are issues on the server side.
I am closing this ticket, as I have now deployed to stage and dev both, including some gem updates for security reasons.
@ndushay - WRT "I've not heard that there are issues on the server side" and the deployment to 'stage'. I have not yet had a chance to run the DMS ETL on the -dev server and I did not know that you wanted feedback on -dev today.
We will likely have more than one "grouping" of annotations stored in a single FCrepo4 instance, managed by one or more Triannon apps.
The "root anno container" is currently created by a rake task (or in the triannon rails console), and the URL in that code is hardcoded. This rake task / code must NOT use a hardcoded URL, so that we can have multiple "root" containers for annos (e.g. a DMS grouping, a Mirador grouping, a SW grouping ...)
Note: in the rails app part of Triannon, the root anno is the "ldp_url" in config/triannon.yml. So if we are doing a single Triannon rails app per root anno container, then the rails app part is already appropriately configurable.
- generator creates routes? (routes are dynamic path segments for anno_root)
- Triannon::RootAnnotationCreator.create takes argument (path?)
- rake task calls LdpWriter.create_basic_container directly
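A minimal sketch of treating anno_root as a dynamic path segment, assuming requests of the form /:anno_root/:id and a list of configured root names. The `resolve_anno` helper, the route shape, and the root names are hypothetical; the point is that the Fedora URL is built from ldp_url plus the segment rather than hardcoded.

```ruby
# Hypothetical configured root names; in practice these would be read from
# config (e.g. config/triannon.yml) rather than hardcoded here.
ANNO_ROOTS = %w[dms sw mirador].freeze

# Map a request path of the assumed form "/:anno_root/:id" to a Fedora URL,
# rejecting unknown roots and malformed paths.
def resolve_anno(path, ldp_url)
  anno_root, id, *rest = path.split('/').reject(&:empty?)
  raise ArgumentError, "unknown anno_root: #{anno_root.inspect}" unless ANNO_ROOTS.include?(anno_root)
  raise ArgumentError, 'expected /:anno_root/:id' if id.nil? || !rest.empty?
  "#{ldp_url.chomp('/')}/#{anno_root}/#{id}"
end

puts resolve_anno('/dms/abc123', 'http://localhost:8080/fedora/rest/anno')
```

In a Rails app the equivalent would be a route with an `:anno_root` segment constrained to the configured names.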