whole-tale / wt-design-docs

MIT License

architecture for data provider integration #5

Open mbjones opened 7 years ago

mbjones commented 7 years ago

We need an overview diagram of the front-end, back-ends, and data providers, and of the interactions between them. In particular, we need to know how the following components will interact:

ian-taylor commented 7 years ago

Here is a stab at connecting some of the dots:

[Diagram: wt-arch-ideas]

Xarthisius commented 7 years ago

To expand a little on the Data box, here is how I imagine the role of each component:

kylechard commented 7 years ago

I'd still like to understand the use cases for iRODS a little better. If we're using an assetstore model and we need this data to be accessible to various frontends, perhaps it would be worth exploring an entirely object store-based approach to the workspace/data fabric?

Xarthisius commented 7 years ago

Well, that'd be fine by me, especially since Girder already supports S3-compatible assetstores. However, I think there are two downsides:

ian-taylor commented 7 years ago

The other question is whether ownCloud uses the WT API, which then relays requests to iRODS (or another backend) via assetstore implementations, or interfaces with iRODS directly (as we discussed previously). The former seems cleaner, but it depends on the use cases, as Kyle said.
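To make the "relay" option concrete, here is a minimal sketch, assuming a hypothetical `WTApi` entry point that dispatches each file to a backend adapter selected by a hypothetical `assetstore_type` field. The class and field names are illustrative only; this is not Girder's actual assetstore-adapter API.

```python
from abc import ABC, abstractmethod
from typing import Dict


class AssetstoreAdapter(ABC):
    """Hypothetical backend interface: one adapter per storage system."""

    @abstractmethod
    def read(self, file_doc: dict) -> bytes:
        ...


class InMemoryAdapter(AssetstoreAdapter):
    """Stand-in backend for illustration; real adapters would talk to disk, S3, iRODS, etc."""

    def __init__(self, blobs: Dict[str, bytes]):
        self.blobs = blobs

    def read(self, file_doc: dict) -> bytes:
        return self.blobs[file_doc["_id"]]


class WTApi:
    """Front-ends (e.g. ownCloud) call only this API; it relays to the matching adapter."""

    def __init__(self):
        self.adapters: Dict[str, AssetstoreAdapter] = {}

    def register(self, assetstore_type: str, adapter: AssetstoreAdapter) -> None:
        self.adapters[assetstore_type] = adapter

    def download(self, file_doc: dict) -> bytes:
        adapter = self.adapters.get(file_doc["assetstore_type"])
        if adapter is None:
            raise KeyError(f"no adapter for {file_doc['assetstore_type']!r}")
        return adapter.read(file_doc)
```

In this arrangement ownCloud only needs to speak the WT API, and supporting another backend means registering one more adapter rather than changing every front-end. The cost is an extra hop compared with ownCloud talking to iRODS directly.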

Xarthisius commented 7 years ago

@ian-taylor I think we can have both if necessary. The way I implemented it for the Filesystem assetstore in GirderFS is:

  1. you can access files remotely through Girder's API, authenticating with Girder's auth token
  2. you can access files directly if resources are available in the environment

Of course, for the filesystem that was fairly trivial; it's going to be much more challenging for iRODS.
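The two access modes could be sketched roughly as follows. This is a minimal sketch, not the GirderFS implementation: it assumes file documents carry a hypothetical `path` field relative to a locally mounted `local_root`, and otherwise falls back to Girder's `GET /file/{id}/download` endpoint with the token sent in the `Girder-Token` header.

```python
import os
import urllib.request
from typing import Optional


def build_download_request(api_url: str, file_id: str, token: str) -> urllib.request.Request:
    """Remote mode: Girder's REST download endpoint, authenticated with the user's token."""
    url = f"{api_url.rstrip('/')}/file/{file_id}/download"
    return urllib.request.Request(url, headers={"Girder-Token": token})


def read_file(file_doc: dict, api_url: str, token: str,
              local_root: Optional[str] = None) -> bytes:
    """Prefer direct access when the storage is visible locally; otherwise use the API.

    Assumption (hypothetical): `file_doc` has an `_id` and, for filesystem
    assetstores, a `path` relative to the assetstore root.
    """
    rel_path = file_doc.get("path")
    if local_root is not None and rel_path is not None:
        local_path = os.path.join(local_root, rel_path)
        if os.path.isfile(local_path):
            # Direct mode: the storage is mounted in this environment.
            with open(local_path, "rb") as fh:
                return fh.read()
    # Remote mode: go through Girder's API with the auth token.
    req = build_download_request(api_url, file_doc["_id"], token)
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

For iRODS, the direct branch is the hard part: instead of a simple `os.path.isfile` check, it would need an iRODS client (or a FUSE mount) available in the environment.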

ian-taylor commented 7 years ago

Updated after comments:

[Diagram: wt-arch-ideas]

matthewturk commented 7 years ago

@Xarthisius I'm not sure it will be much different if we use the iRODS FUSE interface; then it could be done simply as a composed filesystem, right?

matthewturk commented 7 years ago

@ian-taylor this is a good start; it codifies a lot of the things we've spoken about, and has all of the items that have come up during discussion and technology identification. What we need at this point is considerably more specific detail. I think that's where the components need to be broken out, whether by mechanism of interaction, by use case, or by type of technology.

A few of the specific items that need to be identified:

ian-taylor commented 7 years ago

A few quick comments. User & Authorization is supposed to be an expansion of User Management, so it is part of the API. I need to clean that up in the image.

As for ORE, I agree. ORE aggregations are something we need to think about once we have everything in place. They can be implemented using simple URL dereferencing on the research link and can pull from whatever sources we decide on.
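As a rough illustration of what an ORE aggregation over a research link might look like, the sketch below emits N-Triples using the OAI-ORE vocabulary (`ore:ResourceMap`, `ore:describes`, `ore:Aggregation`, `ore:aggregates`). The `make_resource_map` helper and any example URLs are hypothetical.

```python
from typing import List

ORE = "http://www.openarchives.org/ore/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def triple(s: str, p: str, o: str) -> str:
    """Format one N-Triples statement whose subject, predicate and object are all IRIs."""
    return f"<{s}> <{p}> <{o}> ."


def make_resource_map(map_uri: str, aggregation_uri: str, parts: List[str]) -> List[str]:
    """Hypothetical helper: a resource map describing one aggregation of `parts`."""
    lines = [
        triple(map_uri, RDF_TYPE, ORE + "ResourceMap"),
        triple(map_uri, ORE + "describes", aggregation_uri),
        triple(aggregation_uri, RDF_TYPE, ORE + "Aggregation"),
    ]
    # Aggregated resources are referenced only by URL, so they can be
    # dereferenced from wherever they actually live.
    lines += [triple(aggregation_uri, ORE + "aggregates", p) for p in parts]
    return lines
```

Because the aggregation holds only URLs, it stays agnostic about which storage layer ultimately serves each part.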

But we should think about how things will tie together, i.e., how will we describe the files that researchers expose, and how will we describe their relationships? I personally think this needs to be independent of the physical storage, e.g. use GUIDs, as Google Drive does, so that files are independent of paths/locations, and use these GUIDs to link metadata in the collections. I am not sure Girder is the place to store this sort of metadata; it may be better to separate out the physical layer in Girder by using identifiers plus a separate DB that describes the metadata for search. As far as I understand, this is how DataONE works.
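A minimal sketch of that separation, assuming an in-memory stand-in for the separate metadata DB: each file gets a GUID, its physical location is just one attribute that can change, and metadata and relationships are keyed by GUID rather than by path. The `Catalog` class, the location URI schemes, and relation names are illustrative assumptions, not how Girder or DataONE actually store this.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Catalog:
    """Hypothetical split between the physical layer and a separate metadata store."""
    locations: Dict[str, str] = field(default_factory=dict)   # GUID -> physical location
    metadata: Dict[str, dict] = field(default_factory=dict)   # GUID -> descriptive metadata
    relations: List[Tuple[str, str, str]] = field(default_factory=list)  # (GUID, relation, GUID)

    def register(self, location: str, **meta) -> str:
        guid = str(uuid.uuid4())
        self.locations[guid] = location
        self.metadata[guid] = dict(meta)
        return guid

    def move(self, guid: str, new_location: str) -> None:
        # Only the physical layer changes; metadata and relations still point at the GUID.
        self.locations[guid] = new_location

    def relate(self, subject: str, relation: str, obj: str) -> None:
        self.relations.append((subject, relation, obj))

    def search(self, **criteria) -> List[str]:
        """Return GUIDs whose metadata matches every key/value pair in `criteria`."""
        return [g for g, m in self.metadata.items()
                if all(m.get(k) == v for k, v in criteria.items())]
```

Moving a file (say, from a Girder assetstore to S3) touches only `locations`; search results and relationships are unaffected.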

Xarthisius commented 7 years ago

@MatthewTurk I don't think iRODS's native FUSE allows selecting individual underlying objects: you can only export a path, and you get everything under it. Nevertheless, I wouldn't worry about it; wrapping icommands with GirderFS should be fairly trivial.
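A rough sketch of what wrapping icommands might look like: instead of exporting a whole path through FUSE, a GirderFS-style backend could call `iget` only for the objects a tale selects. It assumes the icommands are installed and the user is already authenticated (e.g. via `iinit`); the `fetch_selected` helper and the flat cache layout are hypothetical.

```python
import os
import subprocess
from typing import Callable, Iterable, List


def iget_command(irods_path: str, local_path: str) -> List[str]:
    """Build an `iget` call for one data object; -f overwrites a stale local copy."""
    return ["iget", "-f", irods_path, local_path]


def fetch_selected(irods_paths: Iterable[str], cache_dir: str,
                   run: Callable = subprocess.run) -> List[str]:
    """Fetch only the chosen objects into a local cache (no whole-collection export).

    `run` is injectable so the sketch can be exercised without an iRODS server.
    """
    fetched = []
    for irods_path in irods_paths:
        local_path = os.path.join(cache_dir, os.path.basename(irods_path))
        run(iget_command(irods_path, local_path), check=True)
        fetched.append(local_path)
    return fetched
```

Flattening to basenames is a simplification that would collide on duplicate names; a real backend would likely mirror the collection layout and serve reads from these cached copies.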

matthewturk commented 7 years ago

My presentation slides: https://docs.google.com/presentation/d/1a7a-jEPTTIx2Hka8fTn6DWcF_VRYrwgMYRp8pN5CllY/edit#slide=id.p