mbraceproject / MBrace.Core

MBrace Core Libraries & Runtime Foundations
http://mbrace.io/
Apache License 2.0

More complete HDFS / WASB style path structure #153

Closed isaacabraham closed 7 years ago

isaacabraham commented 8 years ago

We need to be able to specify the storage account as part of the path, e.g.

"customerAccount@container/folder/folder/file.txt"
|> CloudFlow.ofFileByLine

etc. etc.

I've raised this as its own issue as it's probably an enabler for a number of scenarios.

eiriktsarpalis commented 8 years ago

MBrace.Core, and by extension MBrace.Flow, do not by themselves perform any parsing of paths. That job is delegated to whichever ICloudFileStore implementation the current runtime happens to be using.

So I think this really is an MBrace.Azure issue: we should consider whether BlobStore, the concrete implementation of ICloudFileStore, should support multiple storage accounts and recognise WASB-style paths.
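
To make the parsing question concrete, here is a minimal sketch of how a store implementation might split the account@container/path form used in the example above. The BlobPath type and the tryParseBlobPath function are hypothetical names made up for illustration; they are not part of MBrace.Core or MBrace.Azure, and the sketch follows the account@container ordering from this issue rather than the full wasb:// URI scheme.

```fsharp
open System

/// Hypothetical result type (an assumption, not an MBrace API).
type BlobPath =
    { Account   : string option   // None means "use the cluster's default account"
      Container : string
      Blob      : string }

/// Parses "account@container/folder/file.txt" or "container/folder/file.txt".
/// Returns None when the account, container or blob part is missing.
let tryParseBlobPath (path : string) : BlobPath option =
    if String.IsNullOrWhiteSpace path then
        None
    else
        // Split off an optional "account@" prefix.
        let account, rest =
            match path.IndexOf('@') with
            | -1 -> None, path
            | i  -> Some (path.Substring(0, i)), path.Substring(i + 1)
        // The first '/' separates the container from the blob name.
        match account, rest.IndexOf('/') with
        | Some "", _ -> None                                 // "@container/..." has an empty account
        | _, i when i <= 0 || i = rest.Length - 1 -> None    // missing container or blob name
        | _, i ->
            Some { Account = account
                   Container = rest.Substring(0, i)
                   Blob = rest.Substring(i + 1) }
```

With this sketch, tryParseBlobPath "customerAccount@container/folder/folder/file.txt" yields an account of Some "customerAccount", the container "container" and the blob name "folder/folder/file.txt", while a path without an '@' falls back to the default account.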

If we decide to go for this approach, there are a few ramifications that might be worth considering:

  1. How will the cluster handle key management? By design, the current implementation never encapsulates connection strings in serialized storage objects; instead, connection strings are expected to be specified at the configuration level of each node. This avoids inadvertently leaking connection strings into exported serializations of object graphs, which can happen very easily. Should the user decide to introduce a new connection string from the client side, how will that key be distributed across the cluster without risking such leaks?
  2. Issues of cluster identity: at the moment every MBrace cluster is uniquely identified by the pair of storage and service bus accounts that it uses. How could we design frictionless introduction of secondary keys without potentially blurring this identity? And how can we be sure that those secondary keys are recoverable in cases where all worker instances have died?
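
The design constraint in point 1 can be sketched as follows: serialized storage objects carry only an account name, and each node resolves that name against connection strings held in its own configuration. AccountRegistry, BlobReference and resolve are hypothetical names used only for illustration, not existing MBrace.Azure types, and the sketch deliberately leaves open how a new key would reach the nodes.

```fsharp
open System.Collections.Concurrent

/// Hypothetical node-local registry (an assumption, not an MBrace API).
/// It lives in each node's configuration and is never serialized.
type AccountRegistry () =
    let connectionStrings = ConcurrentDictionary<string, string>()

    /// Associates an account name with its connection string on this node.
    member this.Register (accountName : string, connectionString : string) =
        connectionStrings.[accountName] <- connectionString

    /// Looks up the connection string for an account name, if configured.
    member this.TryResolve (accountName : string) : string option =
        match connectionStrings.TryGetValue accountName with
        | true, cs -> Some cs
        | false, _ -> None

/// What a serialized storage object would contain: the account *name* only,
/// so exporting an object graph cannot leak a connection string.
type BlobReference =
    { AccountName : string
      Container   : string
      Blob        : string }

/// Resolves a reference to a connection string on the current node,
/// or reports that this node has no key for the account.
let resolve (registry : AccountRegistry) (blob : BlobReference) : Result<string, string> =
    match registry.TryResolve blob.AccountName with
    | Some cs -> Ok cs
    | None -> Error (sprintf "No connection string configured on this node for account '%s'" blob.AccountName)
```

A node that has not been given the key gets an Error rather than a silent fallback, which is where the distribution and recoverability questions above come in.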

eiriktsarpalis commented 8 years ago

There are quite a few ways we could address these concerns. One would be to maintain an "accounts" table in the master storage account, containing all secondary connection strings. I do feel, though, that this may violate the security expectations users have.

eiriktsarpalis commented 8 years ago

Another would be to use the service bus to broadcast additional auth data to workers.

dsyme commented 7 years ago

See https://github.com/mbraceproject/MBrace.Azure/pull/161 which I think covers this enough for these purposes (an MBrace.Core PR may follow out of that)