mbraceproject / MBrace.Core

MBrace Core Libraries & Runtime Foundations
http://mbrace.io/
Apache License 2.0

Further simplify the CloudFlow producer API methods #61

Closed isaacabraham closed 9 years ago

isaacabraham commented 9 years ago

The ofCloudFileByLine and ofTextFileByLine producers are extremely useful (particularly the latter). However, they could be made even better by adding a couple of overloads and/or optional parameters:

  1. Ability to specify multiple files, which are implicitly amalgamated into a single CloudFlow, e.g. ofTextFiles(string paths []). If you declare this as a params argument, you can simply merge it with the existing ofTextFiles producer.
  2. Ability to specify a directory, in which case all of its files are consumed (as in 1). You could reuse the above function: if a path ends in a / or *, assume a wildcard search (or similar). The alternative would be a secondary function, but I can see use cases where you might want to combine, e.g., 2 specific files plus 1 directory.
  3. Ability to simply specify 'T as a type argument without the need to supply a deserialization routine (useful for CloudFiles that were stored using the default MBrace serializer), or a built-in deserializer for Json.NET (e.g. where each line in a text file is a JSON record).
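
To make the three points concrete, here is a rough, language-neutral sketch in Python of the behaviour being requested. It is not MBrace code and does not use any MBrace API: `expand_paths` and `read_records` are hypothetical names, the "ends in / or contains *" rule simply mirrors the suggestion in point 2, and the default per-line JSON deserializer stands in for the built-in deserializer suggested in point 3.

```python
import glob
import json
import os
from typing import Any, Callable, Iterable, Iterator, List


def expand_paths(paths: Iterable[str]) -> List[str]:
    """Hypothetical helper: flatten files, directories and wildcards into one file list."""
    files: List[str] = []
    for path in paths:
        if path.endswith(("/", os.sep)):
            # Trailing separator: treat as a directory and take every file directly inside it.
            pattern = os.path.join(path, "*")
            files.extend(sorted(p for p in glob.glob(pattern) if os.path.isfile(p)))
        elif "*" in path:
            # Wildcard: let glob resolve the pattern, keeping regular files only.
            files.extend(sorted(p for p in glob.glob(path) if os.path.isfile(p)))
        else:
            # Plain path: pass it through unchanged.
            files.append(path)
    return files


def read_records(
    paths: Iterable[str],
    deserialize: Callable[[str], Any] = json.loads,
) -> Iterator[Any]:
    """Hypothetical producer: one merged stream of records, one record per line.

    The deserializer defaults to JSON, so callers only pass one when they need
    something else. Blank lines are skipped (an assumption of this sketch).
    """
    for file_path in expand_paths(paths):
        with open(file_path, encoding="utf-8") as handle:
            for line in handle:
                line = line.rstrip("\n")
                if line:
                    yield deserialize(line)
```

A call such as `read_records(["data/first.jsonl", "data/logs/", "archive/*.jsonl"])` (illustrative paths) would then combine two kinds of input, specific files plus a directory or wildcard, in a single argument list, which is the use case described in point 2.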

I also think that this function (or variations of it) should be promoted as extension methods onto e.g. StoreClient on the runtime handle (or even higher), and possibly onto CloudFile as well, to aid discoverability. Currently people need to know about the module and its functions; putting them further up the IntelliSense stack will help in this regard.

eiriktsarpalis commented 9 years ago

Thanks for the feedback.

Some of these issues have been resolved in MBrace.Core 0.9.9. Today I pushed commits 27bde0df79599cc76732162d2f746d84fb75629d and 66d5e58b2e0dbd087e7133409ae3b5275ebd11c0 which address the remaining considerations.

isaacabraham commented 9 years ago

That was quick :-) Great stuff.