mbraceproject / MBrace.StarterKit

A collection of demos and tutorials for MBrace
http://mbrace.io

Thoughts on the iterative development process for MBrace #95

Open dsyme opened 7 years ago

dsyme commented 7 years ago

The process of making small incremental fixes to the MBrace components is laborious and requires very deep knowledge.

At the logical level, the architecture of the MBrace components is fantastic - MBrace.Core is clean, Vagabond and FsPickler are general, reusable components etc. However trialling a fix in a base component "for real" with an actual cluster is a right PITA.

This is a particular problem for MBrace because:

  1. user-level problems are often related to code or data serialization and often involve fixes to Vagabond or FsPickler or Mono.Cecil.
  2. users are not normally going to be professional software developers who love packages and components etc. - they are going to be people doing domain-specific data scripting, machine learning, compute etc. They may make some fixes, and develop new features, but they are not pro devs.

The rest of this is just me thinking out loud about this issue. TBH it's a more general issue for any collection of related projects, and it affects any component development scenario where components become NuGet packages.

The relevant stack of components is acquired like this (with additional dependencies on .NET, Mono.Cecil and a few others):

    git clone https://github.com/nessos/Streams
    git clone https://github.com/nessos/Thespian
    git clone https://github.com/mbraceproject/FsPickler
    git clone https://github.com/mbraceproject/Vagabond
    git clone https://github.com/mbraceproject/MBrace.Core
    git clone https://github.com/mbraceproject/MBrace.Azure
    git clone https://github.com/mbraceproject/MBrace.AWS
    git clone https://github.com/mbraceproject/MBrace.StarterKit

The end packages "consumed and deployed" by MBrace.StarterKit are, roughly speaking, the contents of packages\MBrace.Azure\tools, e.g.:

    #r "tools/Newtonsoft.Json.dll"
    ...
    #r "tools/MBrace.Azure.dll"
    open System.IO   // for Path.Combine
    AzureWorker.LocalExecutable <- Path.Combine(__SOURCE_DIRECTORY__, "tools/mbrace.azureworker.exe")

What I'm trying to understand is what a good "inner development loop" would look like, where users can make changes to any/all of the components above, and be trialling the updated versions in their cluster data scripting ASAP.

Here are some thoughts:

  1. Paket has a paket.local file which can be useful. However, using it effectively still feels very painful; the development loop would be something like this:
    cd Vagabond; build
    cd MBrace.Core
       .paket\paket.exe update
       fiddle with paket.dependencies until the new local Vagabond package is picked up
       build

    cd MBrace.Azure; 
       .paket\paket.exe update
       fiddle with paket.dependencies until the new local MBrace.Core package is picked up
       build

    cd MBrace.AWS
       .paket\paket.exe update
       fiddle with paket.dependencies until the new local MBrace.Core package is picked up
       build

    cd MBrace.StarterKit
       .paket\paket.exe update
       fiddle with paket.dependencies until the new local MBrace.Azure and MBrace.AWS packages are picked up

    recreate cluster using the new packages
  2. We could hack a new feature into Paket so that you could optionally pick up DLLs from projects rather than packages. Then we could create a big "MBrace.sln" (or MBrace.StarterKit.dv.sln) that has all the projects.

  3. We could use git submodules, but I don't want to go there.

  4. We could use homebrew copy-the-DLL-into-the-tools-directory-and-adjust-the-binding-redirects scripts.

  5. We could collapse everything into one big project, but that would surely be wrong.

  6. We could take the FAKE and Paket approach, which pushes very many alpha packages to nuget.org on every fix. However, this only works for those who have permission to publish, and it still requires a lot of fiddling with paket.dependencies to pick up new versions.
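The homebrew copy-the-DLL option could look roughly like the POSIX shell function below. It is only a sketch under assumptions the thread does not state: the repos are cloned side by side under one directory, each build leaves an assembly named after its repo at `<repo>/bin/<repo>.dll`, and the StarterKit scripts load from `packages/MBrace.Azure/tools`. The function name `copy_local_builds` is hypothetical, and the binding-redirect half of the option is deliberately left out.

```shell
#!/bin/sh
# Hypothetical helper: copy freshly built assemblies from local checkouts
# into the tools directory that the StarterKit scripts load from.
# Assumption: <src_root>/<Name>/bin/<Name>.dll is where each build puts its output.
# Binding redirects (e.g. in the worker's .exe.config) are NOT adjusted here.
copy_local_builds() {
  src_root=$1    # directory holding the side-by-side clones
  tools_dir=$2   # e.g. MBrace.StarterKit/packages/MBrace.Azure/tools
  shift 2
  for name in "$@"; do
    dll="$src_root/$name/bin/$name.dll"
    if [ ! -f "$dll" ]; then
      # Fail loudly rather than silently keeping the packaged DLL.
      echo "missing $dll (build $name first)" >&2
      return 1
    fi
    cp "$dll" "$tools_dir/" || return 1
    echo "copied $name.dll -> $tools_dir"
  done
}
```

For example, running `copy_local_builds .. packages/MBrace.Azure/tools MBrace.Core MBrace.Azure` from the StarterKit checkout would overwrite the packaged MBrace.Core.dll and MBrace.Azure.dll with local builds, again assuming those output paths. Recreating the cluster so the workers pick up the new DLLs remains a separate step.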

dsyme commented 7 years ago

Basically, to summarise: I don't know how to do rapid iterative development for a collection of Paketized projects publishing NuGet components.

forki commented 7 years ago
Use git dependencies in Paket. @Krzysztof-Cieslak uses them to maintain Ionide and its deps.

isaacabraham commented 7 years ago

What @forki said was going to be my idea. It does need a bit of playing around with dependencies and the builds will take a while, but you can then create local nuget packages of downstream projects and have them naturally build and flow upwards to higher-level packages.

For the work I've been doing recently on MBrace.Azure, I've kept things really, really low-tech. I simply downloaded the latest MBrace.Azure and MBrace.Core. I run MBrace.Azure workers locally (with pre-configured storage / service bus) and, in a script, copy the mbrace.azure.dll output across to where the script is working. This works fine for debugging MBrace.Azure code, but you can't step into MBrace.Core code.

Setting up some paket git dependencies might be a much better way to go :-)

dsyme commented 7 years ago

@forki @isaacabraham I took a look at what this would mean in practice for these projects. It works OK when you have "Project2 --> Project1" dependencies. But it really doesn't work if there are diamond (DAG) dependencies involved.

Below is roughly what the paket.dependencies for MBrace.StarterKit would have to look like for the Vagabond --> MBrace.Core --> MBrace.Azure/MBrace.AWS --> MBrace.StarterKit dependency chain. The problem is that the from-the-repo versions of MBrace.Azure/AWS don't pick up the from-the-repo versions of Vagabond and MBrace.Core. This is sort of a mess: unless all projects are locked to the same Vagabond commit ID, there will be inconsistencies.

Also, if the technique is enabled "by default" for some repos (which is what Ionide seems to do), then you get a nesting effect where even Proj3 --> Proj2 --> Proj1 dependencies mean that acquiring Proj3 builds Proj2, which in turn builds Proj1 (in a nested paket-files directory). This loses all sharing and risks creating inconsistent versions of packages across the DAG.

paket.local may still allow a solution to this, by pointing at user-checked-out repos and building them so that the graph of dependencies is kept. I need to consider it.

    git https://github.com/mbraceproject/Vagabond.git master build: "build.cmd NuGet", Packages: /bin/, OS: windows
    git https://github.com/mbraceproject/Vagabond.git master build: "build.sh NuGet", Packages: /bin/, OS: mono

    git https://github.com/mbraceproject/MBrace.Core.git master build: "build.cmd NuGet", Packages: /bin/, OS: windows
    git https://github.com/mbraceproject/MBrace.Core.git master build: "build.sh NuGet", Packages: /bin/, OS: mono

    git https://github.com/mbraceproject/MBrace.Azure.git master build: "build.cmd NuGet", Packages: /bin/, OS: windows
    git https://github.com/mbraceproject/MBrace.Azure.git master build: "build.sh NuGet", Packages: /bin/, OS: mono

    git https://github.com/mbraceproject/MBrace.AWS.git master build: "build.cmd NuGet", Packages: /bin/, OS: windows
    git https://github.com/mbraceproject/MBrace.AWS.git master build: "build.sh NuGet", Packages: /bin/, OS: mono

    nuget Vagabond prerelease
    nuget MBrace.Core prerelease
    nuget MBrace.Flow prerelease
    nuget MBrace.Thespian prerelease
    nuget MBrace.Runtime prerelease
    nuget MBrace.Azure prerelease
    nuget MBrace.Azure.Management prerelease
    nuget MBrace.AWS prerelease
    nuget MBrace.CSharp prerelease

forki commented 7 years ago

Diamond? Ouch ;-)

(In reply to Don Syme's comment of 09.06.2017, 20:54.)

forki commented 7 years ago

Probably worse since you can build cycles...

(In reply to his own comment of 09.06.2017, 20:55.)

Krzysztof-Cieslak commented 7 years ago

Please also remember that Windows has a path length limitation, and if you do ProjectA --> ProjectB --> ProjectC all using git deps you may hit this problem... speaking from experience ;)

dsyme commented 7 years ago

A very simple diamond, like this:

    StarterKit
        --> MBrace.Azure
            --> MBrace.Core

    StarterKit
        --> MBrace.AWS
            --> MBrace.Core
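One way to see why the diamond matters: if the whole stack is treated as a single graph, a topological sort gives one build order in which MBrace.Core is built once and shared by both MBrace.Azure and MBrace.AWS, instead of once per consumer as with nested git dependencies. The sketch below uses the POSIX `tsort` utility. The edge list is a simplified assumption based on the chain named earlier in this thread (Vagabond --> MBrace.Core --> MBrace.Azure/MBrace.AWS --> StarterKit), the `build_order` name is hypothetical, and the loop only prints where a real script would build each checkout.

```shell
#!/bin/sh
# Each input line is "dependency consumer". tsort prints every node exactly
# once, with each dependency listed before everything that consumes it.
build_order() {
  tsort <<'EOF'
Vagabond MBrace.Core
MBrace.Core MBrace.Azure
MBrace.Core MBrace.AWS
MBrace.Azure MBrace.StarterKit
MBrace.AWS MBrace.StarterKit
EOF
}

for repo in $(build_order); do
  # A real loop would build $repo here and point its consumers at that one output.
  echo "build $repo"
done
```

Because MBrace.Core appears exactly once in the order, both cloud bindings would consume the same local build, which is the sharing that nested git dependencies lose.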

dsyme commented 7 years ago

@Krzysztof-Cieslak Yes, I figured it would hit that.

dsyme commented 7 years ago

@isaacabraham

I am wondering if it is a mistake to have Thespian, Vagabond, MBrace.Core, MBrace.Azure, MBrace.AWS and MBrace.StarterKit all in separate projects, and all available as separate public nuget packages.

It feels like doing engineering work on MBrace for a core set of contributors might be 100x more efficient if these were all in one solution and some of these intermediate packages internalized as mere DLLs.

Logically speaking the component split is good, it's just that the engineering is inefficient.

Vagabond is certainly useful as a separate component. But for Thespian and MBrace.Core I am sceptical - those may as well just be in-solution DLLs that are ultimately part of the three end packages: MBrace.Azure, MBrace.AWS and MBrace.Thespian.

It just feels like any simple change to any core component (such as updating a package used) is really painful to propagate. I've been trying to move the stack to FsPickler 3.1.0 and it is painful and fiddly (as well as hitting an actual bug while doing this, which I still haven't resolved).

(I hadn't quite realised what a huge tax it still places on engineering of a contained set of components if some of those components are made into first-class packages. Paket is excellent at tracking the dependencies - and it's hard to make actual mistakes as such, but it's just still relatively expensive to propagate a change through a stack of components)

isaacabraham commented 7 years ago

@dsyme yeah, it is a pain. For "slowly changing" packages it might not be as much of an issue? I'm also wary of the fact that we essentially bundle client and server as single packages - if we were to push all those five or six packages into one, wouldn't that greatly increase the size of the client download (as well as make updates much more regular)?

Just thinking out loud here, but I definitely agree that pulling Core and e.g. Thespian / Azure / AWS together has several clear benefits - what are the costs of this, though?

dsyme commented 7 years ago

if we were to push all those five or six packages into one, wouldn't that greatly increase the size of the client download (as well as make updates much more regular)?

We would still have multiple NuGet packages, just one solution (and build.fsx) that builds and tests them all and pushes them all to NuGet.

There are downsides to this too, e.g. if we ever have more cloud fabric bindings.