mbraceproject / MBrace.Core

MBrace Core Libraries & Runtime Foundations
http://mbrace.io/
Apache License 2.0

MBrace Thespian problem #170

Closed dsyme closed 8 years ago

dsyme commented 8 years ago

Mathias Brandewinder was giving a demo at .NET Fringe yesterday, and #load "LocalWorker.fsx" failed to start the MBrace Thespian local worker cluster. I'm not sure where the problem is; the stack trace is below. The code used is available at https://github.com/evelinag/dotnetfringe2016, and the problem was with "Part4.fsx".

#load "LocalCluster.fsx"

open MBrace
open MBrace.Core
open MBrace.Flow
open Config

let cluster = Config.GetCluster()

[image: screenshot of the stack trace]

The versions used are listed in the paket.lock below:

FRAMEWORK: NET45
NUGET
  remote: https://www.nuget.org/api/v2
    Argu (2.1)
    DiffSharp (0.6.3)
      FsAlg (>= 0.5.8)
      FSharp.Quotations.Evaluator (>= 1.0.6)
    FsAlg (0.5.13)
    FSharp.Core (4.0.0.1)
    FSharp.Data (2.3.1)
    FSharp.Quotations.Evaluator (1.0.7)
    FsPickler (2.1)
    FsPickler.Json (2.1)
      FsPickler (2.1)
      Newtonsoft.Json (>= 6.0.5)
    Google.DataTable.Net.Wrapper (3.1.2)
    MathNet.Numerics (3.12)
    MathNet.Numerics.FSharp (3.12)
      FSharp.Core (>= 3.1.2.5)
      MathNet.Numerics (3.12)
    MBrace.Core (1.2.6)
    MBrace.Flow (1.2.6)
      FSharp.Core (>= 3.0)
      MBrace.Core (1.2.6)
      Streams (>= 0.4 < 0.5)
    MBrace.Runtime (1.2.6)
      FsPickler (>= 2.1 < 2.2)
      FsPickler.Json (>= 2.1 < 2.2)
      MBrace.Core (1.2.6)
      Vagabond (>= 0.13 < 0.14)
    MBrace.Thespian (1.2.6)
      Argu (>= 2.0 < 3.0)
      FsPickler (>= 2.1 < 2.2)
      MBrace.Core (1.2.6)
      MBrace.Runtime (1.2.6)
      Thespian (>= 0.1.11-alpha < 0.2)
      Vagabond (>= 0.13 < 0.14)
    Mono.Cecil (0.9.6.1)
    Newtonsoft.Json (9.0.1)
    Streams (0.4.1)
    Suave (1.1.3)
      FSharp.Core (>= 3.1.2.5)
    Thespian (0.1.11-alpha)
      FsPickler (>= 2.1 < 3.0)
    Vagabond (0.13)
      FsPickler (>= 2.1 < 3.0)
      Mono.Cecil (>= 0.9.6.1 < 0.9.7)
    XPlot.GoogleCharts (1.3.1)
      Google.DataTable.Net.Wrapper
      Newtonsoft.Json

smoothdeveloper commented 8 years ago

I think accessing the performance counters might require elevated permissions. Could you try running the script (or VS itself) as administrator?

eiriktsarpalis commented 8 years ago

Running the code found in Day1.sln, I couldn't reproduce the issue. What version of Windows are you using? Also, does the same issue appear when calling PerformanceCounterCategory.Exists from inside fsi?

dsyme commented 8 years ago

Windows 8.1. Calling System.Diagnostics.PerformanceCounterCategory.Exists("a") fails when run from an admin fsi.exe on my laptop.

C:\GitHub\dsyme\dotnetfringe2016\day-1\day-1>fsi

Microsoft (R) F# Interactive version 14.0.23413.0
Copyright (c) Microsoft Corporation. All Rights Reserved.

For help type #help;;

> System.Diagnostics.PerformanceCounterCategory.Exists("a");;
System.InvalidOperationException: Cannot load Counter Name data because an invalid index '' was read from the registry.
   at System.Diagnostics.PerformanceCounterLib.GetStringTable(Boolean isHelp)
   at System.Diagnostics.PerformanceCounterLib.get_NameTable()
   at System.Diagnostics.PerformanceCounterLib.get_CategoryTable()
   at System.Diagnostics.PerformanceCounterLib.CategoryExists(String machine, String category)
   at System.Diagnostics.PerformanceCounterCategory.Exists(String categoryName, String machineName)
   at System.Diagnostics.PerformanceCounterCategory.Exists(String categoryName)
   at <StartupCode$FSI_0002>.$FSI_0002.main@()
Stopped due to error

eiriktsarpalis commented 8 years ago

Does this suggestion fix the problem?

palladin commented 8 years ago

@dsyme @eiriktsarpalis I can't reproduce the issue either... My favorite solution for this kind of problem: try exists key with _ -> false :)
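
A minimal sketch of the `try ... with _ -> false` idea, not the code that ended up in MBrace. The helper name `categoryExistsOrFalse` and the injected `probe` parameter are assumptions made for illustration; `probe` stands in for `System.Diagnostics.PerformanceCounterCategory.Exists`, so the sketch also runs on machines without Windows performance counters.

```fsharp
open System

// Hypothetical helper: run a category probe and treat any failure as "not present".
// `probe` stands in for System.Diagnostics.PerformanceCounterCategory.Exists.
let categoryExistsOrFalse (probe: string -> bool) (category: string) : bool =
    try probe category
    with _ -> false

// Stand-in probe that fails like the registry error reported in this thread.
let brokenProbe (_: string) : bool =
    invalidOp "Cannot load Counter Name data because an invalid index '' was read from the registry."

// The failure is swallowed and reported as "category does not exist".
printfn "%b" (categoryExistsOrFalse brokenProbe "a")
```

The trade-off is that the caller can no longer tell "category missing" apart from "performance counters broken".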

dsyme commented 8 years ago

>lodctr /r

Error: Unable to rebuild performance counter setting from system backup store, error code is 2

dsyme commented 8 years ago

But yes, it fixes the problem

> System.Diagnostics.PerformanceCounterCategory.Exists("a");;
val it : bool = false
> #q;;

When you get a chance could you push a release with a try-catch around that call please? Thanks!

eiriktsarpalis commented 8 years ago

I'm not sure whether it's a good idea to swallow this error; it would lead to nodes inexplicably not reporting any diagnostic data, which might be much more undesirable, particularly since we have a good workaround available.

The issue is not specific to Thespian; it affects all MBrace clusters built on top of MBrace.Runtime, including Azure and AWS.

dsyme commented 8 years ago

Oh. Could we somehow optimistically assume a "false" answer from "Exists", with a warning message of some kind, or is all performance counter stuff busted?

smoothdeveloper commented 8 years ago

@eiriktsarpalis / @dsyme I didn't take it to mean swallowing the error; maybe throw another exception saying:

try running 'lodctr /r', this exception occurred: inner exception message
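
One way this rethrow-with-a-hint proposal could look, as a sketch only. The name `categoryExistsWithHint`, the exact message wording, and the injected `probe` (standing in for `PerformanceCounterCategory.Exists`) are all assumptions and not part of MBrace.

```fsharp
open System

// Hypothetical wrapper: rethrow a probe failure with a remediation hint,
// keeping the original exception attached as InnerException.
let categoryExistsWithHint (probe: string -> bool) (category: string) : bool =
    try probe category
    with e ->
        let msg =
            sprintf "Performance counter lookup failed; try running 'lodctr /r'. This exception occurred: %s" e.Message
        raise (InvalidOperationException(msg, e))

// Stand-in probe that fails like the registry error reported in this thread.
let failingProbe (_: string) : bool =
    invalidOp "Cannot load Counter Name data because an invalid index '' was read from the registry."

// Demonstration: the caller sees the hint, and the original error is still available.
try
    categoryExistsWithHint failingProbe "a" |> ignore
with e ->
    printfn "%s" e.Message
    printfn "Inner: %s" e.InnerException.Message
```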

dsyme commented 8 years ago

The problem is that the error occurs in the console output of the startup path of the workers, which makes it very hard to debug.

If there's a path to maintain functionality when this problem happens I'd say we should use it, printing information to the log but continuing to process requests.
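
A rough sketch of the "log and keep going" approach, under stated assumptions: the `CounterProbe` type, the `probeCategory` helper, and the injected `log` and `probe` functions are hypothetical and do not reflect how MBrace.Core 1.3 implemented the fix.

```fsharp
open System

// Hypothetical result type: the probe's answer, or the reason it could not be obtained.
type CounterProbe =
    | Answer of bool
    | Unavailable of string

// Hypothetical helper: `log` and `probe` are injected so the sketch does not
// depend on MBrace's logger or on Windows performance counters.
let probeCategory (log: string -> unit) (probe: string -> bool) (category: string) : CounterProbe =
    try Answer (probe category)
    with e ->
        log (sprintf "WARNING: performance counters unavailable, continuing without diagnostics: %s" e.Message)
        Unavailable e.Message

// Stand-in probe that fails like the registry error reported in this thread.
let failingProbe (_: string) : bool =
    invalidOp "Cannot load Counter Name data because an invalid index '' was read from the registry."

// Demonstration: the worker would keep running, and the warning ends up in the log.
let messages = ResizeArray<string>()
let result = probeCategory messages.Add failingProbe "a"
printfn "%A, %d warning(s) logged" result messages.Count
```

Returning `Unavailable` instead of a plain `false` lets callers distinguish a missing category from broken counters while still starting up.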

eiriktsarpalis commented 8 years ago

Fixed in MBrace.Core 1.3

dsyme commented 8 years ago

@eiriktsarpalis Thanks!