Open SiavashBabaei opened 6 years ago
The issue seems to be with ParStream and tuples (the fst and snd functions). If I change the code to:
    let newCentroids =
        clusterParts
        |> Array.concat
        |> ParStream.ofArray
        |> ParStream.groupBy fst
        |> ParStream.toArray
        |> Array.sortBy fst
        |> ParStream.ofArray
        |> ParStream.map snd
        |> ParStream.map (fun clp -> clp |> Seq.map snd |> Seq.toArray |> Array.unzip)
        |> ParStream.map (fun (ns, points) -> Array.sum ns, sumPoints points dim)
        |> ParStream.map (fun (n, sum) -> divPoint sum (float n))
        |> ParStream.toArray
Then it works OK!
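For anyone who wants to check what this pipeline is meant to compute without a cluster, here is a purely local sketch that swaps every ParStream stage for its Array counterpart. The Point alias, the bodies of sumPoints and divPoint, and the sample clusterParts data are assumptions made up for illustration (the real helpers live in the k-means tutorial and may differ). The shape of clusterParts, with (centroidId, (count, partialSum)) entries per partition, is inferred from the lambdas above.

```fsharp
// Local, in-memory sketch: no MBrace involved, Array.* stands in for ParStream.*.
type Point = float []

// Hypothetical helper: component-wise sum of points of dimension dim.
let sumPoints (points : Point []) (dim : int) : Point =
    Array.init dim (fun i -> points |> Array.sumBy (fun p -> p.[i]))

// Hypothetical helper: divide every coordinate of a point by x.
let divPoint (point : Point) (x : float) : Point =
    Array.map (fun c -> c / x) point

let dim = 2

// Assumed shape: each partition yields (centroidId, (count, partialSum)).
let clusterParts : (int * (int * Point)) [] [] =
    [| [| (1, (2, [| 2.0; 4.0 |])); (0, (1, [| 1.0; 1.0 |])) |]
       [| (0, (3, [| 3.0; 9.0 |])); (1, (2, [| 6.0; 0.0 |])) |] |]

let newCentroids =
    clusterParts
    |> Array.concat
    |> Array.groupBy fst          // group partial results by centroid id
    |> Array.sortBy fst           // order groups by centroid id
    |> Array.map snd
    |> Array.map (fun clp -> clp |> Array.map snd |> Array.unzip)
    |> Array.map (fun (ns, points) -> Array.sum ns, sumPoints points dim)
    |> Array.map (fun (n, sum) -> divPoint sum (float n))

printfn "%A" newCentroids
```

With the sample data above, centroid 0 comes out as [|1.0; 2.5|] and centroid 1 as [|2.0; 1.0|], so a cluster run on the same input can be compared against these values.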
Rechecking on much simpler code:
    let temp2 =
        cloud {
            let res =
                [| (1, 3); (2, 2); (3, 1) |]
                |> ParStream.ofArray
                |> ParStream.sortBy fst
                |> ParStream.toArray
            return res
        } |> cluster.Run
gives val it : (int * seq<int * int>) [] = [|null; null; null|]
rather than val it : (int * int) [] = [|(1, 3); (2, 2); (3, 1)|]
So it seems it cannot handle tuples at all. Moreover, the following snippet:
    cloud {
        let res =
            [| 1; 2; 3 |]
            |> ParStream.ofArray
            |> ParStream.sortBy (fun it -> it % 3)
            |> ParStream.toArray
        return res
    } |> cluster.Run
gives val it : int [] = [|0; 0; 0|]
rather than the expected val it : int [] = [|3; 1; 2|]!
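As a sanity check on the expected values, the same two sorts can be run with the in-memory Array.sortBy from FSharp.Core, with no cluster and no ParStream involved. This is only a reference for what a correct sortBy should return on these inputs; it says nothing about where the ParStream version goes wrong. The names tuples and ints are just illustrative.

```fsharp
// Plain in-memory sorts, for comparison with the ParStream results above.
let tuples = [| (1, 3); (2, 2); (3, 1) |] |> Array.sortBy fst
let ints = [| 1; 2; 3 |] |> Array.sortBy (fun it -> it % 3)

printfn "%A" tuples   // [|(1, 3); (2, 2); (3, 1)|]
printfn "%A" ints     // [|3; 1; 2|]
```

The sort keys are distinct in both cases, so the result does not depend on whether the sort is stable.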
Executing the "Example: Running an iterative algorithm at scale with incremental notifications" at http://mbrace.io/starterkit/HandsOnTutorial.FSharp/examples/200-kmeans-clustering-example.html yields: