Open olitomlinson opened 5 years ago
@olitomlinson Have you considered https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-entities#accessing-entities ? Durable entities (Azure Durable Functions v2.0) support signaling / one-way calls.
Also, this guy did some interesting stuff with replacing the SF communication libraries with the gRPC stack. I wonder if that could give you more horsepower? http/2 + gRPC likely scales better than whatever the SF guys are using.
UPDATE: sorry, I just re-read your "what have you tried" section and saw you looked at durable entities.
@loekd Thanks :)
Yes I have considered Akka, but previous experiences of Akka.NET have not been operationally pleasant so I can't go that route.
I don't think Orleans supports signalling/telling?
@oising Do you have a link to the work done using gRPC communication in SF? This would certainly be something I can experiment with. Thanks!
There is this - no clue how it performs though. https://github.com/mholo65/ServiceFabric.Grpc
Hm, on a single node I did manage to get a throughput of 1000+ msg/s using RabbitMQ as the message bus, combining a stateless service as the consumer with actors for the processing logic. On a cluster you will get competing consumers, and with good actor partitioning the throughput is much bigger. But I think that design and usage is not what you require. Can you share your test (create an example) of that design?
Is your feature request related to a problem? Please describe.
In my use-case, I have a scenario where approximately 3 Actors (let's call them Actors A, B, C) all need to pass a message to Actor D. Once Actor D has received and processed all messages, its work is complete.
I'm trying to design for a typical throughput of around 300 concurrent Actor D instances completing per second, with occasional bursts of up to 1000 Actor D instances completing per second.
From my very high-level testing, the desired throughput has been impossible to achieve; I can't get anywhere near those figures. It starts lagging when my throughput goes beyond 30 per second. Moving to volatile actors doesn't really have an effect on throughput, which implies that it's not replication that is limiting throughput.
I know there could be many dials to tweak to improve this, such as scaling out partitions across more nodes in the cluster, and even scaling up to use bigger nodes. But this is where the cluster cost would be unjustifiably expensive.
In order to start squeezing more performance out of the cluster, it would be advantageous to remove all the RPC calls between the actors in favour of traditional one-way message passing. I'm sure this is not the only solution to the problem, but it would certainly remove latency to aid throughput. The actors are pretty lightweight; they don't actually do much, just publish some events to EventGrid, alter a bit of actor state, etc.
RPC and synchronous communication are well known to present scaling issues in microservice architectures, and I'm aware that Actor-to-Actor communication is considered an anti-pattern in SF and that the guidance is not to design your system this way. But the domain I work in strongly lends itself to the Actor pattern, so much so that we are looking to rewrite it entirely using Actors.
In my use-case the publishing actors (Actors A, B and C) do not require a response from the receiving actor, Actor D. A one-way, at-least-once message-passing guarantee would be entirely sufficient and would bring me closer to the levels of scale I need.
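To make the at-least-once requirement concrete, here is a minimal Python sketch of how a receiver like Actor D could tolerate duplicate deliveries. This is not Service Fabric code: the `Message` and `ActorD` names, the sender ids and the fixed set of expected senders are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    sender: str   # hypothetical sender id, e.g. "A"
    payload: str


@dataclass
class ActorD:
    # Assumption: the set of senders D waits for is known up front.
    expected: frozenset = frozenset({"A", "B", "C"})
    received: dict = field(default_factory=dict)

    def on_message(self, msg: Message) -> bool:
        """Handle one delivery; return True once every expected sender is seen."""
        # Duplicate deliveries (possible under at-least-once) are ignored,
        # which makes the handler idempotent per sender.
        if msg.sender in self.expected and msg.sender not in self.received:
            self.received[msg.sender] = msg.payload
        return self.is_complete()

    def is_complete(self) -> bool:
        return set(self.received) == set(self.expected)


d = ActorD()
# "A" is delivered twice to simulate a redelivery.
deliveries = [
    Message("A", "a1"),
    Message("A", "a1"),
    Message("B", "b1"),
    Message("C", "c1"),
]
results = [d.on_message(m) for m in deliveries]
```

Because the handler is idempotent, the senders never need a reply to know whether a retry is safe; they can simply resend.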
Describe the solution you'd like
I would like the ability for a Stateful Service or Actor to signal an Actor in the traditional sense of actor signaling, with the hope that a lack of RPC would yield a greater throughput (due to lack of distributed transactions and network etc.)
Describe alternatives you've considered
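As a rough illustration of what "signal" means here, the following Python sketch models a one-way `tell` using an in-memory mailbox. It is only a conceptual sketch: `MailboxActor`, `tell` and `run` are hypothetical names, not Service Fabric APIs, and an in-memory queue gives no durability, so a real at-least-once guarantee would need persisted or replicated messages.

```python
import asyncio


class MailboxActor:
    """Hypothetical actor that processes messages one at a time from a queue."""

    def __init__(self) -> None:
        self.mailbox: asyncio.Queue = asyncio.Queue()
        self.processed: list = []

    def tell(self, message: str) -> None:
        # One-way: enqueue and return immediately; no reply is awaited.
        self.mailbox.put_nowait(message)

    async def run(self, expected: int) -> None:
        # Drain the mailbox sequentially, keeping turn-based actor semantics.
        while len(self.processed) < expected:
            message = await self.mailbox.get()
            self.processed.append(message)


async def main() -> list:
    actor_d = MailboxActor()
    worker = asyncio.create_task(actor_d.run(expected=3))
    for sender in ("A", "B", "C"):
        actor_d.tell(f"done:{sender}")  # senders do not block on Actor D
    await worker
    return actor_d.processed


processed = asyncio.run(main())
```

The point of the design is that the sender's cost is a single enqueue, while the receiver works through its mailbox at its own pace.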
I've considered facilitating communication between the Actors by using an external intermediary such as EventGrid/ServiceBus, but I'm not convinced this would yield greater throughput, and I'm not convinced this is the best design when looking at the domain as a whole.
I've also considered completely migrating this part of the domain to Azure PaaS, provided by Durable Entities (which supports traditional Actor signaling). But I'm not convinced that running part of the domain in on-prem SF and part of the domain in Azure is the best solution. I prefer to keep all of the domain together in one deployable and maintainable unit to ease the cognitive load (the unit would be a Service Fabric Application).
Additional context
Would it make sense to use Service Fabric's built-in data platform to facilitate the movement of messages, thereby reducing the costly network activity and locking between Actors?
Could the SoCreate PubSub tooling be used to underpin a 1st class developer experience, rather than using the data platform directly? https://service-fabric-pub-sub.socreate.it
Do you have any thoughts on this, @LoekD?