muhamadazmy commented 2 years ago

HTTP workers module: to initialize it, you pass the storage and the number of workers.

Spawn the HTTP worker routines.
Each worker pulls a job from the main worker routine.
Once a worker asks for a job, the main routine pulls one (process()) from storage.

Once a worker receives a job, it processes it according to the specs.

Based on the source queue, it calls either /rmb-remote or /rmb-reply.
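The pull model described here can be sketched roughly as follows. This is only an illustration under several assumptions: it uses std threads and channels instead of async tasks, `Storage` is a hypothetical in-memory stand-in, jobs are plain strings, and `run_pool` is a made-up name, not the function in the repository.

```rust
use std::collections::VecDeque;
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the real storage: a FIFO of jobs.
pub struct Storage {
    queue: VecDeque<String>,
}

impl Storage {
    pub fn new(jobs: Vec<String>) -> Self {
        Storage { queue: jobs.into() }
    }

    /// Stand-in for the `process()` pull; returns None when drained.
    pub fn process(&mut self) -> Option<String> {
        self.queue.pop_front()
    }
}

/// Spawns `workers` threads. Each worker asks the main routine for a job,
/// runs `work` on it, and asks again until storage has no more jobs.
/// Results are returned in completion order.
pub fn run_pool(mut storage: Storage, workers: usize, work: fn(String) -> String) -> Vec<String> {
    // A request is a reply channel on which the main routine hands over a job.
    let (ask_tx, ask_rx) = mpsc::channel::<mpsc::Sender<Option<String>>>();
    let (done_tx, done_rx) = mpsc::channel::<String>();

    let mut handles = Vec::new();
    for _ in 0..workers {
        let ask_tx = ask_tx.clone();
        let done_tx = done_tx.clone();
        handles.push(thread::spawn(move || loop {
            let (job_tx, job_rx) = mpsc::channel();
            if ask_tx.send(job_tx).is_err() {
                break; // main routine is gone
            }
            match job_rx.recv() {
                Ok(Some(job)) => {
                    let _ = done_tx.send(work(job));
                }
                _ => break, // no more jobs
            }
        }));
    }
    // Drop the originals so the loops below end once all workers exit.
    drop(ask_tx);
    drop(done_tx);

    // Main routine: only pull from storage when a worker asks for a job.
    for reply in ask_rx {
        let _ = reply.send(storage.process());
    }
    for handle in handles {
        handle.join().unwrap();
    }
    done_rx.iter().collect()
}
```

Because the main routine only touches storage when a worker asks, at most `workers` jobs are in flight at any time.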

muhamadazmy commented 2 years ago

Specifications

The idea behind the workers is that each Worker receives a specific task to execute. It executes the task sequentially in its own scope and is not aware of what its sibling workers are doing. In other words, the Worker executes its function without taking parallelism into account (well, except for using async/await operations where applicable). It's up to the work distributor to distribute tasks to the workers; that is how parallelism is achieved.

Generic simple worker pool

A worker pool is defined only by an Operation and an Input (in code these currently translate to Work and Job). A Work is the code that processes a Job (the input). A Job is just a piece of data that a Work can act on.

The Work (or the Operation) knows how to handle the Job, and can also hold any other objects it needs (for example an HTTP client or a reference to Storage).
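To make the Work/Job split concrete, here is a minimal synchronous sketch. The trait shape, the `Recorder` type, and the `drain` helper are all assumptions made for illustration; the code in the repository is async and may look different.

```rust
/// Assumed shape of the abstraction: `Work` is the operation, `Job` is its input.
pub trait Work {
    type Job;

    /// Process a single job. The Work may use any state it holds.
    fn run(&mut self, job: Self::Job);
}

/// Example Work that carries its own state. The `seen` log stands in for
/// things like an HTTP client or a reference to Storage.
pub struct Recorder {
    pub seen: Vec<String>,
}

impl Work for Recorder {
    type Job = String;

    fn run(&mut self, job: String) {
        self.seen.push(job);
    }
}

/// Distributor-side helper: feeds jobs to one Work, one after the other.
pub fn drain<W: Work>(work: &mut W, jobs: impl IntoIterator<Item = W::Job>) {
    for job in jobs {
        work.run(job);
    }
}
```

The Work itself contains no concurrency logic, which matches the point above that parallelism is the distributor's concern.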

Note: since this part is already done in code, I will not give more details or hints on the implementation of the worker pools.

Work done by HttpWorker

This part is still under development. I will try to clarify here the operation of sending a single message.

The worker's job is to send a Message to a remote RMB by posting it to either the /rmb-remote or the /rmb-reply endpoint, depending on the type of the message. A message can be either a request or a response. When forwarding a request to a remote RMB, /rmb-remote is called. On a reply, /rmb-reply is called.
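The endpoint choice is small enough to show directly. In this sketch the `Kind` enum and the `url` helper are hypothetical names, and the base URL in the usage note is a placeholder; only the two endpoint paths come from the description above.

```rust
/// Hypothetical message kind: a request being forwarded, or a reply to one.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Request,
    Reply,
}

/// Maps the message kind to the endpoint on the remote RMB.
pub fn endpoint(kind: Kind) -> &'static str {
    match kind {
        Kind::Request => "rmb-remote",
        Kind::Reply => "rmb-reply",
    }
}

/// Joins a base URL and the endpoint, tolerating a trailing slash on the base.
pub fn url(base: &str, kind: Kind) -> String {
    format!("{}/{}", base.trim_end_matches('/'), endpoint(kind))
}
```

For example, `url("http://example.invalid/", Kind::Reply)` yields `http://example.invalid/rmb-reply`.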

The worker starts by receiving a message. A single message can have multiple destinations, each of which is a twin id. The message also has a data part, which is a base64 string. As far as the worker is concerned, this is a plain-text message sent from the caller.

Steps

Get the twin object from the twin-db.
- Maybe it's better if the TwinDB actually differentiates between a Twin NotFound and an Error. In case of an Error we can retry getting the twin a few times, then fail if the error is not resolved (it may be just a temporary connection error). A NotFound twin is terminal, and no more retries are done for that twin id.
- The data on the message is encrypted with that twin's public key.
- The data is then set again as the base64 of the encrypted data. Unfortunately this now will be base64(encrypt(base64(data))), but that is needed to stay backward compatible with older clients that expect the data to be base64.
- The timestamp on the message is set to now.
- The message is signed. The signature must cover all static message fields (id, src, dst, cmd, ret, etc.) so that the receiving twin can verify that the message was not manipulated.
- The message is posted to the remote twin.
- If a valid HTTP response is returned (even if the response code is an error), the worker assumes the delivery succeeded and returns.
- If the delivery failed due to a connection error or a timeout, that is considered a failure to deliver, and the delivery is retried a maximum of msg.retry times.
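The last two steps (what counts as delivered, and when to retry) could look roughly like the sketch below. The transport is abstracted as a closure so the example stays self-contained. `TransportError`, `Delivery`, and `deliver` are hypothetical names, and treating msg.retry as the total number of attempts (with at least one attempt always made) is an assumption, since the steps do not say whether the first attempt counts.

```rust
/// Failures where no HTTP response was received at all.
#[derive(Debug, PartialEq)]
pub enum TransportError {
    Connection,
    Timeout,
}

#[derive(Debug, PartialEq)]
pub enum Delivery {
    /// Any HTTP response counts as delivered, even an error status.
    Delivered { status: u16, attempts: u32 },
    /// Every attempt ended in a connection error or a timeout.
    Failed { attempts: u32 },
}

/// Calls `post` until it yields an HTTP status or the attempt budget runs out.
/// `retry` plays the role of msg.retry (assumed here to be the total budget).
pub fn deliver<F>(retry: u32, mut post: F) -> Delivery
where
    F: FnMut() -> Result<u16, TransportError>,
{
    let max = retry.max(1); // always try at least once
    for attempt in 1..=max {
        match post() {
            Ok(status) => {
                return Delivery::Delivered {
                    status,
                    attempts: attempt,
                }
            }
            // Connection error or timeout: try again if budget remains.
            Err(_) => continue,
        }
    }
    Delivery::Failed { attempts: max }
}
```

Note that a 500 response ends the loop as a successful delivery; only the absence of a response triggers a retry.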

Handling of errors of delivery

If resolving a twin fails or the delivery fails, then message delivery (for that specific twin) has failed. There are two branches for handling this error.

If this is a Forward message, it means we are the sender of the request, so the process that initiated the call lives on the same local machine. We can therefore simply push a response back to the caller and tell it that we failed. This can be done by creating a response Message with all valid fields (especially the ret queue) and then calling the storage.reply() method. This pushes the response to the caller, which then knows about the error. The Message needs the src, dst, and err fields set, with empty data.

If this is a Reply message, nothing can be done and a warning is logged.
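The two branches can be sketched as a single function that either produces an error response to hand to storage.reply() or logs a warning. The `Message` fields shown are a reduced, assumed subset. Using the failed twin as `src` and the original sender as `dst` is also an assumption, since the text only says that these fields must be set.

```rust
/// Reduced, assumed subset of the message fields relevant to error replies.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub src: u32,
    pub dst: Vec<u32>,
    pub ret: String,
    pub data: String,
    pub err: Option<String>,
}

/// Which queue the failed message came from.
pub enum Queue {
    Forward,
    Reply,
}

/// Returns the error response to pass to storage.reply() for a Forward
/// message, or None (after logging a warning) for a Reply message.
pub fn on_delivery_failure(
    queue: &Queue,
    msg: &Message,
    failed_twin: u32,
    reason: &str,
) -> Option<Message> {
    match queue {
        Queue::Forward => Some(Message {
            src: failed_twin,        // assumption: the twin we could not reach
            dst: vec![msg.src],      // assumption: back to the local caller
            ret: msg.ret.clone(),    // the caller listens on this queue
            data: String::new(),     // empty data, as described above
            err: Some(reason.to_string()),
        }),
        Queue::Reply => {
            eprintln!("warning: dropping reply to twin {}: {}", failed_twin, reason);
            None
        }
    }
}
```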

AhmedYasen commented 2 years ago

Maybe it's better if the TwinDB actually differentiates between a Twin NotFound and an Error. In case of an Error we can retry getting the twin a few times, then fail if the error is not resolved (it may be just a temporary connection error). A NotFound twin is terminal, and no more retries are done for that twin id.

We said before that there will always be a twin, so the NotFound case will not happen. That is why the get function returns Result<Twin>, not Result<Option<Twin>>.

muhamadazmy commented 2 years ago

Yes, I remember. We decided to return NotFound as an error. But to make your life easier, maybe we should still return Option so you can easily tell whether you need to try again or not.

Hence maybe it's a good idea to change TwinDB.get to return Result<Option<Twin>>. This way, if it fails (error) you know you can try again, but if it returns None it means: okay, I am not trying again.
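A rough sketch of how a caller could use that return type follows. The lookup is passed in as a closure so the example is self-contained. `Twin`, `Lookup`, `get_twin`, and the use of `String` as the error type are all assumptions for illustration, not the actual TwinDB API.

```rust
/// Minimal stand-in for the twin object.
#[derive(Debug, Clone, PartialEq)]
pub struct Twin {
    pub id: u32,
}

#[derive(Debug, PartialEq)]
pub enum Lookup {
    Found(Twin),
    /// Terminal: the twin does not exist, so retrying is pointless.
    NotFound,
    /// All attempts ended in an error; carries the last error text.
    GaveUp(String),
}

/// Retries only on Err. Ok(None) ends the loop immediately.
pub fn get_twin<F>(id: u32, attempts: u32, mut get: F) -> Lookup
where
    F: FnMut(u32) -> Result<Option<Twin>, String>,
{
    let mut last = String::from("no attempts made");
    for _ in 0..attempts {
        match get(id) {
            Ok(Some(twin)) => return Lookup::Found(twin),
            Ok(None) => return Lookup::NotFound,
            Err(e) => last = e, // possibly transient, try again
        }
    }
    Lookup::GaveUp(last)
}
```

With `Result<Twin>` alone, the caller would have to inspect the error to tell a missing twin from a connection problem; the `Option` makes that distinction part of the type.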

threefoldtech / rmb-rs

http workers #15
