xmonader opened 2 months ago
Also, the current monitor services log the aliveness of the services but do not return any values. I think they need some refactoring to add new methods.
WIP: creating the logic to pick the first available stack per service. Add this initial class to the alivenessCheck in monitoring:
export class serviceStackPicker {
  private result: ServicesUrls = {};

  constructor(public options: StackPickerOptions) {}

  // Ping the service once, then close its connection if it keeps one open.
  private async pingService(service: ILivenessChecker) {
    const status = await service.isAlive();
    if ("disconnect" in service) {
      await (service as IDisconnectHandler).disconnect();
    }
    return status;
  }

  async GetAvailableServices(): Promise<ServicesUrls> {
    if ("tfChain" in this.options)
      this.result.tfChain = await this.getAvailableServiceStack(
        this.options.tfChain.slice(1),
        new TFChainMonitor(this.options.tfChain[0]),
      );
    if ("gridProxy" in this.options)
      this.result.gridProxy = await this.getAvailableServiceStack(
        this.options.gridProxy.slice(1),
        new GridProxyMonitor(this.options.gridProxy[0]),
      );
    return this.result;
  }

  // Ping the service's current URL, then each fallback URL in order (including the last one),
  // and return the first URL that is alive, or undefined if none of them is.
  async getAvailableServiceStack(urls: string[], service: ILivenessChecker): Promise<string | undefined> {
    let index = 0;
    while (true) {
      const status = await this.pingService(service);
      if (status.alive) return service.serviceUrl();
      console.log(`${service.serviceName()}: failed to ping ${service.serviceUrl()}, due to ${status.error}`);
      if (index >= urls.length) return undefined;
      service.setServiceUrl(urls[index++]);
    }
  }
}

It can then be called from anywhere, for example:

const test = new serviceStackPicker({
  // "faf" and "hahah" are placeholder invalid URLs, used to exercise the fallback.
  tfChain: ["faf", "wss://tfchain.dev.grid.tf/ws", "wsss://tfchain.dev.grid.tf/ws"],
  gridProxy: ["hahah", "https://gridproxy.dev.grid.tf"],
});
console.log(await test.GetAvailableServices());

We also had to add a setServiceUrl method to IServiceBase and make some changes to update the affected properties.
This snippet is the initial phase of the code; feel free to suggest any refactors.
Issue update: the monitoring part is almost ready. I will support accessing the pick function directly and mark #3105 as ready for review.
Issue update: I was investigating how to read and parse an array from the user, but haven't found a solution yet.
WIP: creating a script that reads the stacks for each service from the user, converts them to a string, and exports it. Current behavior:
It shouldn't be interactive. You can export them comma-separated, then split them in the script.
Done: exporting the target stacks separated by commas works fine.
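A minimal sketch of the non-interactive approach, assuming each service's stacks are exported as one comma-separated environment variable. The helper name parseStackList and the variable name TFCHAIN_STACKS are hypothetical, not part of the SDK:

```typescript
// Hypothetical helper: split a comma-separated list of stack URLs,
// trimming whitespace and dropping empty entries (e.g. from a trailing comma).
function parseStackList(raw: string | undefined): string[] {
  if (!raw) return [];
  return raw
    .split(",")
    .map((url) => url.trim())
    .filter((url) => url.length > 0);
}

// Example input, as it might arrive from an exported variable
// such as TFCHAIN_STACKS (hypothetical name).
const tfChainStacks = parseStackList("wss://tfchain.dev.grid.tf/ws, wss://backup.example/ws,");
console.log(tfChainStacks);
```

In the script, the raw string would come from something like process.env.TFCHAIN_STACKS. The resulting array keeps the order the user wrote, which is the priority order the stack picker relies on.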
Issue update: https://github.com/threefoldtech/tfgrid-sdk-ts/pull/3105 got some changes:
Issue update:
WIP: finding a way to integrate this logic with the playground.
Issue update: applied the PR comments and introduced new features.
The new monitoring PR is ready: https://github.com/threefoldtech/tfgrid-sdk-ts/pull/3134
All review comments have been applied to https://github.com/threefoldtech/tfgrid-sdk-ts/pull/3134
Issue update: fixed some issues in monitoring while integrating it into the UI.
Those clients were loaded before the envs were initialized, so they got undefined URLs. I suggest moving those clients to the grid store.
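One way to keep clients from capturing undefined URLs at load time is to create them lazily, only after the environment config has been set. This is a minimal sketch; GridStore, setConfig, getMonitorClient, MonitorClient, and the proxyUrl field are all hypothetical names, not the playground's or SDK's actual API:

```typescript
// Placeholder for a real monitoring client; it only records the URL it was given.
class MonitorClient {
  constructor(public readonly url: string) {}
}

interface GridConfig {
  proxyUrl: string; // hypothetical config field
}

// Hypothetical store: the client is created on first access instead of at
// module load, so it always sees the initialized config, never undefined URLs.
class GridStore {
  private config?: GridConfig;
  private client?: MonitorClient;

  setConfig(config: GridConfig): void {
    this.config = config;
    this.client = undefined; // drop any client built from an older config
  }

  getMonitorClient(): MonitorClient {
    if (!this.config) {
      throw new Error("Grid config is not initialized yet");
    }
    if (!this.client) {
      this.client = new MonitorClient(this.config.proxyUrl);
    }
    return this.client;
  }
}

const store = new GridStore();
store.setConfig({ proxyUrl: "https://gridproxy.dev.grid.tf" });
console.log(store.getMonitorClient().url);
```

Failing loudly when the config is missing makes an ordering bug visible immediately, instead of surfacing later as a request to an undefined URL.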
Issue update: monitoring is integrated in the playground.
Blocker: not sure yet whether we should provide a fake mnemonic for the RMB monitor or use another approach.
All requested changes have been applied, and the UI is ready as well.
As discussed with @sameh-farouk, we can use the chain's /health endpoint to verify the node RPC status: if it responds with 200 OK, the node is alive and we can rely on it.
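A minimal sketch of that rule: the node counts as alive only if a GET on its /health endpoint returns status 200 within a timeout. The fetcher is injected so the check is testable; in Node 18+ or the browser, the global fetch fits the HealthFetcher shape. The function names and the 2-second default timeout are assumptions, not the monitoring package's API:

```typescript
// Any function that performs a GET and exposes the HTTP status code.
type HealthFetcher = (url: string) => Promise<{ status: number }>;

// Reject if the promise does not settle within `ms` milliseconds.
// Note: this stops waiting but does not cancel the underlying request.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Alive only if /health answers with exactly 200 in time; network errors,
// timeouts, and any other status code all count as "not alive".
async function isChainHealthy(healthUrl: string, fetcher: HealthFetcher, timeoutMs = 2000): Promise<boolean> {
  try {
    const response = await withTimeout(fetcher(healthUrl), timeoutMs);
    return response.status === 200;
  } catch {
    return false;
  }
}
```

A call would look like isChainHealthy(healthUrl, (url) => fetch(url)). How a given wss:// RPC URL maps to its HTTP /health URL depends on how the node is exposed, so that mapping is left to the caller here.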
Issue update:
Blocker: while pinging all URLs in parallel by passing the URL to the alive method, all URLs return an error even if one of them is reachable. Work is on the testing branch.
Issue update: I added the required fallback mechanism, but I'm facing a very weird issue: when there is more than one invalid URL in the stack, all requested URLs for all services are affected and return a timeout error. If we remove one of the invalid/unreachable URLs from the array, it works fine. https://github.com/threefoldtech/tfgrid-sdk-ts/blob/889190d255a1b5a762ca47a6c880e10f30e824a7/packages/monitoring/example/serviceURLManager.ts#L10
Details of the newly applied mechanism:
service.url.isAliveWithRetries is called for all stacks, collected in a promise array: https://github.com/threefoldtech/tfgrid-sdk-ts/blob/889190d255a1b5a762ca47a6c880e10f30e824a7/packages/monitoring/src/serviceMonitor/serviceURLManager.ts#L128
Promise.allSettled then waits for them, and the first fulfilled URL is returned, where "first" follows the order of the promises array: https://github.com/threefoldtech/tfgrid-sdk-ts/blob/889190d255a1b5a762ca47a6c880e10f30e824a7/packages/monitoring/src/serviceMonitor/serviceURLManager.ts#L130-L131
Concerns:
This approach takes a lot of time, because Promise.allSettled waits for all stacks to settle. So if I have two URLs, where the first one responds on the first try and the second is unreachable, I have to wait for the second one to exhaust all its retries. In our case, with a base timeout of 2 seconds, it takes about 12 seconds to respond with the first URL, even though it already replied within the first 2 seconds.
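One way to address this concern, sketched under assumptions: still start every check in parallel, but instead of Promise.allSettled, await the results in priority order and return at the first URL that is alive. The order-based result is the same, but a higher-priority URL that answers quickly is returned without waiting for lower-priority URLs to exhaust their retries. AliveCheck stands in for something like isAliveWithRetries; its signature and both function names are hypothetical:

```typescript
// Resolves to true if the URL is considered alive (e.g. after retries).
// A rejection is treated as "not alive".
type AliveCheck = (url: string) => Promise<boolean>;

// Baseline: waits for every check to settle, so one slow, unreachable URL
// delays the answer even when a higher-priority URL replied immediately.
async function firstAliveAllSettled(urls: string[], check: AliveCheck): Promise<string | undefined> {
  const results = await Promise.allSettled(urls.map((url) => check(url)));
  const index = results.findIndex((r) => r.status === "fulfilled" && r.value);
  return index === -1 ? undefined : urls[index];
}

// Same ordering rule, lower latency: all checks run in parallel, but results
// are awaited in priority order and we return at the first alive URL.
async function firstAliveInOrder(urls: string[], check: AliveCheck): Promise<string | undefined> {
  // Start every check now; catch() turns a rejection into "not alive"
  // and prevents unhandled rejections from checks we never await.
  const pending = urls.map((url) => check(url).catch(() => false));
  for (let i = 0; i < urls.length; i++) {
    if (await pending[i]) return urls[i];
  }
  return undefined;
}
```

With the two-URL case above (first URL replies within about 2 s, second unreachable), firstAliveInOrder returns after the first reply instead of after the second URL's retries. The trade-off: if a higher-priority URL is the slow, failing one, the picker still waits for it, because list order is treated as priority.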
Issue update: resolved the fetch issue by providing valid/existing URLs.
Work completed: support for the URL monitor in the grid client. The code needs some cleanup and testing, then the PR will be ready.
I faced an issue while testing the playground integration: changing getDefaultUrls would introduce breaking changes, as it is used in the playground and tests, so I will create a new function that contains the newly added logic.
Work completed:
Verified on Devnet.
The stack works fine: all URLs are checked at once with 3 retries, and the working one is picked in the same order as the URLs were written.
With one URL failing and the other working.
With all of them failing.
With the grid client.
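For reference, a retry wrapper in the spirit of isAliveWithRetries could look like the sketch below. The default of 3 attempts matches the comment above; the fixed delay between attempts, its value, and the name isAliveWithRetriesSketch are assumptions, not the monitoring package's actual implementation:

```typescript
// One liveness attempt: resolves true if alive; may also reject.
type Attempt = () => Promise<boolean>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run `attempt` up to `retries` times, waiting `delayMs` between failed tries.
// A thrown error counts as a failed attempt. Resolves true on the first success.
async function isAliveWithRetriesSketch(attempt: Attempt, retries = 3, delayMs = 2000): Promise<boolean> {
  for (let i = 0; i < retries; i++) {
    try {
      if (await attempt()) return true;
    } catch {
      // Timeouts, refused connections, etc. are treated as a failed attempt.
    }
    // No need to wait after the final attempt.
    if (i < retries - 1) await sleep(delayMs);
  }
  return false;
}
```

Because it resolves to false instead of rejecting, a dead URL costs roughly retries × (attempt time) + (retries - 1) × delayMs before the caller gets an answer, which is why waiting for every URL to finish is expensive.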
Created an issue for the grid client to work as a config in the playground: https://github.com/threefoldtech/tfgrid-sdk-ts/issues/3399.
New test case: TC2825 - Run locally with multiple stacks
Issue update: started implementing the logic, but I can't use the monitor package.