Closed nicolasmelo1 closed 4 months ago
Thank you so much for the detailed report, Nicolas! We dug into this and discovered that there are actually two bugs that worked together to make a bad situation worse :D
The first bug (and the quick fix) was in our quickstart code, specifically in the worker.
app.post('/', (req: Request, res: Response) => {
  console.log("Task received", req.body);
  try {
    process(req.body).catch(error => console.error("Error while processing task", error));
    res.status(200);
  } catch (e) {
    console.error("Error", e);
    res.status(500);
  }
});
The problem is that `res.status` does not actually complete the HTTP request; we're missing `send`. If you change the code to the following, your example should work 🤞
app.post('/', (req: Request, res: Response) => {
  console.log("Task received", req.body);
  try {
    process(req.body).catch(error => console.error("Error while processing task", error));
    res.status(200).send();
  } catch (e) {
    console.error("Error", e);
    res.status(500).send();
  }
});
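To make the difference concrete, here is a minimal sketch that models only the two response methods involved. `FakeResponse`, `ackWithoutSend` and `ackWithSend` are hypothetical names used for illustration; they are not part of Express or the quickstart. The model assumes what the fix above relies on: `status` only records the code and returns the response for chaining, while `send` is what actually finishes it.

```typescript
// Hypothetical stand-in for the two Express response methods used above.
class FakeResponse {
  statusCode = 200;
  finished = false; // becomes true once the response has been sent

  // Modeled on res.status(): records the code and returns `this` for chaining.
  status(code: number): this {
    this.statusCode = code;
    return this;
  }

  // Modeled on res.send(): marks the response as finished.
  send(_body?: unknown): this {
    this.finished = true;
    return this;
  }
}

// Buggy acknowledgement: the status is set but the response is never sent.
function ackWithoutSend(res: FakeResponse): void {
  res.status(200);
}

// Fixed acknowledgement: the status is set and the response is sent.
function ackWithSend(res: FakeResponse): void {
  res.status(200).send();
}

const hanging = new FakeResponse();
ackWithoutSend(hanging);
console.log(hanging.finished); // false: the caller would keep waiting

const completed = new FakeResponse();
ackWithSend(completed);
console.log(completed.finished); // true
```

In a real handler, Express's `res.sendStatus(200)` is another way to set the status and send in a single call.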
The second bug was in our server: we forgot to set a timeout for these HTTP requests. If no timeout is specified, Go defaults to waiting forever. Furthermore, by default our server has only a single worker to handle these HTTP requests (this is configurable), and if that worker is waiting forever, no subsequent tasks will be sent to workers (please note that other parts of our server were unaffected). This waiting is also the reason the server would not shut down. We will remedy this by adding a timeout. Thanks for helping us find this!
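The interaction between the missing timeout and the single worker can be illustrated with a toy model. This is only a sketch of the reasoning, not the server's actual code: `simulateSingleWorker` and the `Delivery` type are hypothetical, and time is simulated rather than measured. A delivery whose duration is `Infinity` stands for a request that never gets a response, such as the handler that never calls `send`.

```typescript
// One HTTP delivery to a worker. durationMs = Infinity models a request
// that never receives a response.
type Delivery = { id: string; durationMs: number };

type Outcome = { attempted: string[]; timedOut: string[]; elapsedMs: number };

// Toy model of a single dispatcher that sends deliveries one at a time.
// requestTimeoutMs = Infinity models "no timeout configured".
function simulateSingleWorker(deliveries: Delivery[], requestTimeoutMs: number): Outcome {
  const attempted: string[] = [];
  const timedOut: string[] = [];
  let elapsedMs = 0;

  for (const delivery of deliveries) {
    attempted.push(delivery.id);
    const waitedMs = Math.min(delivery.durationMs, requestTimeoutMs);

    if (!Number.isFinite(waitedMs)) {
      // The only worker waits forever, so nothing after this is ever sent.
      return { attempted, timedOut, elapsedMs: Infinity };
    }
    if (delivery.durationMs > requestTimeoutMs) {
      timedOut.push(delivery.id); // gave up on this request and moved on
    }
    elapsedMs += waitedMs;
  }
  return { attempted, timedOut, elapsedMs };
}

const tasks: Delivery[] = [
  { id: "a", durationMs: 10 },
  { id: "b", durationMs: Infinity }, // the request that hangs
  { id: "c", durationMs: 10 },
];

// No timeout: "c" is never attempted because the worker is stuck on "b".
console.log(simulateSingleWorker(tasks, Infinity).attempted); // [ 'a', 'b' ]

// With a 1000 ms timeout, "b" times out and "c" is still delivered.
console.log(simulateSingleWorker(tasks, 1000).attempted); // [ 'a', 'b', 'c' ]
```

Adding more workers would only delay the stall in this model, since each hung request would occupy one of them indefinitely; the timeout is what bounds the wait.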
Expected Behavior
1 - Start the worker, start resonate and start the server.
2 - If there are pending promises, the resonate server calls the worker.
3 - The worker calls the resonate server to claim the task. This alone should just work.
Actual Behavior
1 - Start the worker, start resonate and start the server.
2 - If there are pending promises, the resonate server calls the worker.
3 - The worker calls the resonate server to claim the task.
4 - The fetch request for claiming the task fails and everything freezes.
To Reproduce
Create a `src` folder and add both an `index.ts` and a `worker.ts`.
/**
 * @returns A promise that resolves to a summary of the downloaded page's text.
 */
export async function downloadAndSummarize(context: Context, url: string) {
  // Summarize the content on a node with a gpu
  console.log("url", url);
  let summary = await context.run(`/gpu/summarize/summarize-${url}`, url);

  // Return the summary of the content
  return summary;
}

// Initialize a Resonate application.
const resonate = new Resonate({ url: "http://localhost:8001" });

// Register a function as a Resonate function
resonate.register(
  "downloadAndSummarize",
  downloadAndSummarize,
  resonate.options({ timeout: Number.MAX_SAFE_INTEGER })
);

// Start the Resonate application
resonate.start();

// Initialize an Express application.
const app = express().use(express.json());

// Register a function as an Express endpoint
app.post("/summarize", async (req: Request, res: Response) => {
  const url = req.body?.url;
  try {
    // Call the resonate function
    let summary = await resonate.run(
      "downloadAndSummarize",
      /* id */ `summarize-${url}`,
      /* param */ url
    );
    res.send(summary);
  } catch (e) {
    res.status(500).send("An error occurred.");
  }
});

// Start the Express application
app.listen(3000, () => {
  console.log("Listening on port 3000");
});
Add this to your `package.json`:
Run each command on a distinct terminal instance or use something like concurrently:
Press `Ctrl + C` on the `pnpm run dev` terminal instance to fully stop it; let resonate and the worker keep running. Change both lines in `index.ts`:
and
Rerun `pnpm run dev`.
Curl the API again with
Repeat steps 6, 7 and 8, changing `summarize1` to `summarize2`, ..., `summarizeN`. Eventually it freezes completely: the server stops sending tasks to the worker, so they pile up.
Now that it never sends to the worker, fully stop the resonate server and the resonate worker.
Run them again. First the worker, then the server.
You'll get the following on the worker console.
And on the resonate server you shall receive:
On `Ctrl + C`, it logs:
but never really stops the process.
resonate.yml contents
Specifications
Additional context