Open zguesmi opened 3 years ago
@zguesmi Your log implies we have a race condition between two uploaders here: https://github.com/iExecBlockchainComputing/iexec-result-proxy/blob/f878d5e278d4b006344ead37dc11b64c970ca931/src/main/java/com/iexec/resultproxy/proxy/ProxyService.java#L57
It might be, yes, and it should be handled correctly. That might also mean we have fixes to add to the scheduler.
This issue seems to arise when two or more workers try to upload their result for the same task before the result proxy is fully initialized. All the requests are put in a queue and then executed simultaneously, so each one can check for an existing entry before any of them has saved one. E.g.:
2021-09-28 12:02:03.309 INFO 1 --- [io-13200-exec-3] o.s.web.servlet.DispatcherServlet : Completed initialization in 5 ms
2021-09-28 12:02:03.478 INFO 1 --- [io-13200-exec-2] org.mongodb.driver.connection : Opened connection [connectionId{localValue:3, serverValue:6}] to result-proxy-mongo:13202
2021-09-28 12:02:03.478 INFO 1 --- [io-13200-exec-4] org.mongodb.driver.connection : Opened connection [connectionId{localValue:2, serverValue:5}] to result-proxy-mongo:13202
2021-09-28 12:02:03.723 INFO 1 --- [io-13200-exec-5] c.i.resultproxy.proxy.ProxyController : Result uploaded successfully [chainTaskId:0x71d953138e0293b77eeae25b5114cbcc5df53ef9d99c3a153fcd9fe615668883, uploadRequester:0x2ab2674aa374fe6415d11f0a8fcbd8027fc1e6a9, resultLink:/ipfs/QmPbZfnkdoxt4CEmRS2jq7ms8LF9GQuSgBm5NNBsBXNbnh]
2021-09-28 12:02:03.723 INFO 1 --- [io-13200-exec-6] c.i.resultproxy.proxy.ProxyController : Result uploaded successfully [chainTaskId:0x71d953138e0293b77eeae25b5114cbcc5df53ef9d99c3a153fcd9fe615668883, uploadRequester:0x1a69b2eb604db8eba185df03ea4f5288dcbbd248, resultLink:/ipfs/QmbHadrtdZ5cz9bJTaVRLUnKAWm9442sC1vE1Pg8xMtRAV]
2021-09-28 12:02:03.757 ERROR 1 --- [io-13200-exec-8] o.a.c.c.C.[.[.[/].[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is org.springframework.dao.IncorrectResultSizeDataAccessException: Query { "$java" : Query: { "taskId" : "0x71d953138e0293b77eeae25b5114cbcc5df53ef9d99c3a153fcd9fe615668883"}, Fields: {}, Sort: {} } returned non unique result.] with root cause
There's probably something to fix here to make it more robust, but the scenario where two workers try to upload their result for the same task should not happen in the first place.
In integration tests, the worker fails with this error when it tries to push the result:
This exception appears in the result proxy logs:
How to fix:
Add @Unique to the field taskId of the model IpfsName here.
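To illustrate why a uniqueness constraint on taskId helps, here is a minimal, self-contained sketch in plain Java. It does not use Spring or MongoDB: a ConcurrentHashMap stands in for a collection with a unique index, and the class and method names (UniqueTaskIdSketch, saveIfAbsent, simulateConcurrentUploads) are hypothetical, not taken from iexec-result-proxy. The idea is that an atomic "insert only if absent" lets exactly one of several simultaneous uploads for the same task succeed, so a later lookup by taskId cannot return more than one entry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class UniqueTaskIdSketch {

    // Hypothetical stand-in for a collection with a unique index on taskId.
    private final Map<String, String> ipfsHashByTaskId = new ConcurrentHashMap<>();

    /** Atomically stores the hash if the task has no entry yet; returns true if this call stored it. */
    public boolean saveIfAbsent(String taskId, String ipfsHash) {
        return ipfsHashByTaskId.putIfAbsent(taskId, ipfsHash) == null;
    }

    /** Releases all uploaders at the same time for one task and returns how many uploads were accepted. */
    public static int simulateConcurrentUploads(int uploaders) {
        UniqueTaskIdSketch store = new UniqueTaskIdSketch();
        ExecutorService executor = Executors.newFixedThreadPool(uploaders);
        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger accepted = new AtomicInteger();

        for (int i = 0; i < uploaders; i++) {
            final String ipfsHash = "hash-" + i; // placeholder value, not a real IPFS hash
            executor.execute(() -> {
                try {
                    start.await(); // mimic queued requests that are all executed at once
                    if (store.saveIfAbsent("0xTASK", ipfsHash)) {
                        accepted.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        start.countDown();
        executor.shutdown();
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("uploaders did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return accepted.get();
    }

    public static void main(String[] args) {
        System.out.println("accepted uploads: " + simulateConcurrentUploads(2));
    }
}
```

With a separate "find, then save" sequence, both uploaders could pass the existence check before either one writes. Here the check and the write are a single atomic step, which is roughly the guarantee a unique index would provide at the database level. The rejected uploader would still need to be handled gracefully by the caller.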