secretflow / secretpad

SecretPad is a privacy-preserving computing web platform based on the Kuscia framework, designed to provide easy access to privacy-preserving data intelligence and machine learning functions.
https://www.secretflow.org.cn
Apache License 2.0

secretpad: training flow status is displayed inconsistently #112

Open john8628 opened 1 month ago

john8628 commented 1 month ago

Issue Type

Feature

Have you searched for existing issues?

Yes

Link to Relevant Documentation

No response

Question Details

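Two nodes have established communication with each other. When the same training flow is viewed in SecretPad on each node, the two sides show different statuses. What causes this?
It does not affect configuring or running tasks on either side. For the same task, the task logs can be viewed on both sides after entering the container.
(The storage was migrated from sqlite to mysql.)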
aokaokd commented 1 month ago

Check the kuscia log inside the kuscia container to see whether it contains any errors.
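One way to do that check, as a minimal sketch: copy the log out of the container and scan it for lines that mention "error". The function names and the idea of passing the log path explicitly are assumptions for illustration; the thread does not say where the kuscia log lives inside the container.

```python
import re
from typing import Iterable, List, Tuple

# Match "error" as a whole word, in any letter case.
ERROR_PATTERN = re.compile(r"\berror\b", re.IGNORECASE)


def find_error_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that mentions 'error'."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if ERROR_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits


def scan_file(path: str) -> List[Tuple[int, str]]:
    """Scan a log file at a caller-supplied path (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_error_lines(handle)
```

Calling `scan_file` with the path of the copied log returns the matching lines with their line numbers, which makes it easier to quote the relevant part in a reply.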

john8628 commented 1 month ago

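Check the kuscia log inside the kuscia container to see whether it contains any errors.

The task did fail at one point. But how should the page display problem be fixed?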

zimu-yuxi commented 1 month ago

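Check the kuscia log inside the kuscia container to see whether it contains any errors.

The task did fail at one point. But how should the page display problem be fixed?

Does the component on the page show as not run, or does it stay in the running state? Could you share a screenshot?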

john8628 commented 1 month ago

image image

aokaokd commented 1 month ago

Hello, are you using the P2P deployment mode?

john8628 commented 1 month ago

Hello, are you using the P2P deployment mode?

Yes.

aokaokd commented 1 month ago

Start a new task and check whether the secretpad-side logs contain any errors.

john8628 commented 1 month ago

Start a new task and check whether the secretpad-side logs contain any errors.

There is one error:

14:11:49 [http-nio-8080-exec-7] ERROR o.s.s.w.e.SecretpadExceptionHandler - find SecretpadException error: AUTH_FAILED, message: The request header does not contain header!
org.secretflow.secretpad.common.exception.SecretpadException: The request header does not contain header!
    at org.secretflow.secretpad.common.exception.SecretpadException.of(SecretpadException.java:58)
    at org.secretflow.secretpad.web.util.AuthUtils.findTokenInHeader(AuthUtils.java:43)
    at org.secretflow.secretpad.web.interceptor.LoginInterceptor.processByUserRequest(LoginInterceptor.java:187)
    at org.secretflow.secretpad.web.interceptor.LoginInterceptor.preHandle(LoginInterceptor.java:149)
    at org.springframework.web.servlet.HandlerExecutionChain.applyPreHandle(HandlerExecutionChain.java:146)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1076)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:974)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1014)
    at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:914)
    at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:590)
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:885)
    at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:658)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
    at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
    at org.secretflow.secretpad.web.filter.AddResponseHeaderFilter.doFilterInternal(AddResponseHeaderFilter.java:61)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
    at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
    at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
    at org.springframework.web.filter.ServerHttpObservationFilter.doFilterInternal(ServerHttpObservationFilter.java:109)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
    at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:175)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:150)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:167)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:90)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:482)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:115)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:93)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74)
    at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:673)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:344)
    at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:391)
    at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63)
    at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:896)
    at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1736)
    at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52)
    at org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191)
    at org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659)
    at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:63)
    at java.base/java.lang.Thread.run(Thread.java:840)

aokaokd commented 1 month ago

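That is an authentication-failure error, and it is not what causes this problem. It should have stopped appearing afterwards, right?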
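To see why the trace reads as an authentication failure, it helps to drop the Spring and Tomcat frames and keep only the application ones. The sketch below does that for a pasted trace; the function name and the default `org.secretflow.` prefix are choices made for this illustration, not part of SecretPad.

```python
import re
from typing import List

# A Java stack frame looks like "at package.Class.method(File.java:123)".
FRAME_PATTERN = re.compile(r"\bat\s+([\w.$/]+)\(([^)]*)\)")


def application_frames(trace: str, package_prefix: str = "org.secretflow.") -> List[str]:
    """Return the frames whose qualified method name starts with package_prefix."""
    frames = []
    for match in FRAME_PATTERN.finditer(trace):
        method, location = match.group(1), match.group(2)
        if method.startswith(package_prefix):
            frames.append(f"{method}({location})")
    return frames
```

Applied to the trace posted above, the remaining frames are `SecretpadException.of`, `AuthUtils.findTokenInHeader`, two `LoginInterceptor` methods, and `AddResponseHeaderFilter.doFilterInternal`, which places the failure in the login interceptor's token-header lookup rather than in task or status handling.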

aokaokd commented 1 month ago

Check your kuscia log, not the task log.

john8628 commented 1 month ago

There are similar errors: image

aokaokd commented 1 month ago

This happens because the RPC connection was closed. That looks like the cause. Have you modified the source code here? Check your code logic.

john8628 commented 1 month ago

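This happens because the RPC connection was closed. That looks like the cause. Have you modified the source code here? Check your code logic.

I have not changed the core RPC code. I only reworked the storage layer to use mysql.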

aokaokd commented 1 month ago

Run the task again. The console polls node/status, so check whether any of those requests contain errors.
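The browser's network panel is the simplest place to watch those polling requests. If a scripted check is preferred, a rough sketch follows. It does not call SecretPad directly: `fetch` is a hypothetical callable supplied by the caller that performs one status request and returns an HTTP status code and the response body, and treating "code >= 400 or the word error in the body" as a problem is an assumption, since the response format of node/status is not shown in this thread.

```python
import time
from typing import Callable, List, Tuple

Response = Tuple[int, str]  # (HTTP status code, response body text)


def poll_for_errors(
    fetch: Callable[[], Response],
    attempts: int = 5,
    delay_seconds: float = 1.0,
) -> List[Tuple[int, Response]]:
    """Call fetch repeatedly and collect the attempts that look like failures."""
    problems = []
    for attempt in range(1, attempts + 1):
        code, body = fetch()
        # Assumed failure signal: an HTTP error code or "error" in the body.
        if code >= 400 or "error" in body.lower():
            problems.append((attempt, (code, body)))
        if attempt < attempts:
            time.sleep(delay_seconds)
    return problems
```

Running the same check from both nodes while a task executes would show whether only one side receives failing status responses.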

john8628 commented 1 month ago

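Run the task again. The console polls node/status, so check whether any of those requests contain errors.

It is still only the kuscia.log, and I cannot tell from it what the problem is. Would it be possible to connect on DingTalk to discuss this?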

john8628 commented 1 month ago

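I have followed the suggestions. This is most likely a data synchronization problem caused by network jitter. I have removed the nginx network proxy configuration and am continuing to monitor.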