mathworks / jenkins-matlab-plugin

This plugin enables you to run MATLAB® and Simulink® as part of your Jenkins™ build.

runMATLABCommand sometimes stuck. #322

Open Rvh91 opened 4 months ago

Rvh91 commented 4 months ago

I've noticed that sometimes a stage gets 'stuck': I execute a certain task in the buildtool and it completes, but somehow Jenkins does not continue. So perhaps MATLAB isn't providing an exit code? I don't really know how to debug this. Not sure if it is relevant, but multiple builds can be running simultaneously on the same Windows agent.

stage('Run repository test suite (unit & integration tests)') {
    steps {
        runMATLABCommand(command: 'buildtool testReport')
    }
}

I've noticed that on the builds where it does continue I can see the following:


Parallel pool using the 'Processes' profile is shutting down.

whereas on the builds that seem stuck, the log ends with:

** Finished testReport

nbhoski commented 4 months ago

Hi @Rvh91, can you please check whether the same issue is reproducible outside of Jenkins? That is, you could run the same build tool task from the command line using matlab -batch, for example:

matlab -batch "buildtool testReport"

You could run the above command on the same host where the job gets stuck and see whether it is reproducible.

Rvh91 commented 4 months ago

Hi @nbhoski, this is a bit difficult to troubleshoot, as it does not happen every time. I have tried the command a couple of times from the command line, but it hasn't happened there so far. That said, this could also be because only one 'instance' is running now. When Jenkins is active, we frequently have multiple builds running simultaneously on the same machine. I'm not entirely sure how the parallel pool works. Is it shared between instances?

nbhoski commented 4 months ago

One thing I could suggest is to check whether your resources are exhausted. Try increasing the worker threads on Jenkins and see if the problem persists.

Rvh91 commented 4 months ago

@nbhoski, do you mean the number of executors on our agent?

nbhoski commented 4 months ago

Yes

Rvh91 commented 4 months ago

I currently have 4 executors enabled on this agent, so I'm able to run 4 builds simultaneously, which hardly ever happens. But I suspect that the parallel pool that is created when using a 'parfor' loop might somehow be shared between builds. In that case it won't be able to shut down, because another instance is still using it. Is that possible?
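
One way to probe the shared-pool theory is to shut the pool down explicitly at the end of the MATLAB command instead of leaving it to MATLAB's exit sequence. The following is a minimal, untested sketch and not a confirmed fix. It assumes Parallel Computing Toolbox is available and appends `delete(gcp('nocreate'))`, which closes the current pool if one exists, to the command from the original stage.

```groovy
stage('Run repository test suite (unit & integration tests)') {
    steps {
        // Assumption: the hang is related to parallel pool shutdown at exit.
        // gcp('nocreate') returns the current pool without starting a new one;
        // delete(...) shuts it down and does nothing when no pool is open.
        // Caveat: if buildtool throws an error, the delete call is not reached.
        runMATLABCommand(command: "buildtool testReport; delete(gcp('nocreate'))")
    }
}
```

If the "shutting down" message then appears in every log but the stage still hangs occasionally, the pool is probably not the cause.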

nbhoski commented 4 months ago

Could you share a similar example that uses PCT (Parallel Computing Toolbox) keywords like parfor, so that it is easier for me to reproduce the issue?

TylerWeir commented 4 months ago

Hello, I'm facing a similar issue where our Jenkins runMATLABBuild call was hanging on a specific commit running through our CI pipeline. This specific call to runMATLABBuild runs a buildtool task to build all models in our Simulink project matching a specific label. Note this issue was only affecting a single commit running through the CI pipeline.

As suggested above, I tried to reproduce the issue outside of Jenkins using matlab -batch buildtool [task_name] on the problem commit. Interestingly, at the end of the build output, but before MATLAB exits, a prompt window opens asking about saving a data dictionary before closing (undoubtedly there is an error in the build, but I'd expect the CI pipeline to be resilient nonetheless). I'm wondering if this prompt window is the source of the hang when running in CI as well? Perhaps something similar is the cause of @Rvh91's problem? Note that MATLAB exits only after the prompt is closed.

Here is the prompt that is opened: [image: batch-prompt-croped]

Further, I tried running the same command with the -noFigureWindows option. While the prompt didn't open, MATLAB only printed warnings and never seemed to exit. It only exits once I kill it with Ctrl-C.

Here are said warnings:

Error using buildtool
Build failed.

Warning: dialog is no longer supported when MATLAB is started with the -nodisplay or -noFigureWindows option or there is no display.
> In warnfiguredialog (line 15)
In dialog (line 41)
In questdlg (line 160)
Warning: uiwait is no longer supported when MATLAB is started with the -nodisplay or -noFigureWindows option or there is no display.
> In warnfiguredialog (line 15)
In uiwait (line 40)
In questdlg (line 413)

And here's the tail of the logs of the runMATLABBuild call that hung in CI:

** Failed [task name]

** Closed project [our project name]

{Error using buildtool
Build failed.

Error in build_CtbW3xqx (line 2)
buildtool [task name]
} 

I hope these findings are useful in debugging this issue. Happy to help if possible.

nbhoski commented 4 months ago

@TylerWeir I too think the prompt window is the cause of the hang. Are you making any changes to the file named in the pop-up window? Could you handle that?

nbhoski commented 4 months ago

@TylerWeir could you confirm whether the issue was caused by the pop-up window and whether you were able to resolve it?

TylerWeir commented 4 months ago

Yes, fixing the Simulink model to avoid the pop-up window allowed our build to complete successfully and thus run through CI.
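
For pipelines where the model cannot be fixed right away, a defensive variant might discard unsaved data dictionary changes when the build fails, so that no save prompt is raised. The sketch below rests on several assumptions: the task name `buildModels` is hypothetical, it uses `runMATLABCommand` rather than `runMATLABBuild` so that extra MATLAB statements can run after the build, and it has not been verified that `Simulink.data.dictionary.closeAll('-discard')` suppresses the specific prompt described above.

```groovy
stage('Build models') {
    steps {
        // Hypothetical task name 'buildModels'; substitute your own task.
        // On failure: close all open data dictionaries and discard unsaved
        // changes, then rethrow so the stage is still marked as failed.
        runMATLABCommand(command: "try, buildtool('buildModels'); catch err, Simulink.data.dictionary.closeAll('-discard'); rethrow(err); end")
    }
}
```

Discarding changes is only appropriate on a CI agent, where the workspace is disposable. It hides the symptom and does not replace fixing the model.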

nbhoski commented 3 months ago

Hi @Rvh91,

I was trying to reproduce the issue locally. To support you better, could you please provide the following:

This would help me reproduce the issue close to your environment.

Regards, Nikhil

nbhoski commented 1 month ago

@Rvh91 could you please confirm whether this issue is still relevant? Please let me know if it is still reproducible.

Rvh91 commented 1 month ago

It is still an issue. We have moved to calling MATLAB from the command line via:

bat "matlab -batch -wait ${fullCommand} "

so we no longer use the plugin, but the problem persists. It therefore might not be caused by the plugin itself. I also can't reproduce it consistently: it appears on some builds but not on others. I have yet to figure out what the root cause is.
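
Until the root cause is found, a stage-level timeout can at least stop a stuck MATLAB run from occupying an executor indefinitely. This is a mitigation sketch, not a fix. The 30-minute limit is an assumed value, and the thread does not establish whether aborting the step also terminates every MATLAB child process on a Windows agent.

```groovy
stage('Run repository test suite (unit & integration tests)') {
    options {
        // Assumption: 30 minutes comfortably exceeds a normal run; tune as needed.
        // When the limit is hit, Jenkins aborts the stage instead of waiting forever.
        timeout(time: 30, unit: 'MINUTES')
    }
    steps {
        runMATLABCommand(command: 'buildtool testReport')
    }
}
```

Comparing the logs of builds that hit the timeout with those that finish normally may also help narrow down what differs between them.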