wallscheid opened 4 years ago
* Use a profiling tool (e.g. the one built into PyCharm) to investigate the time consumption of the different toolbox parts for the provided examples or a dummy controller input
* Identify possible potential for improving the code in terms of processing time
* Evaluate alternative implementations to speed up the code (e.g. on a specific test branch)
The speed analysis has been done for various motor environments, and the GEM-related code bottlenecks have been prioritized. The following steps were followed for the analysis:
* A sanity check of code coverage while profiling each environment, to verify that the entire GEM part is covered
* Code profiling using PyCharm's built-in cProfile integration, which provides both a statistical and a graphical representation of the profiling report
It was found that the majority of the time is spent in non-GEM frameworks such as Keras and TensorFlow and in libraries such as SciPy. A closer investigation showed that most of the time within the GEM part is spent in the Python files listed in the attached report (kindly see it for function-level details).
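For reference, a profiling run for one environment could look roughly like the sketch below. The environment ID and the reset/step signature are assumptions for illustration (they differ between gym-electric-motor versions), not the exact setup used for the attached reports.

```python
# Sketch of how a per-environment profiling report could be generated.
import cProfile
import pstats

import gym_electric_motor as gem

env = gem.make('DcSeriesCont-v1')  # placeholder environment ID

def run_episode(steps=10_000):
    env.reset()
    for _ in range(steps):
        action = env.action_space.sample()  # dummy controller input
        observation, reward, done, info = env.step(action)
        if done:
            env.reset()

profiler = cProfile.Profile()
profiler.enable()
run_episode()
profiler.disable()

# .pstat file that PyCharm can open via Tools -> Open CProfile Snapshot
profiler.dump_stats('DcSeriesCont-v1.pstat')
pstats.Stats(profiler).sort_stats('cumulative').print_stats(20)
```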
Improvement suggestions: At first sight, Python code optimization using Cython, Numba or Pythran seems to fit the cause, although the majority of modules already use libraries that have such optimizations built in. It would be interesting to see where this leads and to what extent we can optimize.
Additionally, I have noticed the earlier optimization efforts using the Jacobian, but I need to dig deeper into that.
I had a call with Pramod in which he showed me the steps he did for the code investigations. I have no further remarks on this issue.
@pramodcm95 : During the last meeting you also showed the structure overview of the built-in PyCharm profiler regarding the computational demand per code part (flowchart-style diagram). Could you please add these diagrams as pictures to this report issue? I believe they provide a valuable overview.
In the issue branch on GitHub, I have checked in these files as "___.pstat" files (one for each environment), which need to be imported into PyCharm (Tools -> Open CProfile Snapshot). This should be the best way to get that graph, since we can then navigate directly to the respective source file.
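For completeness, the same .pstat files can also be inspected without PyCharm via the standard library's pstats module (the file name below is a placeholder for one of the per-environment reports):

```python
# Read one of the checked-in .pstat files with the standard library.
import pstats

stats = pstats.Stats('DcSeriesCont-v1.pstat')
stats.strip_dirs().sort_stats('cumulative').print_stats(20)  # top 20 by cumulative time
```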
Here in this issue, I can post the snapshot if you want. But are you expecting the flowchart snapshot of each environment and controller combination? I am afraid that may get chaotic here. Please let me know if the snapshot is required for all environments and I will take care of it.
Since we would like to maintain a slim branch structure in the long run, the issue branch will be deleted at some point. Hence, I believe it would be simplest if you could put the relevant files into an archive folder and upload it here to the issue thread. I believe these reports are static files which do not require the specific code from which they have been generated, right?
Yes, they have been generated for different environment IDs, so changing an environment would give me a new report.
I understand the concept now. I am uploading a folder with the relevant files for a quick update. The snapshots will also be posted soon.
Code profiling delivered some interesting insights into the computational demand of the different parts of the toolbox.
Next steps will be to evaluate possible speed improvements, e.g. whether parts of the toolbox can be transferred to Cython or Numba to obtain a speed-up.
@pramodcm95 I believe with the above zip archive the results of your analysis are saved within this issue thread also for later inspection & discussion. For repo clearness I would like to delete the branch attached to this issue - do you object?
No @wallscheid, it is fine to remove it from the repo.
@pramodcm95 : could you please provide a short update on your working status? The supervisors are missing a sign of life.
Yes. Initially I had two options to explore: 1) Cython - which requires all files to be converted to .pyx format and compiled with an external C compiler. This resulted in loads of errors, and upon further deep investigation I realized that the Cython GitHub already has lots of such issues posted and is working on them.
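For reference, the general Cython workflow referred to above looks roughly like this; the module and package names here are placeholders, not actual GEM files:

```python
# setup.py - illustrative sketch of the Cython build step
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="gem_cython_experiment",          # placeholder package name
    ext_modules=cythonize(
        ["mechanical_load.pyx"],            # hypothetical module converted from .py to .pyx
        language_level=3,
    ),
)
```

The extension is then built with an external C compiler via `python setup.py build_ext --inplace`.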
So the simpler option would be to switch to Numba, which takes more time for the first compilation and from then on, depending on the type of code, speeds up the process.
2) Numba: This basically uses a JIT (just-in-time) compiler. The following Numba decorators have been explored and tried in the project (a minimal sketch follows the list):
* @jitclass - compiles a whole class using the JIT compiler
* @jit - simplest version, which gives us warnings if the prerequisites of Numba are not met
* @njit - so-called "no-Python mode", runs completely in Numba-compiled code
* @njit(parallel=True) - parallelizes the code if the above Numba conditions are satisfied
* @vectorize - vectorizes the code
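The sketch below shows two of these decorators on a small placeholder function (not GEM code); the function names and the Euler-step example are illustrative only:

```python
# Minimal sketch comparing a plain NumPy function with Numba-compiled variants.
import numpy as np
from numba import njit, vectorize

def euler_step_py(state, deriv, tau):
    # plain Python/NumPy reference implementation
    return state + tau * deriv

@njit(cache=True)
def euler_step_njit(state, deriv, tau):
    # "no-Python mode": compiled on first call, fast afterwards
    return state + tau * deriv

@vectorize(['float64(float64, float64, float64)'])
def euler_step_vec(state, deriv, tau):
    # compiled element-wise ufunc
    return state + tau * deriv

state = np.zeros(4)
deriv = np.ones(4)
print(euler_step_njit(state, deriv, 1e-4))  # first call includes compilation
print(euler_step_vec(state, deriv, 1e-4))
```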
As all of these have lots of prerequisites, the idea of attacking the function that takes most of the time is an ambitious task, since it has lots of interlinked functions which cannot be converted to Numba straight away. I am stuck at a certain point, where adding a Numba decorator demands an extra argument for certain functions and does not behave the same for other functions. I am reading through their repository to find whether any such issues have been reported before.
I could execute mechanical_ode (in the mechanical loads) in Numba's object mode. Unfortunately there isn't any great improvement; instead, the own time of the function is increasing.
(left: without Numba, right: with Numba)
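For context, object mode just means forcing the decorator to fall back to Python objects instead of nopython compilation. The snippet below is a simplified stand-in, not the actual GEM mechanical_ode:

```python
# Simplified stand-in illustrating Numba's object mode via forceobj=True.
import numpy as np
from numba import jit

@jit(forceobj=True)
def mechanical_ode(t, state, torque, load_torque=0.0, inertia=1e-3):
    omega = state[0]
    d_omega = (torque - load_torque) / inertia  # simple rigid-body load model
    return np.array([d_omega])

print(mechanical_ode(0.0, np.array([10.0]), 1.5))
```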
@pramodcm95 : could you please upload your altered code to a dedicated feature branch so that others can have a look at it as well.
Regarding the result: are there any statements from the Numba project or from Numba users that initial calls of Numba code lead to quite a significant overhead? Maybe the computational load of the code you have exchanged is very small while this presumed overhead is large?
Sure. So I will create a new branch for this issue now? (Since we deleted the issue 35 branch before.)
Yes, Numba works that way. It has a very high initial overhead; once the compiled functions are cached, the execution time starts to go down.
A simple example (a minimal placeholder illustrating the first-call compilation overhead) looks like this:
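```python
# Minimal illustration of Numba's first-call compilation overhead.
# The function below is a placeholder, not GEM code.
import time
import numpy as np
from numba import njit

@njit(cache=True)
def weighted_sum(x, w):
    total = 0.0
    for i in range(x.shape[0]):
        total += w[i] * x[i]
    return total

x = np.random.rand(1_000_000)
w = np.random.rand(1_000_000)

t0 = time.perf_counter()
weighted_sum(x, w)          # first call: compilation dominates
t1 = time.perf_counter()
weighted_sum(x, w)          # second call: compiled code only
t2 = time.perf_counter()

print(f"first call:  {t1 - t0:.4f} s (includes compilation)")
print(f"second call: {t2 - t1:.4f} s")
```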
Maybe this package could help to outsource performance-critical code into C code: