Open Bitseat opened 4 years ago
Thanks for including the dot.ps files, @Bitseat. However, could you enable "relative" when generating them? The absolute CPU allocation numbers are difficult to interpret.
BTW, I think you can include the dot files instead, no need to convert to ps, hopefully I can visualize them.
@ngeiswei, we can export the graph as a DOT file or as an image. Exporting it as a DOT file only shows the absolute costs, even if "relative" is enabled (see ported and unported), so I think it is better to export it as an image. See here and here. A few more images have also been added to the other profiling test folders.
Introduction
As explained in git issue #69, asmoses is about 1.35x slower when the atomspace port is enabled without subprogram memoization, and 1.15x slower with subprogram memoization. The goal of this profiling is to find where the ported program spends too much of its time and to improve those places, so that asmoses runs at least as fast as moses when atomspace-port is enabled, and faster when subprogram memoization is enabled. First, the profiling results with and without the atomspace port are summarized; then the places that need improvement are suggested.
Result
As an example, by running these two commands,
valgrind --tool=callgrind asmoses -Hmux -k3 -m2000 -p0.0001 --atomspace-port=1
and
valgrind --tool=callgrind asmoses -Hmux -k3 -m2000 -p0.0001
we can view and analyze the results (callgrind.out.4138 and callgrind.out.5406) in kcachegrind. As an alternative, you can view snips of the graphs here and here.
As the result with atomspace-port enabled shows, too much of the CPU processing time is taken by complete_truth_table::populate. The functions Interpreter (16.22%), populate_features (51.07%), and setup_features (13.38%) take the vast majority of the program's time, and populate_features spends most of its time in the createLink function.
Plan
Removing the repetitive calls to createLink is expected to improve the performance greatly.
Note: related profiling results can be found here and one table based problem result here