score-p / scorep_plugin_x86_energy

This is the Score-P power and energy event plugin counter for newer Intel and AMD processors (via RAPL, resp. APM). The plugin supports reading msr registers directly or through the x86_adapt library.
BSD 3-Clause "New" or "Revised" License

Help for tracing error #11

Open Synlvejo opened 4 years ago

Synlvejo commented 4 years ago

Hello,

I can correctly collect profile.cubex with Score-P, and scorep-score reports that SCOREP_TOTAL_MEMORY should be larger than 43MB. After installing scorep_plugin_x86_energy, I set the environment like this (without SCOREP_TOTAL_MEMORY):

export SCOREP_ENABLE_TRACING="true"
export SCOREP_ENABLE_PROFILING="false"
export SCOREP_METRIC_PLUGINS=x86_energy_sync_plugin
export SCOREP_METRIC_PLUGINS_SEP=";"
export SCOREP_METRIC_X86_ENERGY_SYNC_PLUGIN="BLADE/E"
export SCOREP_METRIC_X86_ENERGY_PLUGIN_INTERVALL_US=0
export SCOREP_METRIC_X86_ENERGY_SYNC_PLUGIN_OFFSET=70

When I run the application again, I get errors like this:

NAS Parallel Benchmarks (NPB3.4-OMP) - IS Benchmark
Size: 33554432 (class B)
Iterations: 10
Number of available threads: 20
[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[OTF2] src/OTF2_Buffer.c:359: error: This could not be done with the given memory: Could not allocate memory for chunk!
[OTF2] src/otf2_archive_int.c:2122: error: This could not be done with the given memory: Can't create event writer!
[OTF2] src/OTF2_Archive.c:977: error: This could not be done with the given memory: Could not get local event writer
[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[Score-P] src/measurement/SCOREP_Memory.c:175: Error: No free memory page available:
[OTF2] src/OTF2_Buffer.c:359: error: This could not be done with the given memory: Could not allocate memory for chunk!
[OTF2] src/otf2_archive_int.c:2122: error: This could not be done with the given memory: Can't create event writer!
[OTF2] src/OTF2_Archive.c:977: error: This could not be done with the given memory: Could not get local event writer
[Score-P] src/measurement/SCOREP_Memory.c:175: Error: No free memory page available: Out of memory. Please increase SCOREP_TOTAL_MEMORY=16384000 and try again.
[Score-P] src/measurement/SCOREP_Memory.c:179: Error: No free memory page available: Please ensure that there are at least 2MB available for each intended location.
[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[OTF2] src/OTF2_Buffer.c:359: error: This could not be done with the given memory: Could not allocate memory for chunk!
[Score-P] src/measurement/SCOREP_Memory.c:183: Error: No free memory page available: Where there are currently 20 locations in use in this failing process.
[Score-P] Memory usage of rank 0
[Score-P] Memory used so far: Out of memory. Please increase SCOREP_TOTAL_MEMORY=16384000 and try again.
…… ……
[Score-P] src/measurement/SCOREP_Memory.c:179: Error: No free memory page available: Please ensure that there are at least 2MB available for each intended location.
[Score-P] Score-P runtime-management memory tracking:
Aborted

Then I set SCOREP_TOTAL_MEMORY to a value that should be large enough (about 64MB):

export SCOREP_TOTAL_MEMORY=64000000

Now I get this message in a loop:

[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[Score-P] Trace buffer flush on rank 0.
[Score-P] Increase SCOREP_TOTAL_MEMORY and try again.

Furthermore, I tried a much larger value (about 6.4GB):

export SCOREP_TOTAL_MEMORY=6400000000

and got this error:

is.B.x: ../../build-backend/../src/measurement/scorep_environment.c:299: SCOREP_Env_GetPageSize: Assertion `env_total_memory <= (4294967295U)' failed.
Aborted

Is there a problem with my environment settings, or with something else I did?

Thanks for any help!

rschoene commented 4 years ago

Hi,

It looks like you're running into two problems: buffers that are too large (6.4GB) and buffers that are too small (64MB).

Why is 64 MB supposedly too small, you might ask. Probably, you did not activate this plugin for the profile that you fed to 'scorep-score'. With this plugin enabled, you will record metrics with every synchronous event (enter/exit) that occurs. Each of these will add to the byte count.

Why does 6.4 GB not work, you might ask. This can be explained with the internals of Score-P, which uses 32-bit values for storing offsets, if I remember correctly. Hence, only 4 GB are allowed. Please try something like 'export SCOREP_TOTAL_MEMORY=1G'.

Btw, yes, you can add postfix scaling like M and G to make your 'SCOREP_TOTAL_MEMORY' setting more readable.
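As a rough illustration of the upper limit, here is a minimal Python sketch that checks a candidate SCOREP_TOTAL_MEMORY value against the 4294967295 bound from the assertion message. The function names are made up for this example, and treating K, M and G as binary multiples (1024-based) is an assumption, not something confirmed in this thread.

```python
# Upper bound taken from the assertion message:
# env_total_memory <= (4294967295U)
UINT32_MAX = 4294967295

# Assumption: suffixes are binary multiples. Score-P may interpret them differently.
SUFFIXES = {"K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text: str) -> int:
    """Convert a value such as '64000000' or '1G' into a byte count."""
    text = text.strip().upper()
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)


def fits_total_memory(text: str) -> bool:
    """Return True if the value stays within the 32-bit limit."""
    return parse_size(text) <= UINT32_MAX
```

With these assumptions, 6400000000 is rejected while 1G, 2G and 3G pass the check, which matches the behaviour reported in this thread: only the 6.4GB setting triggered the assertion.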

Synlvejo commented 4 years ago

Thanks. Actually, I have also tried 640MB, and the error is the same as with 64MB:

[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[Score-P] Trace buffer flush on rank 0.
[Score-P] Increase SCOREP_TOTAL_MEMORY and try again.

I also just tried 1G, 2G, and 3G, but the error is the same. This is the code around src/measurement/tracing/SCOREP_Tracing.c:226:

    /* ignore allocation failures, OTF2 will flush and free chunks */
#if HAVE( UTILS_DEBUG )
    if ( !chunk )
    {
        UTILS_WARNING( "Cannot allocate %" PRIu64 " bytes for tracing; but OTF2 will flush and free chunks.", chunkSize );
    }
#endif

    return chunk;
}

Maybe it's helpful?

By the way, I tuned SCOREP_TOTAL_MEMORY by adding or removing a '0', lol

bmario commented 4 years ago

Looks like you're setting the sampling rate to infinity?

export SCOREP_METRIC_X86_ENERGY_PLUGIN_INTERVALL_US=0

Synlvejo commented 4 years ago

Oh, doesn't 0 mean the default value? This is the value from the installation guide, at the bottom of page 17. Thanks!

bmario commented 4 years ago

The default value is 50000, which corresponds to 50ms.

Synlvejo commented 4 years ago

Thanks,

I set the value to 50000 without setting SCOREP_TOTAL_MEMORY. The shell output includes:

[Score-P] Memory: Location-Misc
[Score-P] Memory allocated [bytes] 8192
[Score-P] Memory used [bytes] 984
[Score-P] Memory available [bytes] 7208
[Score-P] Number of pages allocated 1
[Score-P] Number of pages used 1

[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[Score-P] Trace buffer flush on rank 0.
[Score-P] Increase SCOREP_TOTAL_MEMORY and try again.

#0 0x7ffb3d58bb9a in ???
#1 0x7ffb3d58adc3 in ???
#1 0x7ffb3d58adc3 in ???
#1 0x7ffb3d58adc3 in ???
#2 0x7ffb3ccbb3af in ???

and ends with:

#15 0x7ffb3d05ae64 in ???
#16 0x7ffb3cd8388c in ???
#17 0xffffffffffffffff in ???
Aborted

But when I set SCOREP_TOTAL_MEMORY=1G or larger, the message loops again:

[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[Score-P] Trace buffer flush on rank 0.
[Score-P] Increase SCOREP_TOTAL_MEMORY and try again.

And when I set it back to 16MB, the message is like the first one.

bmario commented 4 years ago

nvm. I just realized you're using the sync plugin. Then there's no need to set this environment variable anyways. However, you may still run into a similar issue. I recommend looking into filtering some regions.

Synlvejo commented 4 years ago

It doesn't matter. I just want to get the energy data without using HDEEM, so I chose scorep_plugin_x86_energy. Do you mean that I don't need to change SCOREP_TOTAL_MEMORY? According to the error message, that is the only thing I can do. And is your suggestion to filter some regions and try again? I will try it.

bmario commented 4 years ago

I meant that the environment variable SCOREP_METRIC_X86_ENERGY_PLUGIN_INTERVALL_US isn't required in your case, because you use the sync plugin and for this plugin, the sampling rate is determined by rate of enter and leave events of your application run. Further, if you run into memory issues, the sampling rate is too high. Thus, you need to lower the sampling rate, which is done by lowering the number of enter and leave events. Hence, the hint to look into filtering regions. In some cases, you can get away with just increasing the available memory, but that doesn't seem to be the case for you.
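To make the filtering hint concrete, here is a minimal Python sketch that generates the text of a Score-P filter file excluding a list of regions. The helper name and the region names in the usage note are hypothetical. Which regions are worth excluding depends on your application, and nothing in this thread confirms that filtering alone resolves the error reported here.

```python
def make_filter(excluded_regions):
    """Build the text of a Score-P filter file that excludes the given regions.

    The region names passed in are placeholders chosen by the caller;
    wildcard patterns such as 'helper_*' are passed through unchanged.
    """
    lines = ["SCOREP_REGION_NAMES_BEGIN"]
    lines += [f"  EXCLUDE {name}" for name in excluded_regions]
    lines.append("SCOREP_REGION_NAMES_END")
    return "\n".join(lines) + "\n"
```

For example, writing the result of make_filter(["frequently_called_helper", "tiny_*"]) to a file and pointing SCOREP_FILTERING_FILE at that file should suppress enter and leave events for the matching regions, and with them the metric samples the sync plugin would attach to those events.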

Synlvejo commented 4 years ago

Okay, thanks. I will add a .filt file to filter some regions and try again.

Synlvejo commented 4 years ago

It doesn't work, but there are 14 .evt files in the scorep-measurement-tmp folder. Are they helpful for this issue? In addition, I have tried NPB/IS with class A, and even there I get this error. Maybe there are other possible reasons?

Synlvejo commented 4 years ago

I set tracing to false and profiling to true, and now I can get the energy consumption in profile.cubex. Is this okay?

umbreensabirmain commented 4 years ago

Yes, with the sync plugin this should be okay.

Synlvejo commented 4 years ago

Thank you very much. I will use this config for my comparison experiments.

AndreasGocht commented 4 years ago

Another guess: do you use MPI? If so: Are there a lot of things happening before the actual MPI_INIT? This might lead Score-P into some Issues.

Best, Andreas

Synlvejo commented 4 years ago

I only use OpenMP. One more question: the energy values in profile.cubex have the format ( a , b , c ) : d , e . I guess this means (times , min value , max value) : average value , ? . What does "e" mean? Is it the standard deviation or the variance? Or is there documentation for this?

Best, too

umbreensabirmain commented 4 years ago

It looks like you are getting the Cube tuple format. Did you use the environment variable SCOREP_PROFILING_FORMAT? Yes, you are right: in this format the order of values is (times, min value, max value) : average value, standard deviation.
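If you end up post-processing these values yourself, a minimal Python sketch for splitting one displayed tuple into named fields could look like the following. It assumes the value is available as text in exactly the "( a , b , c ) : d , e" layout quoted above. The class and function names are made up for this example, and extracting the strings from profile.cubex is not covered.

```python
import re
from typing import NamedTuple


class TupleStats(NamedTuple):
    """Fields in the order described above: (times, min, max) : average, stddev."""
    count: float
    minimum: float
    maximum: float
    mean: float
    stddev: float


# One decimal number, optionally signed and optionally in scientific notation.
_NUM = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"
_PATTERN = re.compile(
    rf"\(\s*({_NUM})\s*,\s*({_NUM})\s*,\s*({_NUM})\s*\)\s*:\s*({_NUM})\s*,\s*({_NUM})"
)


def parse_tuple(text: str) -> TupleStats:
    """Parse a string like '( 10 , 1.5 , 4.5 ) : 3.0 , 0.5' into TupleStats."""
    match = _PATTERN.search(text)
    if match is None:
        raise ValueError(f"not a tuple value: {text!r}")
    return TupleStats(*(float(group) for group in match.groups()))
```

Calling parse_tuple on each region's value gives you fields such as .mean and .stddev that can be compared across regions without handling the raw string each time.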

Synlvejo commented 4 years ago

Yes. I cannot enable tracing because of the error, so if I want the energy values I have to get them from profile.cubex. But this way I have to process the data with my Python skills to analyse the consumption per region. So painful. Thanks for the reply.