Synlvejo opened this issue 4 years ago
Hi, it looks like you're running into two problems: a buffer that is too large (6.4GB) and a buffer that is too small (64MB). Why is 64 MB supposedly too small, you might ask? Probably you did not activate this plugin for the profile that you fed to 'scorep-score'. With this plugin enabled, you record metrics with every synchronous event (enter/exit) that occurs, and each of these adds to the byte count. Why does 6.4 GB not work, you might ask? This is due to the internals of Score-P, which uses 32-bit values for storing offsets, if I remember correctly. Hence, at most 4 GB is allowed. Please try something like 'export SCOREP_TOTAL_MEMORY=1G'. By the way, yes, you can add suffixes like M and G to make your 'SCOREP_TOTAL_MEMORY' setting more readable.
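A quick way to check a value against the 4 GB limit is to convert it to bytes first. The sketch below is a hypothetical helper, not a Score-P tool. It assumes that the k/M/G suffixes are 1024-based, which is consistent with the log in the original post, where the current setting is reported as SCOREP_TOTAL_MEMORY=16384000 (16000 * 1024). The upper bound 4294967295 is the one from the assertion quoted in the original post.

```sh
# Hypothetical helper (not a Score-P tool): convert a size such as 64000000,
# 640M or 1G into bytes. ASSUMPTION: k/M/G are 1024-based multiples.
to_bytes() {
  num=${1%[kKmMgG]}     # value without a trailing k/M/G suffix
  suffix=${1#"$num"}    # the suffix letter, or empty
  case $suffix in
    "")  echo "$num" ;;
    k|K) echo $(( num * 1024 )) ;;
    m|M) echo $(( num * 1024 * 1024 )) ;;
    g|G) echo $(( num * 1024 * 1024 * 1024 )) ;;
  esac
}

# Succeeds if the value passes the check quoted in this thread:
# env_total_memory <= 4294967295 (2^32 - 1).
fits_32bit() {
  [ "$(to_bytes "$1")" -le 4294967295 ]
}

for v in 64000000 640M 1G 3G 6400000000; do
  if fits_32bit "$v"; then
    echo "$v = $(to_bytes "$v") bytes: OK"
  else
    echo "$v = $(to_bytes "$v") bytes: exceeds the 32-bit limit"
  fi
done
```

Under these assumptions, 1G to 3G all fit, and only the 6.4GB value from this thread fails the check.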
Thanks. Actually, I also tried 640MB, and the error is the same as with 64MB:
[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[Score-P] Trace buffer flush on rank 0.
[Score-P] Increase SCOREP_TOTAL_MEMORY and try again.
I also just tried 1G, 2G, and 3G, but the error is the same. This is the code at src/measurement/tracing/SCOREP_Tracing.c:226:
    /* ignore allocation failures, OTF2 will flush and free chunks */
    if ( !chunk )
    {
        UTILS_WARNING( "Cannot allocate %" PRIu64 " bytes for tracing; but OTF2 will flush and free chunks.", chunkSize );
    }
    return chunk;
}
Maybe this helps?
By the way, I tuned SCOREP_TOTAL_MEMORY by adding or removing a '0', lol.
Looks like you're setting the sampling rate to infinity?
export SCOREP_METRIC_X86_ENERGY_PLUGIN_INTERVALL_US=0
Oh, doesn't 0 mean the default value? That is the value from the installation guide, at the bottom of page 17. Thanks!
The default value is 50000, which corresponds to 50ms.
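The arithmetic behind "infinity": the interval is given in microseconds, so the sampling rate is 1,000,000 divided by the interval. The helper below is a hypothetical illustration; treating 0 as an unbounded rate follows the reply above rather than the plugin's source code.

```sh
# Hypothetical helper: samples per second for an interval in microseconds,
# as used by SCOREP_METRIC_X86_ENERGY_PLUGIN_INTERVALL_US.
samples_per_second() {
  if [ "$1" -eq 0 ]; then
    echo "unbounded"          # no finite rate for a zero interval
  else
    echo $(( 1000000 / $1 ))
  fi
}

samples_per_second 50000      # default: 50000 us = 50 ms, prints 20
samples_per_second 0          # prints unbounded
```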
Thanks,
I tried setting the value to 50000 without setting SCOREP_TOTAL_MEMORY. The shell output contains:
[Score-P] Memory: Location-Misc
[Score-P] Memory allocated [bytes] 8192
[Score-P] Memory used [bytes] 984
[Score-P] Memory available [bytes] 7208
[Score-P] Number of pages allocated 1
[Score-P] Number of pages used 1
[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[Score-P] Trace buffer flush on rank 0.
[Score-P] Increase SCOREP_TOTAL_MEMORY and try again.
And it ends with:
Aborted
But when I set SCOREP_TOTAL_MEMORY=1G or larger, the message loops again:
[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[Score-P] Trace buffer flush on rank 0.
[Score-P] Increase SCOREP_TOTAL_MEMORY and try again.
And when I set it back to 16MB, the output is like the first one.
Never mind. I just realized you're using the sync plugin, so there's no need to set this environment variable anyway. However, you may still run into a similar issue. I recommend looking into filtering some regions.
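Filtering is configured with a Score-P filter file that is passed to the measurement through SCOREP_FILTERING_FILE. A minimal sketch follows; the region names are hypothetical placeholders, to be replaced by the frequently visited, short regions that `scorep-score -r profile.cubex` lists for your own application.

```sh
# Write a filter file; the region names are placeholders, not NPB functions.
cat > scorep.filt <<'EOF'
SCOREP_REGION_NAMES_BEGIN
  EXCLUDE hypothetical_hot_function
  EXCLUDE hypothetical_helper_*
SCOREP_REGION_NAMES_END
EOF

# Apply the filter at measurement time.
export SCOREP_FILTERING_FILE="$PWD/scorep.filt"
```

Running `scorep-score -f scorep.filt profile.cubex` on an existing profile should show how much the filter lowers the estimated memory requirement before you re-run the application.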
No problem. I just want to get energy measurements without using HDEEM, so I chose scorep_plugin_x86_energy. Do you mean that I don't need to change SCOREP_TOTAL_MEMORY? According to the error message, that is the only thing I can do. And is your suggestion to filter some regions and try again? I will try it.
I meant that the environment variable SCOREP_METRIC_X86_ENERGY_PLUGIN_INTERVALL_US isn't required in your case, because you use the sync plugin, and for this plugin the sampling rate is determined by the rate of enter and leave events in your application run. Further, if you run into memory issues, the sampling rate is too high. Thus, you need to lower the sampling rate, which you do by lowering the number of enter and leave events; hence the hint to look into filtering regions. In some cases, you can get away with just increasing the available memory, but that doesn't seem to be the case for you.
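To see why filtering helps, count events: each visit of a region produces one enter and one leave event, and with the sync plugin each of those events also carries an energy reading. The helper and the visit counts below are made up for illustration; real visit counts come from `scorep-score -r`.

```sh
# Hypothetical helper: total enter + leave events for "name:visits" pairs.
count_events() {
  total=0
  for entry in "$@"; do
    visits=${entry##*:}               # text after the last ':'
    total=$(( total + 2 * visits ))   # one enter and one leave per visit
  done
  echo "$total"
}

# Made-up visit counts: one tiny, hot region dominates the event count.
echo "all regions:         $(count_events main:1 solver:1000 hot_helper:5000000)"  # 10002002
echo "hot_helper filtered: $(count_events main:1 solver:1000)"                     # 2002
```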
Okay, thanks. I will add a .filt file to filter some regions and try again.
It doesn't work, but there are 14 .evt files in the scorep-measurement-tmp folder. Are they helpful for this issue? In addition, I have tried NPB/IS, and the error occurs even with class A. Could there be other possible reasons?
I set tracing to false and profiling to true, and I can get the energy consumption in profile.cubex. Is this okay?
Yes, with the sync plugin this should be okay.
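For reference, the profiling-only setup confirmed here can be written as follows. It reuses the plugin settings from the original post with tracing switched off; leaving out the interval variable follows the earlier reply about the sync plugin.

```sh
# Profile-only measurement with the x86_energy sync plugin
# (plugin settings taken from the original post).
export SCOREP_ENABLE_TRACING=false
export SCOREP_ENABLE_PROFILING=true
export SCOREP_METRIC_PLUGINS=x86_energy_sync_plugin
export SCOREP_METRIC_X86_ENERGY_SYNC_PLUGIN="BLADE/E"
# Not needed for the sync plugin, which samples on enter/leave events.
unset SCOREP_METRIC_X86_ENERGY_PLUGIN_INTERVALL_US
```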
Thank you very much. I will use this configuration for my comparison experiments.
Another guess: do you use MPI? If so, is a lot happening before the actual MPI_INIT? This might cause issues for Score-P.
Best, Andreas
I only use OpenMP. One more question: the energy values in profile.cubex have the format ( a , b , c ) : d , e . I guess they mean (times, min value, max value) : average value, ?. What does "e" mean? Is it the standard deviation or the variance? Or is there documentation for this?
Best, too
It looks like you are getting the Cube tuple format. Did you set the environment variable SCOREP_PROFILING_FORMAT? Yes, you are right: in this format the order of values is (times, min value, max value) : average value, standard deviation.
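If you post-process raw per-visit values yourself, the same tuple can be reproduced with a short awk script. This is a hypothetical helper, not a Cube tool, and it assumes the population standard deviation (dividing by n); whether Cube divides by n or n - 1 is not confirmed in this thread.

```sh
# Hypothetical helper: print "(count, min, max) : mean, stddev" for the
# numbers given as arguments. ASSUMPTION: population standard deviation.
tuple_summary() {
  [ "$#" -gt 0 ] || return 1          # nothing to summarise
  printf '%s\n' "$@" | awk '
    { v = $1 + 0
      if (n == 0 || v < min) min = v
      if (n == 0 || v > max) max = v
      n++; sum += v; sumsq += v * v }
    END {
      mean = sum / n
      var = sumsq / n - mean * mean
      if (var < 0) var = 0            # guard against rounding below zero
      printf "(%d, %g, %g) : %g, %g\n", n, min, max, mean, sqrt(var)
    }'
}

tuple_summary 2 4 4 4 5 5 7 9         # prints (8, 2, 9) : 5, 2
```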
Yes. I cannot enable tracing because of the error, so if I want the energy values I have to get them from profile.cubex. But that way I have to process the data with my own Python scripts to analyse the per-region consumption. So painful. Thanks for the reply.
Hello,
I can correctly collect profile.cubex with Score-P, and scorep-score reports that SCOREP_TOTAL_MEMORY should be larger than 43MB. But when I install scorep_plugin_x86_energy and set the environment like this (without SCOREP_TOTAL_MEMORY):
export SCOREP_ENABLE_TRACING="true"
export SCOREP_ENABLE_PROFILING="false"
export SCOREP_METRIC_PLUGINS=x86_energy_sync_plugin
export SCOREP_METRIC_PLUGINS_SEP=";"
export SCOREP_METRIC_X86_ENERGY_SYNC_PLUGIN="BLADE/E"
export SCOREP_METRIC_X86_ENERGY_PLUGIN_INTERVALL_US=0
export SCOREP_METRIC_X86_ENERGY_SYNC_PLUGIN_OFFSET=70
and run the application again, I get this error:
NAS Parallel Benchmarks (NPB3.4-OMP) - IS Benchmark
Size: 33554432 (class B)
Iterations: 10
Number of available threads: 20
[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[OTF2] src/OTF2_Buffer.c:359: error: This could not be done with the given memory: Could not allocate memory for chunk!
[OTF2] src/otf2_archive_int.c:2122: error: This could not be done with the given memory: Can't create event writer!
[OTF2] src/OTF2_Archive.c:977: error: This could not be done with the given memory: Could not get local event writer
[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[Score-P] src/measurement/SCOREP_Memory.c:175: Error: No free memory page available:
[OTF2] src/OTF2_Buffer.c:359: error: This could not be done with the given memory: Could not allocate memory for chunk!
[OTF2] src/otf2_archive_int.c:2122: error: This could not be done with the given memory: Can't create event writer!
[OTF2] src/OTF2_Archive.c:977: error: This could not be done with the given memory: Could not get local event writer
[Score-P] src/measurement/SCOREP_Memory.c:175: Error: No free memory page available: Out of memory. Please increase SCOREP_TOTAL_MEMORY=16384000 and try again.
[Score-P] src/measurement/SCOREP_Memory.c:179: Error: No free memory page available: Please ensure that there are at least 2MB available for each intended location.
[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[OTF2] src/OTF2_Buffer.c:359: error: This could not be done with the given memory: Could not allocate memory for chunk!
[Score-P] src/measurement/SCOREP_Memory.c:183: Error: No free memory page available: Where there are currently 20 locations in use in this failing process.
[Score-P] Memory usage of rank 0
[Score-P] Memory used so far: Out of memory. Please increase SCOREP_TOTAL_MEMORY=16384000 and try again.
…… ……
[Score-P] src/measurement/SCOREP_Memory.c:179: Error: No free memory page available: Please ensure that there are at least 2MB available for each intended location.
[Score-P] Score-P runtime-management memory tracking:
Aborted
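The log itself quantifies this first failure: Score-P asks for at least 2MB per location, and 20 locations are in use (presumably one per OpenMP thread, matching "Number of available threads: 20"). A minimal check, assuming "2MB" means 2 * 1024 * 1024 bytes:

```sh
# Numbers taken from the log above; "2MB" is assumed to be 2 * 1024 * 1024.
locations=20
per_location=$(( 2 * 1024 * 1024 ))
required=$(( locations * per_location ))
current=16384000    # from "Please increase SCOREP_TOTAL_MEMORY=16384000"

echo "required at least: $required bytes"
echo "configured:        $current bytes"
if [ "$current" -lt "$required" ]; then
  echo "SCOREP_TOTAL_MEMORY is below the per-location minimum"
fi
```

By the same arithmetic, the later setting of 64000000 bytes is above this 41943040-byte minimum.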
Then I set SCOREP_TOTAL_MEMORY large enough:
export SCOREP_TOTAL_MEMORY=64000000 (about 64MB)
and I get a looping message:
[Score-P] src/measurement/tracing/SCOREP_Tracing.c:226: Warning: Cannot allocate 1048576 bytes for tracing; but OTF2 will flush and free chunks.
[Score-P] Trace buffer flush on rank 0.
[Score-P] Increase SCOREP_TOTAL_MEMORY and try again.
Furthermore, I set:
export SCOREP_TOTAL_MEMORY=6400000000 (about 6.4GB)
and get this error:
is.B.x: ../../build-backend/../src/measurement/scorep_environment.c:299: SCOREP_Env_GetPageSize: Assertion `env_total_memory <= (4294967295U)' failed.
Aborted
Is there a problem with how I set the environment variables, or with some other step?
Thanks for any help!