vermaseren / form

The FORM project for symbolic manipulation of very big expressions
GNU General Public License v3.0
1.15k stars 136 forks source link

Memory leak in #211 #248

Closed tueda closed 6 years ago

tueda commented 6 years ago

This is a follow-up of #211. The example in that issue works:

#-
#: TermsInSmall 128
#: LargePatches 16
#: FilePatches 4
#: SubTermsInSmall 64
#: SubLargePatches 8
#: SubFilePatches 2

Off Statistics;

CFunction f,g;
Symbol x,y;

* 128*16=2048 terms cause a sort of the large buffer to disk.
* multiples of 2048*4=8192 terms cause a stage 4 sort
#define NTERMS "40001"

#define ARGNTERMS "2001"

Local test1 = <f(1)>+...+<f(`NTERMS')>;
Local test2 = g(<f(1)>+...+<f(`ARGNTERMS')>);
.sort

* Cancel all terms, but keep distance so that most terms only cancel in the final sort
Identify f(x?) = f(x) - f(`NTERMS'-x+1);
Argument g;
Identify f(x?) = f(x) - f(`ARGNTERMS'-x+1);
EndArgument;

Print +s;
.end

but now there are some memory leaks, though I'm not sure if it is very serious: valgrind --leak-check=full gives

==2303== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==2303== Command: ./vorm test.frm
==2303==
FORM 4.2.0 (Nov 30 2017, v4.2.0-24-g1514492) 64-bits  Run: Wed Dec  6 18:56:14 2017

...

   test1 = 0;

   test2 =
       + g(0)
      ;

  11.89 sec out of 11.89 sec
==2303==
==2303== HEAP SUMMARY:
==2303==     in use at exit: 346,930,544 bytes in 161 blocks
==2303==   total heap usage: 1,210 allocs, 1,049 frees, 392,480,661 bytes allocated
==2303==
==2303== 160,112 (112 direct, 160,000 indirect) bytes in 1 blocks are definitely lost in loss record 134 of 152
==2303==    at 0x4C2AB96: malloc (vg_replace_malloc.c:299)
==2303==    by 0x4FF908: Malloc1 (tools.c:2230)
==2303==    by 0x42A966: SetupOutputGZIP (compress.c:85)
==2303==    by 0x4E1F32: MergePatches (sort.c:3570)
==2303==    by 0x4E4016: EndSort (sort.c:1066)
==2303==    by 0x4BA233: Processor (proces.c:431)
==2303==    by 0x437809: DoExecute (execute.c:838)
==2303==    by 0x44DD20: ExecModule (module.c:274)
==2303==    by 0x4AFFC1: PreProcessor (pre.c:962)
==2303==    by 0x4E8E1F: main (startup.c:1605)
==2303==
==2303== 320,224 (224 direct, 320,000 indirect) bytes in 2 blocks are definitely lost in loss record 139 of 152
==2303==    at 0x4C2AB96: malloc (vg_replace_malloc.c:299)
==2303==    by 0x4FF908: Malloc1 (tools.c:2230)
==2303==    by 0x42A966: SetupOutputGZIP (compress.c:85)
==2303==    by 0x4E1F32: MergePatches (sort.c:3570)
==2303==    by 0x4E33A6: StoreTerm (sort.c:4301)
==2303==    by 0x4B73E8: Generator (proces.c:3157)
==2303==    by 0x4B89A5: Generator (proces.c:3931)
==2303==    by 0x4BA125: Processor (proces.c:404)
==2303==    by 0x437809: DoExecute (execute.c:838)
==2303==    by 0x44DD20: ExecModule (module.c:274)
==2303==    by 0x4AFFC1: PreProcessor (pre.c:962)
==2303==    by 0x4E8E1F: main (startup.c:1605)
==2303==
==2303== 428,208 (5,952 direct, 422,256 indirect) bytes in 1 blocks are definitely lost in loss record 140 of 152
==2303==    at 0x4C2AB96: malloc (vg_replace_malloc.c:299)
==2303==    by 0x4E3B46B: deflateInit2_ (in /localstore/theorie/tueda/build/linuxbrew/Cellar/zlib/1.2.11/lib/libz.so.1.2.11)
==2303==    by 0x4E3B613: deflateInit_ (in /localstore/theorie/tueda/build/linuxbrew/Cellar/zlib/1.2.11/lib/libz.so.1.2.11)
==2303==    by 0x42AA22: SetupOutputGZIP (compress.c:119)
==2303==    by 0x4E1F32: MergePatches (sort.c:3570)
==2303==    by 0x4E4016: EndSort (sort.c:1066)
==2303==    by 0x4BA233: Processor (proces.c:431)
==2303==    by 0x437809: DoExecute (execute.c:838)
==2303==    by 0x44DD20: ExecModule (module.c:274)
==2303==    by 0x4AFFC1: PreProcessor (pre.c:962)
==2303==    by 0x4E8E1F: main (startup.c:1605)
==2303==
==2303== LEAK SUMMARY:
==2303==    definitely lost: 6,288 bytes in 4 blocks
==2303==    indirectly lost: 902,256 bytes in 9 blocks
==2303==      possibly lost: 0 bytes in 0 blocks
==2303==    still reachable: 346,022,000 bytes in 148 blocks
==2303==         suppressed: 0 bytes in 0 blocks
==2303== Reachable blocks (those to which a pointer was found) are not shown.
==2303== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==2303==
==2303== For counts of detected and suppressed errors, rerun with: -v
==2303== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
vermaseren commented 6 years ago

Hi Takahiro,

A cursory inspection may indicate that when the sort file is closed and removed, the gzip buffers are not freed. Then, when a new sortfile is opened, it may allocate more buffers. I will have to investigate this better.

Jos

On 6 Dec 2017, at 19:02, Takahiro Ueda notifications@github.com wrote:

This is a follow-up of #211 https://github.com/vermaseren/form/issues/211. The example in that issue works:

-

: TermsInSmall 128

: LargePatches 16

: FilePatches 4

: SubTermsInSmall 64

: SubLargePatches 8

: SubFilePatches 2

Off Statistics;

CFunction f,g; Symbol x,y;

  • 128*16=2048 terms cause a sort of the large buffer to disk.
  • multiples of 2048*4=8192 terms cause a stage 4 sort

    define NTERMS "40001"

define ARGNTERMS "2001"

Local test1 = <f(1)>+...+<f(NTERMS')>; Local test2 = g(<f(1)>+...+<f(ARGNTERMS')>); .sort

  • Cancel all terms, but keep distance so that most terms only cancel in the final sort Identify f(x?) = f(x) - f(NTERMS'-x+1); Argument g; Identify f(x?) = f(x) - f(ARGNTERMS'-x+1); EndArgument;

Print +s; .end but now there are some memory leaks, though I'm not sure if it is very serious: valgrind --leak-check=full gives

==2303== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info ==2303== Command: ./vorm test.frm ==2303== FORM 4.2.0 (Nov 30 2017, v4.2.0-24-g1514492) 64-bits Run: Wed Dec 6 18:56:14 2017

...

test1 = 0;

test2 =

  • g(0) ;

    11.89 sec out of 11.89 sec ==2303== ==2303== HEAP SUMMARY: ==2303== in use at exit: 346,930,544 bytes in 161 blocks ==2303== total heap usage: 1,210 allocs, 1,049 frees, 392,480,661 bytes allocated ==2303== ==2303== 160,112 (112 direct, 160,000 indirect) bytes in 1 blocks are definitely lost in loss record 134 of 152 ==2303== at 0x4C2AB96: malloc (vg_replace_malloc.c:299) ==2303== by 0x4FF908: Malloc1 (tools.c:2230) ==2303== by 0x42A966: SetupOutputGZIP (compress.c:85) ==2303== by 0x4E1F32: MergePatches (sort.c:3570) ==2303== by 0x4E4016: EndSort (sort.c:1066) ==2303== by 0x4BA233: Processor (proces.c:431) ==2303== by 0x437809: DoExecute (execute.c:838) ==2303== by 0x44DD20: ExecModule (module.c:274) ==2303== by 0x4AFFC1: PreProcessor (pre.c:962) ==2303== by 0x4E8E1F: main (startup.c:1605) ==2303== ==2303== 320,224 (224 direct, 320,000 indirect) bytes in 2 blocks are definitely lost in loss record 139 of 152 ==2303== at 0x4C2AB96: malloc (vg_replace_malloc.c:299) ==2303== by 0x4FF908: Malloc1 (tools.c:2230) ==2303== by 0x42A966: SetupOutputGZIP (compress.c:85) ==2303== by 0x4E1F32: MergePatches (sort.c:3570) ==2303== by 0x4E33A6: StoreTerm (sort.c:4301) ==2303== by 0x4B73E8: Generator (proces.c:3157) ==2303== by 0x4B89A5: Generator (proces.c:3931) ==2303== by 0x4BA125: Processor (proces.c:404) ==2303== by 0x437809: DoExecute (execute.c:838) ==2303== by 0x44DD20: ExecModule (module.c:274) ==2303== by 0x4AFFC1: PreProcessor (pre.c:962) ==2303== by 0x4E8E1F: main (startup.c:1605) ==2303== ==2303== 428,208 (5,952 direct, 422,256 indirect) bytes in 1 blocks are definitely lost in loss record 140 of 152 ==2303== at 0x4C2AB96: malloc (vg_replacemalloc.c:299) ==2303== by 0x4E3B46B: deflateInit2 (in /localstore/theorie/tueda/build/linuxbrew/Cellar/zlib/1.2.11/lib/libz.so.1.2.11) ==2303== by 0x4E3B613: deflateInit_ (in /localstore/theorie/tueda/build/linuxbrew/Cellar/zlib/1.2.11/lib/libz.so.1.2.11) ==2303== by 0x42AA22: SetupOutputGZIP (compress.c:119) ==2303== by 0x4E1F32: MergePatches (sort.c:3570) ==2303== by 0x4E4016: EndSort (sort.c:1066) ==2303== by 0x4BA233: Processor (proces.c:431) ==2303== by 0x437809: DoExecute (execute.c:838) ==2303== by 0x44DD20: ExecModule (module.c:274) ==2303== by 0x4AFFC1: PreProcessor (pre.c:962) ==2303== by 0x4E8E1F: main (startup.c:1605) ==2303== ==2303== LEAK SUMMARY: ==2303== definitely lost: 6,288 bytes in 4 blocks ==2303== indirectly lost: 902,256 bytes in 9 blocks ==2303== possibly lost: 0 bytes in 0 blocks ==2303== still reachable: 346,022,000 bytes in 148 blocks ==2303== suppressed: 0 bytes in 0 blocks ==2303== Reachable blocks (those to which a pointer was found) are not shown. ==2303== To see them, rerun with: --leak-check=full --show-leak-kinds=all ==2303== ==2303== For counts of detected and suppressed errors, rerun with: -v ==2303== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0) — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/vermaseren/form/issues/248, or mute the thread https://github.com/notifications/unsubscribe-auth/AFLxEt3nDVK72OoqQNMGGELr70c1VKkjks5s9tbCgaJpZM4Q4WKH.

vermaseren commented 6 years ago

Hi Takahiro,

I think I have indeed found the problem. I have to go now to the “jamboree”, and will repair it later. It is indeed that in CloseFile I should free all memory related to compressions. And in CreateFile I should make sure all the relevant buffers are set to zero. That should probably fix it.

Cheers

Jos

On 10 Dec 2017, at 23:32, Jos Vermaseren t68@nikhef.nl wrote:

Hi Takahiro,

A cursory inspection may indicate that when the sort file is closed and removed, the gzip buffers are not freed. Then, when a new sortfile is opened, it may allocate more buffers. I will have to investigate this better.

Jos

On 6 Dec 2017, at 19:02, Takahiro Ueda <notifications@github.com mailto:notifications@github.com> wrote:

This is a follow-up of #211 https://github.com/vermaseren/form/issues/211. The example in that issue works:

-

: TermsInSmall 128

: LargePatches 16

: FilePatches 4

: SubTermsInSmall 64

: SubLargePatches 8

: SubFilePatches 2

Off Statistics;

CFunction f,g; Symbol x,y;

  • 128*16=2048 terms cause a sort of the large buffer to disk.
  • multiples of 2048*4=8192 terms cause a stage 4 sort

    define NTERMS "40001"

define ARGNTERMS "2001"

Local test1 = <f(1)>+...+<f(NTERMS')>; Local test2 = g(<f(1)>+...+<f(ARGNTERMS')>); .sort

  • Cancel all terms, but keep distance so that most terms only cancel in the final sort Identify f(x?) = f(x) - f(NTERMS'-x+1); Argument g; Identify f(x?) = f(x) - f(ARGNTERMS'-x+1); EndArgument;

Print +s; .end but now there are some memory leaks, though I'm not sure if it is very serious: valgrind --leak-check=full gives

==2303== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info ==2303== Command: ./vorm test.frm ==2303== FORM 4.2.0 (Nov 30 2017, v4.2.0-24-g1514492) 64-bits Run: Wed Dec 6 18:56:14 2017

...

test1 = 0;

test2 =

  • g(0) ;

    11.89 sec out of 11.89 sec ==2303== ==2303== HEAP SUMMARY: ==2303== in use at exit: 346,930,544 bytes in 161 blocks ==2303== total heap usage: 1,210 allocs, 1,049 frees, 392,480,661 bytes allocated ==2303== ==2303== 160,112 (112 direct, 160,000 indirect) bytes in 1 blocks are definitely lost in loss record 134 of 152 ==2303== at 0x4C2AB96: malloc (vg_replace_malloc.c:299) ==2303== by 0x4FF908: Malloc1 (tools.c:2230) ==2303== by 0x42A966: SetupOutputGZIP (compress.c:85) ==2303== by 0x4E1F32: MergePatches (sort.c:3570) ==2303== by 0x4E4016: EndSort (sort.c:1066) ==2303== by 0x4BA233: Processor (proces.c:431) ==2303== by 0x437809: DoExecute (execute.c:838) ==2303== by 0x44DD20: ExecModule (module.c:274) ==2303== by 0x4AFFC1: PreProcessor (pre.c:962) ==2303== by 0x4E8E1F: main (startup.c:1605) ==2303== ==2303== 320,224 (224 direct, 320,000 indirect) bytes in 2 blocks are definitely lost in loss record 139 of 152 ==2303== at 0x4C2AB96: malloc (vg_replace_malloc.c:299) ==2303== by 0x4FF908: Malloc1 (tools.c:2230) ==2303== by 0x42A966: SetupOutputGZIP (compress.c:85) ==2303== by 0x4E1F32: MergePatches (sort.c:3570) ==2303== by 0x4E33A6: StoreTerm (sort.c:4301) ==2303== by 0x4B73E8: Generator (proces.c:3157) ==2303== by 0x4B89A5: Generator (proces.c:3931) ==2303== by 0x4BA125: Processor (proces.c:404) ==2303== by 0x437809: DoExecute (execute.c:838) ==2303== by 0x44DD20: ExecModule (module.c:274) ==2303== by 0x4AFFC1: PreProcessor (pre.c:962) ==2303== by 0x4E8E1F: main (startup.c:1605) ==2303== ==2303== 428,208 (5,952 direct, 422,256 indirect) bytes in 1 blocks are definitely lost in loss record 140 of 152 ==2303== at 0x4C2AB96: malloc (vg_replacemalloc.c:299) ==2303== by 0x4E3B46B: deflateInit2 (in /localstore/theorie/tueda/build/linuxbrew/Cellar/zlib/1.2.11/lib/libz.so.1.2.11) ==2303== by 0x4E3B613: deflateInit_ (in /localstore/theorie/tueda/build/linuxbrew/Cellar/zlib/1.2.11/lib/libz.so.1.2.11) ==2303== by 0x42AA22: SetupOutputGZIP (compress.c:119) ==2303== by 0x4E1F32: MergePatches (sort.c:3570) ==2303== by 0x4E4016: EndSort (sort.c:1066) ==2303== by 0x4BA233: Processor (proces.c:431) ==2303== by 0x437809: DoExecute (execute.c:838) ==2303== by 0x44DD20: ExecModule (module.c:274) ==2303== by 0x4AFFC1: PreProcessor (pre.c:962) ==2303== by 0x4E8E1F: main (startup.c:1605) ==2303== ==2303== LEAK SUMMARY: ==2303== definitely lost: 6,288 bytes in 4 blocks ==2303== indirectly lost: 902,256 bytes in 9 blocks ==2303== possibly lost: 0 bytes in 0 blocks ==2303== still reachable: 346,022,000 bytes in 148 blocks ==2303== suppressed: 0 bytes in 0 blocks ==2303== Reachable blocks (those to which a pointer was found) are not shown. ==2303== To see them, rerun with: --leak-check=full --show-leak-kinds=all ==2303== ==2303== For counts of detected and suppressed errors, rerun with: -v ==2303== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0) — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/vermaseren/form/issues/248, or mute the thread https://github.com/notifications/unsubscribe-auth/AFLxEt3nDVK72OoqQNMGGELr70c1VKkjks5s9tbCgaJpZM4Q4WKH.

tueda commented 6 years ago

Nice. Do you think it is the cause of memory leaks in long runs? Happens many times in a run?

vermaseren commented 6 years ago

Hi Takahiro

It probably happens in each module for each Expression that needs a sortfile. And in each worker.

Cheers

Jos

Verstuurd vanaf mijn iPhone

Op 12 dec. 2017 om 12:51 heeft Takahiro Ueda notifications@github.com het volgende geschreven:

Nice. Do you think it is the cause of memory leaks in long runs? Happens many times in a run?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

tueda commented 6 years ago

Fixed in a484c0a and 4493f01.

tueda commented 6 years ago

Indeed, this is identified as the source of big memory leaks in long runs. After fixing the leaks, we have confirmed memory usage of a program (TFORM with 24 cores, 93 hours) is considerably reduced (from VSZ 279G RSS 150G to VSZ 118G RSS 77G).