resurrecting-open-source-projects / dcfldd

Enhanced version of dd for forensics and security
GNU General Public License v2.0
90 stars 19 forks source link

diffwr option OOM kill #16

Closed szolnokit closed 1 year ago

szolnokit commented 1 year ago

Out of memory: Killed process 9737 (dcfldd-v1.9) total-vm:847432kB, anon-rss:844620kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:1696kB oom_score_adj:0

Prepare:

swapoff -a  # Just for a faster OOM
fallocate -l 10G test.dd  #create bigger file than your RAM

The bug: dcfldd diffwr=on if=/dev/zero of=test.dd # OOM kill

Default option (without diffwr) isn't affected.

The problem: When destination blocks are same (not written), memory blocks remain allocated.

Suggestion: Use my "full_write" function, instead code in released v1.9


At first, thank you for commit. And sorry, because I have no time to test @davidpolverari suggestion.

Dear @davidpolverari Thank you for suggestion, but your code is BAD :(. And unfortunately your suggested code became part of the version, instead of mine code. Your code eat all memory, and trigger OOM kill in a real life. My code maybe not beautiful as your, but I tested weeks, and used on a real life work before I published.

I only noticed that the v1.9 release contains memory leak bug, because I installed it from a distribution instead of mine version. When I tired my routine job with "diffwr" option, I get OOM kill after dcfldd eat all memory. And this is caused by your proposed code.

But I tested with valgrind against my version vs released v1.9. And the suspicion turned out to be true. With diffwr=on option almost all malloc-ed block remain allocated. But only in your suggested code, not in my code.

I see only this leak in your (=v1.9 released) version:

valgrind -s --track-origins=yes --leak-check=full --show-leak-kinds=all ./dcfldd-v1.9 bs=1k count=1k diffwr=on if=/dev/sda of=dst.dd
...
==9430== 1,048,576 bytes in 1,024 blocks are definitely lost in loss record 17 of 17
==9430==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==9430==    by 0x113F23: full_write (full-write.c:56)
==9430==    by 0x11807C: outputlist_write (output.c:161)
==9430==    by 0x115017: dd_copy (copy.c:322)
==9430==    by 0x10C656: main (dcfldd.c:753)
...

The leak size almost same size as the (not) written data size. With GB (not) written, easy to eat all memory.

Remark: My do..while(0) not nice I know, but this just a "simulated" try...throw..finally statement. The "break" is the "throw". :D Not nice, but my code always free the malloc-ed block. I don't know why, but your solution doesn't do that.

Maybe my code would have been better in the released version. Now the current release is buggy. :(

_Originally posted by @szolnokit in https://github.com/resurrecting-open-source-projects/dcfldd/pull/13#discussion_r1167625275_

davidpolverari commented 1 year ago

Fair enough. I'll take a look at your claims.

szolnokit commented 1 year ago

It is just a remark... @davidpolverari version would be good (OOM free), if the "free" command moved outside from 3rd if:

      if (diffwr) { /* Check destination block content is same as the buffer */
        char *rptr = NULL;
        off_t pos = lseek(desc, 0, SEEK_CUR);
        if ((pos >= 0) && (rptr = malloc(len))) {
          int rlen = safe_read(desc, rptr, len);
          if ((rlen <= 0) || (rlen != len) || (memcmp(rptr, ptr, len))) {
/*remove            free(rptr);  */
            lseek(desc, pos, SEEK_SET);
          } else {
            written = len;
          }
          free(rptr);  /*add here*/
        }
      }

It's not tested, just a suggestion.

davidpolverari commented 1 year ago

It is just a remark... @davidpolverari version would be good (OOM free), if the "free" command moved outside from 3rd if:

      if (diffwr) { /* Check destination block content is same as the buffer */
        char *rptr = NULL;
        off_t pos = lseek(desc, 0, SEEK_CUR);
        if ((pos >= 0) && (rptr = malloc(len))) {
          int rlen = safe_read(desc, rptr, len);
          if ((rlen <= 0) || (rlen != len) || (memcmp(rptr, ptr, len))) {
/*remove            free(rptr);  */
            lseek(desc, pos, SEEK_SET);
          } else {
            written = len;
          }
          free(rptr);  /*add here*/
        }
      }

It's not tested, just a suggestion.

Yes, I was analyzing the code before and I came to that conclusion, too. In the process of converting your code, I missed the free() on the "same block" path. I am preparing a fix for it, and I would like to ask you to test it when I push the test branch before I commit the final version. Is that ok?

szolnokit commented 1 year ago

Moving "free" outside... working like a charm...

Now, there are two options:

  1. Use my do...break...while(0) version
  2. Use modified code above
szolnokit commented 1 year ago

Yes, I was analyzing the code before and I came to that conclusion, too. In the process of converting your code, I missed the free() on the "same block" path. I am preparing a fix for it, and I would like to ask you to test it when I push the test branch before I commit the final version. Is that ok?

It was a sneaking bug. Because only occur when:

Ok, no problem... Currently, I use diffwr option on a real job with big flash drives (w/ multiple of= ), plus readback hash compare. The OOM will be revealed soon :D

davidpolverari commented 1 year ago

Yes, I'm working on a diff moving free() outside. The reason I didn't merge your code as it were is long-term maintenance: I wanted to simplify the branching logic to make it easier to understand for future maintainers. Of course, I did miss this detail and I messed up. I´m sorry for that! :flushed:

szolnokit commented 1 year ago

I can do that moving in minutes, if you like. Just only some minutes. I have free time currently :)

davidpolverari commented 1 year ago

I already did the patch and I'm almost ready to push it for testing. The only thing I'm wondering now is whether should we free() the pointer conditionally, eg:

if (rptr)
  free(rptr);

I know that it would be virtually impossible for it to happen, as if the malloc() failed, we wouldn't reach that point, and then this other if would be useless (and costly). I think we can free it unconditionally, right?

szolnokit commented 1 year ago

Put it boldly without if (rptr) . Because of && (rptr = malloc(len))), if (rptr) always would be true :D :D

This is good, I have tested:

      if (diffwr) { /* Check destination block content is same as the buffer */
        char *rptr = NULL;
        off_t pos = lseek(desc, 0, SEEK_CUR);
        if ((pos >= 0) && (rptr = malloc(len))) {
          int rlen = safe_read(desc, rptr, len);
          if ((rlen <= 0) || (rlen != len) || (memcmp(rptr, ptr, len))) {
            lseek(desc, pos, SEEK_SET);
          } else {
            written = len;
          }
          free(rptr);    /* rptr impossible to NULL here */
        }
      }
davidpolverari commented 1 year ago

Yes, I was just trying to check if there could be a corned case in the middle :).

I have pushed a branch with the fix. In my tests with valgrind, I didn't see any leaks[^1]. Please test it and tell me if it is ok now.

[^1]: There are other leaks, but not related to diffwr. I will have a look at them in another opportunity.

szolnokit commented 1 year ago

Ok, I'm beginning of some test with diffwr. With big test files, and real block devices. With commit 2cddf08

szolnokit commented 1 year ago

Yes, I have seen. There are some other mem leaks, but these are small leaks, and all occurs only once per execute. Not "inflating" memory leaks. From strdup, etc... And not related with "diffwr" option, these are "ancient" leaks :D :D

davidpolverari commented 1 year ago

Yeah, but I will try to hunt them anyway :).

szolnokit commented 1 year ago

OK. I tested commit 2cddf08 . No more OOM.

davidpolverari commented 1 year ago

Thanks a lot! And sorry for the mistake! I will merge it into master now.

szolnokit commented 1 year ago

Closed