nuta / resea

A microkernel-based hackable operating system.
Other
522 stars 29 forks source link

Heap fragmentation problem in malloc #23

Closed nuta closed 4 years ago

nuta commented 4 years ago

The current malloc() implementation uses a naive algorithm and it easily creates heap fragmentation. Obviously we need to fix this.

https://github.com/nuta/resea/blob/93d3f4dafb423a87b8b740f1e246039e630d1b70/libs/resea/malloc.c#L66-L88

nuta commented 4 years ago

Using multiple bins might mitigate fragmentations.

In the current malloc implementation, there's only one bin (or aka free list) chunks:

https://github.com/nuta/resea/blob/93d3f4dafb423a87b8b740f1e246039e630d1b70/libs/resea/malloc.c#L8

chunks is a linked list of unused (free) memory regions. If malloc is called, it searches the bin (chunks) for a sufficiently large chunk (first-fit), split the chunk into the allocated chunk and the remaining chunk, and lastly returns the pointer to the allocated chunk:

(1) Look for the search

          +-----------------+
prev -->  |   free chunk A  |  --> next free chunk
          +-----------------+

(2) Split the found chunk A into new two chunks A' and B'

          +-------+  +------+
prev -->  |   A'  |  |  B'  |  --> next free chunk
          +-------+  +------+

(3) Remove B' from the free list (`chunks`)

          +------+
prev -->  |  A'  |  --> next free chunk
          +------+

Multiple bins, as its name suggests, malloc has multiple chunks (i.e. static struct malloc_chunk *chunks[NUM_CHUNKS]). Each bin has predefined-sized chunks (8 byte, 16 bytes, 64 bytes, 256 bytes, 4096 bytes, and bigger than 4096 bytes, for example).

In multiple bins approach, malloc does not split the chunk except ones in the bin for remaining big ones. That is, once a chunk splitted into 16 bytes long, it will be in the bin for 16-bytes chunks forever.

I believe this approach work well.

yashrajkakkad commented 4 years ago

Thank you for the wonderful explanation. I have a quick question. Pardon me if this is naive. Do we decide the number of chunks of sizes 8 bytes, 16 bytes etc. beforehand? Also, are all malloc requests multiple of 8 bytes in resea as described in the page you linked? If not, we still have some internal fragmentation there right?

nuta commented 4 years ago

Do we decide the number of chunks of sizes 8 bytes, 16 bytes etc. beforehand?

Yes. We don't need to change the size of chunk in each bin dynamically. I recommend using power of 2 for chunk sizes and since existing Resea applications don't need large chunks, I suppose 16 is a good number of bins (i.e. the first bin have 2^0 == 1-sized chunks, the second largest bin have 2^14 == 16384-sized chunks, and the last bin have any bigger chunks). Of course I welcome other ideas.

If not, we still have some internal fragmentation there right?

Good point. Internal fragmentation is still inevitable in this multiple bins approach. To eliminate internal fragmentation, I think we need another solution such as the slab allocator.

yashrajkakkad commented 4 years ago

chunks is a linked list of unused (free) memory regions.

Are all the chunks free? If so, why do we have the below condition in the loop -

            if (chunk->magic != MALLOC_FREE) {
                continue;
            }

Also, I am assuming that initially chunks will have one chunk whose size is equal to the size of the heap. So from what I understand, for our multiple bins approach, we have to choose among the following:

nuta commented 4 years ago

Are all the chunks free? If so, why do we have the below condition in the loop -

Seems you're right. All chunks in the chunks are free. I forgot to remove in-use chunks from chunks.

Also, I am assuming that initially chunks will have one chunk whose size is equal to the size of the heap. So from what I understand, for our multiple bins approach, we have to choose among the following:

That's correct. Initially, we have a single chunk which points to the whole heap.

On request, check if there's a free chunk of the given size. If not, split the large chunk and put it in the list. This idea needs a bit of refinement.

I think this way is better. As you say we might need further improvements around the splitting, but I suppose it's way better than the current first-fit approach.

yashrajkakkad commented 4 years ago

Okay. So here's what I think we can do:

Please let me know your views and suggestions.

nuta commented 4 years ago

Sounds good! Let's try it.

nuta commented 4 years ago

I have some comments regarding malloc tests.

yashrajkakkad commented 4 years ago

Is there a way I can use ceil and log2 from standard math.h library? I intend to do something like below:

size_t get_chunk_number_from_size(size_t size) {
    // If requested size is less or equal to the size of second largest chunk
    // (the last fixed chunk)
    if (size <= 1 << (NUM_CHUNKS - 2)) {
        return ceil(log2(size));
    }
    // Return index of the last, dynamic-sized chunk
    return NUM_CHUNKS - 1;
}
yashrajkakkad commented 4 years ago

About the tests, noted! Let me try to figure out how to write proper tests. (I have written unittests for Django REST APIs in the past)

nuta commented 4 years ago

log2 has been deleted because it was no longer used. It might be good idea to re-add it to libs/resea.

Since size is an integer, I believe there're clever algorithm using bitwise operators. That said, as the first step, how about a naive implementation using a for loop like:

size_t get_chunk_number_from_size(size_t size) {
     for (int i = 0; i < NUM_CHUNKS; i++) {
         if (size <= 1 << i) {
            return i;
        }
     }

     return NUM_CHUNKS - 1;
 }
yashrajkakkad commented 4 years ago

Okay yes. This sounds fair for a start. Once we get everything to work, we'll probably consider optimizing this. Thank you!

yashrajkakkad commented 4 years ago

I added the function definition in test.h and added malloc_test.o in the associated build.mk file. Is that enough? I think my test is not running as even ASSERT(1==2) is not creating a warning.

I have implemented a malloc but as it crashed the OS I renamed it to mymalloc to test it properly first.

nuta commented 4 years ago

Can you share your code on GitHub so I can take a look into the problem?

On Tue, Oct 20, 2020 at 22:27 Yashraj Kakkad notifications@github.com wrote:

I added the function definition in test.h and added malloc_test.o in the associated build.mk file. Is that enough? I think my test is not running as even ASSERT(1==2) is not creating a warning.

I have implemented a malloc but as it crashed the OS I renamed it to mymalloc to test it properly first.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/nuta/resea/issues/23#issuecomment-712849336, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGR2EVG7AKTKBSNMHXRJITSLWF4NANCNFSM4RXHRHCQ .

-- Seiya

yashrajkakkad commented 4 years ago

Yes, here's the branch we are working on - https://github.com/yashrajkakkad/resea/tree/malloc/libs/resea

malloc_test - https://github.com/yashrajkakkad/resea/blob/malloc/servers/apps/test/malloc_test.c mymalloc - https://github.com/yashrajkakkad/resea/blob/malloc/libs/resea/mymalloc.c

yashrajkakkad commented 4 years ago

Got the problem @nuta! I had forgotten to modify main.c. Meanwhile, how much memory does a typical resea application ask for as of now?

yashrajkakkad commented 4 years ago

I have got the preliminary tests to work. You can find it in the same branch and the same files as I mentioned before. I found one or two small bugs when I gave my implementation a second thought. I am working on that.

Meanwhile, the MALLOC_FRAME_LEN in my system is 64. I suppose that's a constant value. Is there any point therefore to have chunks smaller than that? The system crashes right now because ASSERT(len > MALLOC_FRAME_LEN) fails.

nuta commented 4 years ago

Meanwhile, how much memory does a typical resea application ask for as of now?

Do you mean the typical chunk size (i.e. x in malloc(x)) or the total memory consumption? I've never measured both metrics.

Meanwhile, the MALLOC_FRAME_LEN in my system is 64. I suppose that's a constant value. Is there any point therefore to have chunks smaller than that? The system crashes right now because ASSERT(len > MALLOC_FRAME_LEN) fails.

Currently the chunk header must be larger than 64 bytes. The chunk capacity can be less than 64 bytes.

On Wed, Oct 21, 2020 at 8:05 PM Yashraj Kakkad notifications@github.com wrote:

I have got the preliminary tests to work. You can find it in the same branch and the same files as I mentioned before. I found one or two small bugs when I gave my implementation a second thought. I am working on that.

Meanwhile, the MALLOC_FRAME_LEN in my system is 64. I suppose that's a constant value. Is there any point therefore to have chunks smaller than that? The system crashes right now because ASSERT(len > MALLOC_FRAME_LEN) fails.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nuta/resea/issues/23#issuecomment-713490815, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGR2EW4NH4FET6BV5WXOE3SL26A3ANCNFSM4RXHRHCQ .

yashrajkakkad commented 4 years ago

Apologies for the delayed response. I actually had misunderstood some part of the code. Nevertheless, I have fixed the impending issues, integrated my new malloc to our branch and the tests have passed too. I have used DBG statements to verify that the behaviour is as expected. Resea seems to work perfectly fine. I request you to glance at it once before I make a pull request: resea I, too, will meanwhile see if I find something wrong.

yashrajkakkad commented 4 years ago

Two integrated tests are apparently failing in GitHub actions - https://github.com/yashrajkakkad/resea/runs/1315001384?check_suite_focus=true One is related to DHCP and other is related to shell. I am not sure what I ended up breaking. I am trying to investigate into that as well.

nuta commented 4 years ago

Looks like your failing tests are running too long than expected so just ignore them.

I've added some improvements on integrated tests. Please pull and try the latest master branch.

On Tue, Oct 27, 2020 at 10:57 PM Yashraj Kakkad notifications@github.com wrote:

Two integrated tests are apparently failing in GitHub actions - https://github.com/yashrajkakkad/resea/runs/1315001384?check_suite_focus=true One is related to DHCP and other is related to shell. I am not sure what I ended up breaking. I am trying to investigate into that as well.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nuta/resea/issues/23#issuecomment-717261563, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGR2EUQV3OOWESHEHTCMY3SM3GUFANCNFSM4RXHRHCQ .

nuta commented 4 years ago

Specifically, the timeout for each test is specified through the --timeout option:

https://github.com/nuta/resea/blob/master/.github/workflows/integrated_tests.yml#L82

On Wed, Oct 28, 2020 at 9:19 PM Seiya Nuta nuta@seiya.me wrote:

Looks like your failing tests are running too long than expected so just ignore them.

I've added some improvements on integrated tests. Please pull and try the latest master branch.

On Tue, Oct 27, 2020 at 10:57 PM Yashraj Kakkad notifications@github.com wrote:

Two integrated tests are apparently failing in GitHub actions - https://github.com/yashrajkakkad/resea/runs/1315001384?check_suite_focus=true One is related to DHCP and other is related to shell. I am not sure what I ended up breaking. I am trying to investigate into that as well.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/nuta/resea/issues/23#issuecomment-717261563, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABGR2EUQV3OOWESHEHTCMY3SM3GUFANCNFSM4RXHRHCQ .

yashrajkakkad commented 4 years ago

They still fail with a timeout - https://github.com/yashrajkakkad/resea/runs/1320690388?check_suite_focus=true

Also, minlin application shows an error - 'Already exists' when I run it in my system. Rest looks fine. I did some debugging but nothing seems suspicious from my changes.

yashrajkakkad commented 4 years ago

Also, minlin application shows an error - 'Already exists' when I run it in my system. Rest looks fine. I did some debugging but nothing seems suspicious from my changes.

Please ignore this. This shows up in resea's master branch as well.

nuta commented 4 years ago

If you think your implementation is ready for the review, please feel free to send a pull request even if some CI tests are failing. They could be newly discovered bugs thanks to your new malloc implementation.

yashrajkakkad commented 4 years ago

Just did that! I request you to check!

nuta commented 4 years ago

Closed by #27