On-demand heap memory management maintaining compressed pointer

esevan commented 7 years ago

This is a proposal. In IoT.js issue #211, I discussed changing the JerryScript memory allocator. The statically reserved heap makes JerryScript waste memory: it allocates a 512KB heap statically even when it only uses a few KB of it. This causes several problems on memory-constrained devices when JerryScript runs alongside another platform or other processes.

With IoT.js, this waste may prevent the IoT.js platform from allocating system memory. With other processes, especially other JS applications, the waste is even bigger, since the JS engine instance of every application allocates its own over-provisioned heap. The use case of running multiple processes on a lightweight sensor device has already been shown in TinyOS and Contiki.

If we switch to on-demand allocation of individual objects, however, we can no longer use compressed pointers, because object addresses may fall anywhere in the 32-bit address space.

I suggest that we allocate memory on demand at the granularity of a segment, which is much bigger than an object but smaller than the heap, e.g. 8KB. We can still compress addresses by keeping multiple segment base addresses instead of a single heap base address. The heap size will then vary dynamically between 8KB and 512KB.
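To make the idea concrete, here is a minimal sketch of segment-based pointer compression. It is not the implementation from the experiment: the names (`segment_base`, `cp_compress`, `cp_decompress`, `segment_acquire`) and the 6-bit index / 10-bit offset layout are assumptions chosen so that 64 segments of 8KB with 8-byte alignment fit into a 16-bit compressed pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SEGMENT_SIZE  (8u * 1024u)   /* 8KB per segment */
#define MAX_SEGMENTS  64u            /* 64 * 8KB = 512KB maximum heap */
#define ALIGNMENT_LOG 3u             /* blocks are 8-byte aligned */
#define OFFSET_BITS   10u            /* 8KB / 8 = 1024 slots per segment */
#define OFFSET_MASK   ((1u << OFFSET_BITS) - 1u)

/* Hypothetical layout: 6-bit segment index + 10-bit scaled offset. */
typedef uint16_t cpointer_t;

/* One base address per segment; NULL means "not allocated yet". */
static uint8_t *segment_base[MAX_SEGMENTS];

/* Allocate segment `idx` from the system on demand; returns 0 on success. */
static int segment_acquire(uint32_t idx)
{
    assert(idx < MAX_SEGMENTS);
    if (segment_base[idx] == NULL) {
        segment_base[idx] = malloc(SEGMENT_SIZE);
    }
    return segment_base[idx] != NULL ? 0 : -1;
}

/* Compress: find the owning segment, then pack its index with the
 * offset divided by the alignment. The linear search is the extra
 * cost compared with a single heap base address. */
static cpointer_t cp_compress(const void *ptr)
{
    uintptr_t addr = (uintptr_t)ptr;
    for (uint32_t i = 0; i < MAX_SEGMENTS; i++) {
        uintptr_t base = (uintptr_t)segment_base[i];
        if (base != 0 && addr >= base && addr - base < SEGMENT_SIZE) {
            uint32_t offset = (uint32_t)(addr - base);
            assert((offset & 7u) == 0);  /* must be 8-byte aligned */
            return (cpointer_t)((i << OFFSET_BITS) | (offset >> ALIGNMENT_LOG));
        }
    }
    abort();  /* pointer does not belong to any segment */
}

/* Decompress: one table lookup plus a shift and an add. */
static void *cp_decompress(cpointer_t cp)
{
    uint8_t *base = segment_base[cp >> OFFSET_BITS];
    assert(base != NULL);
    return base + ((size_t)(cp & OFFSET_MASK) << ALIGNMENT_LOG);
}
```

Note that this sketch does not reserve a compressed value for the null pointer (segment 0, offset 0 compresses to 0), and a real allocator would likely replace the linear search in `cp_compress` with something cheaper.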

The experimental details of this proposal are attached in the IoT.js issue thread.

Any opinions are very welcome. Thanks.

zherczeg commented 7 years ago

I have been thinking about segments before. Unfortunately I see several problems with them:

- What happens if an application needs a block bigger than 8KB (e.g. a large string)?
- Resolving compressed pointers is non-trivial.
- If a static object is allocated on a segment, that segment will never be freed unless we implement a moving garbage collector.

Personally, I have been thinking about custom allocators for some time. Jerry could simply use the system allocator, provided that the allocator returns 8-byte-aligned pointers (Jerry can ensure that the requested size is rounded up to a multiple of 8 bytes). JERRY_CPOINTER_32_BIT was also a preparation for custom allocators. Of course, systems with less than 512KB of memory could still use compressed pointers.

Would using system allocators solve your problem?

esevan commented 7 years ago

@zherczeg Thank you for your opinion.

When the application needs a block bigger than 8KB (as I've observed in a stress test, a large array generates a large block for its property hashmap), we can simply allocate multiple contiguous segments at once from the system allocator.
Segment allocation does add more instructions to resolving compressed pointers. Compression and decompression are the most frequently invoked operations, so segment allocation introduces a performance problem. I'm optimizing the compression/decompression logic to reduce the performance gap with the original. I'll post it later together with experimental results.
I've also thought about statically pinned memory blocks inside segments. The fragmentation caused by these blocks prevents segments from being freed. According to my observations on several workloads (the lack of a standard IoT JavaScript workload is also a problem, by the way), those pinned blocks are allocated at the initial stage of the application. Therefore, we can mitigate the problem without a compacting GC by arbitrating between segment allocation and GC in out-of-memory situations.
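The first point, serving a block larger than one segment from several contiguous segments, could look roughly like the sketch below. The segment table and the function names (`segments_needed`, `segments_alloc_contiguous`) are hypothetical; the only ideas taken from the discussion are the 8KB segment size and the single contiguous request to the system allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SEGMENT_SIZE (8u * 1024u)  /* 8KB per segment */
#define MAX_SEGMENTS 64u           /* 512KB maximum heap */

/* Hypothetical segment table; NULL means the slot is free. */
static uint8_t *segment_base[MAX_SEGMENTS];

/* Number of 8KB segments needed to hold `size` bytes (rounded up). */
static size_t segments_needed(size_t size)
{
    return (size + SEGMENT_SIZE - 1u) / SEGMENT_SIZE;
}

/* Reserve `count` consecutive free table slots and back them with ONE
 * contiguous system allocation, so a large block can span them.
 * Returns the index of the first slot, or -1 on failure. */
static int segments_alloc_contiguous(size_t count)
{
    if (count == 0 || count > MAX_SEGMENTS) {
        return -1;
    }
    for (size_t first = 0; first + count <= MAX_SEGMENTS; first++) {
        size_t run = 0;
        while (run < count && segment_base[first + run] == NULL) {
            run++;
        }
        if (run < count) {
            continue;  /* an occupied slot interrupts this run */
        }
        uint8_t *block = malloc(count * SEGMENT_SIZE);
        if (block == NULL) {
            return -1;
        }
        for (size_t i = 0; i < count; i++) {
            segment_base[first + i] = block + i * SEGMENT_SIZE;
        }
        return (int)first;
    }
    return -1;  /* no run of free slots is long enough */
}
```

Because the consecutive slots point into one allocation, an address inside the large block still maps to a (segment index, offset) pair. The group would have to be released as a whole through its first slot, which this sketch leaves out.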

The following experiments show that segments can be freed effectively without a compacting GC. The first graph shows the result after only adding segment allocation. The second graph shows the result after optimization. The optimization concept is, briefly, "do not free blocks allocated at the initial stage, since those blocks would soon be allocated again and pinned". Because the experimental results show that freeing the lcache frees blocks allocated at the initial stage, I defer the lcache-freeing GC (the higher-severity GC) until after the initial stage.

I think to use custom allocator is not a good idea since only JerryScript uses the custom system allocator. This just moves the problem to another layer of the system and introduces more coupling into the design. The segment allocator is not a perfect idea so far, but I believe we can optimize JerryScript heap memory management with minimal dependencies.

zherczeg commented 7 years ago

> I think to use custom allocator is not a good idea since only JerryScript uses the custom system allocator.

I don't understand this. The system has its own allocator, which is optimized for that particular system. Hence, instead of using two allocators (one for the system, another for Jerry), we can simply use one. Why would this introduce more coupling? Am I missing something?

yichoi commented 7 years ago

Interesting experiment! I think the cost would be maintenance and performance.

As @zherczeg said, it is true that maintaining large blocks separately and resolving compressed pointers would be non-trivial and harder to maintain. But that doesn't mean it's impossible. Many existing memory allocators already handle this issue successfully.

As for the third issue (not being able to release heap segments that contain static objects), in the current architecture we cannot release any of the Jerry heap at all. I think releasing at least some of the heap segments is better than releasing none.

@esevan, I'm curious about the workload you experimented with. Did you use IoT.js (and Jerry snapshot) when implementing your Amazon Echo-like application?

esevan commented 7 years ago

@zherczeg Do you mean that we can modify the existing system allocator for Jerry? Jerry needs a special interface which is not the same as that of a legacy system allocator (e.g. ptmalloc). If we modify the existing system allocator to provide Jerry-specific interfaces, then we can only use Jerry on systems where the modified system allocator is installed, i.e. this is architecture coupling. Otherwise, we can build a custom allocator on top of the existing allocator and deploy it with Jerry. The problem is that the custom allocator would have to reserve contiguous memory only for Jerry, as the current Jerry does, i.e. the same problem occurs in another layer.

@yichoi Yes, I did. I set up the mic servers and the light server as TCP servers. The echo application (IoT.js) periodically sends TCP requests to the mic servers and monitors the room occupancy. Based on the occupancy, the application controls the light server. For the Jerry snapshot with my modification, I used the valgrind massif tool, because my modification uses malloc() to dynamically allocate segments. Since valgrind was not available on Raspberry Pi at the time, I ran the application on an x86-64 server.

zherczeg commented 7 years ago

In theory you don't need to modify the system allocator. Just "remap" jerry_alloc to malloc and jerry_free to free. The only requirement is that the allocator returns 8-byte-aligned blocks for block sizes >= 8. Almost all allocators do this automatically, since the double type requires it. You can also use posix_memalign if needed.

Anyway, we can also add more allocator mechanisms to Jerry and let people select the best one at compile time. I think we can handle the maintenance if it is reasonable.
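As a rough sketch of that remapping, assuming a POSIX system: `jerry_alloc` and `jerry_free` below are just the names used in this comment, not JerryScript's actual internal allocator API, and `JERRY_ALIGNMENT` is a made-up constant. `posix_memalign` is used here to make the 8-byte alignment explicit rather than relying on malloc's default.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define JERRY_ALIGNMENT 8u  /* hypothetical constant: required block alignment */

/* Hypothetical wrapper: forwards to the system allocator and guarantees
 * an 8-byte-aligned block whose size is a multiple of 8 bytes. */
static void *jerry_alloc(size_t size)
{
    /* Round the request up to a multiple of 8 bytes. */
    size_t rounded = (size + JERRY_ALIGNMENT - 1u) & ~(size_t)(JERRY_ALIGNMENT - 1u);
    void *ptr = NULL;

    /* rounded == 0 covers both size == 0 and overflow of the rounding. */
    if (rounded == 0 || posix_memalign(&ptr, JERRY_ALIGNMENT, rounded) != 0) {
        return NULL;
    }
    return ptr;
}

/* Memory obtained from posix_memalign is released with plain free(). */
static void jerry_free(void *ptr)
{
    free(ptr);
}
```

With this approach the returned addresses are full-width system pointers, so it fits the JERRY_CPOINTER_32_BIT configuration mentioned earlier rather than 16-bit compressed pointers.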

zherczeg commented 7 years ago

Closing due to inactivity. Please reopen if needed.

jerryscript-project / jerryscript

On-demand heap memory management maintaining compressed pointer #1451