What happens to Old DataBlocks?

Drofzz commented 2 years ago

I was looking into the code to see how DataBlocks work, but I am having trouble finding out whether old DataBlocks get repurposed after the pointer is moved.

Let's say we insert a DataBlock with a null pointer, and then insert another DataBlock at the previous DataBlock's pointer. We get a new pointer, correct? What happens to the old address in the file? Can or will it ever be repurposed, or will it just be empty bytes in the middle of the file?

Drofzz commented 2 years ago

Additionally, is there any specific way to handle this problem? It seems that data blobs can make files big really quickly if I use features built on DataBlocks, like DBreezeObjects.

hhblaze commented 2 years ago

If the pointer changes after a save, it means the block was moved because the current block is bigger than the previous one. To keep a permanent pointer, use DataBlockWithFixedAddress. The space left behind remains abandoned. There is no FAT, and the current design choice is "speed" over "size".

Drofzz commented 2 years ago

So what I understand from your answer is that it is not impossible to reuse old dynamic blocks, but it requires some work?

Would this idea work?

Track all blocks (ptr, used/not used, block size) in a NestedTable.
Every time I use InsertDataBlock, add/update the old and new ptr with the "used/not used" state:
- if oldptr is set
  - if oldptr != newptr after (var newptr = InsertDataBlock(table, oldptr, data))
    - update oldptr to be "not used"
    - insert newptr as "used", together with its size
- if oldptr is null
  - check the NestedTable for "not used" DataBlocks with the same size
    - if a "not used" DataBlock is found, use its ptr as oldptr for InsertDataBlock instead of null
      - if oldptr != newptr after (var newptr = InsertDataBlock(table, oldptr, data))
        - update oldptr to be "not used"
        - insert newptr as "used", together with its size
    - if no DataBlock is found, use a null ptr
      - insert newptr as "used", together with its size
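
The bookkeeping described in the list above could be sketched roughly as follows. This is a toy Python model, not DBreeze code: `BlockTracker`, `FakeStore` and their methods are hypothetical names, two in-memory dictionaries stand in for the NestedTable, and `FakeStore.insert` only imitates the "relocate when the data no longer fits" rule.

```python
from typing import Callable, Dict, Optional

Ptr = bytes  # opaque pointer handed back by the store (assumption)


class FakeStore:
    """Stand-in for the real block API: relocates only when data outgrows its slot."""

    def __init__(self) -> None:
        self.end = 0                         # simulated end-of-file offset
        self.capacity: Dict[Ptr, int] = {}   # slot size behind each pointer

    def insert(self, ptr: Optional[Ptr], data: bytes) -> Ptr:
        if ptr is not None and len(data) <= self.capacity[ptr]:
            return ptr                       # fits: rewritten in place
        new_ptr = self.end.to_bytes(8, "big")
        self.capacity[new_ptr] = len(data)
        self.end += len(data)                # file grows; old slot is left behind
        return new_ptr


class BlockTracker:
    """Tracks "used" and "not used" pointers with their sizes."""

    def __init__(self, insert_block: Callable[[Optional[Ptr], bytes], Ptr]) -> None:
        self._insert = insert_block
        self.used: Dict[Ptr, int] = {}       # ptr -> slot size
        self.unused: Dict[Ptr, int] = {}     # ptr -> slot size

    def _take_unused(self, size: int) -> Optional[Ptr]:
        # Look for an abandoned slot of exactly the requested size.
        match = next((p for p, s in self.unused.items() if s == size), None)
        if match is not None:
            del self.unused[match]
        return match

    def save(self, old_ptr: Optional[Ptr], data: bytes) -> Ptr:
        if old_ptr is not None:
            # Raises KeyError for pointers this tracker did not hand out.
            capacity = self.used.pop(old_ptr)
        else:
            old_ptr = self._take_unused(len(data))
            capacity = len(data)
        new_ptr = self._insert(old_ptr, data)
        if new_ptr == old_ptr:
            self.used[new_ptr] = capacity            # same slot, same capacity
        else:
            if old_ptr is not None:
                self.unused[old_ptr] = capacity      # old slot becomes "not used"
            self.used[new_ptr] = len(data)
        return new_ptr


store = FakeStore()
tracker = BlockTracker(store.insert)
a = tracker.save(None, b"x" * 10)   # new slot
b = tracker.save(a, b"y" * 20)      # grows, so it relocates and `a` becomes "not used"
c = tracker.save(None, b"z" * 10)   # reuses the abandoned 10-byte slot
```

In this model the third insert does not grow the simulated file, because the abandoned slot is handed back in place of a null pointer.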

Additionally, pad the data byte array to fixed sizes (256 bytes, 512 bytes, 1024 bytes, etc.) so that it is easier to find a match among the "not used" DataBlocks?
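
As a rough illustration of the padding idea, the helper below rounds a payload up to the next size class (256, 512, 1024, ...). The names `size_class` and `pad` are hypothetical, and the sketch assumes the real payload length is recorded somewhere else (for example in the tracking table), since zero padding alone cannot be told apart from trailing zero bytes in the data.

```python
def size_class(n: int, minimum: int = 256) -> int:
    """Return the smallest size in the series minimum, 2*minimum, 4*minimum, ... that holds n bytes."""
    size = minimum
    while size < n:
        size *= 2
    return size


def pad(data: bytes, minimum: int = 256) -> bytes:
    """Right-pad data with zero bytes up to its size class."""
    return data.ljust(size_class(len(data), minimum), b"\x00")


padded = pad(b"a" * 300)  # 300 bytes land in the 512-byte class
```

With only a handful of distinct block sizes in the file, an abandoned block is far more likely to match a later request than with arbitrary sizes, at the cost of some wasted space inside each block.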

hhblaze commented 2 years ago

All of this already works automatically. The first time you insert with a null pointer, the system writes the DataBlock (db) size and payload (db.size and db.payload) to a new place and returns a ptr. The next time, you insert the new db using the same ptr. The system compares the old and new db sizes and either writes to the same place (if newblock.size <= oldblock.size) or creates another pointer and writes the new, bigger block to a new place (in which case the old space can be counted as lost).
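
The behaviour described above can be modelled in a few lines. The following is a toy Python model under stated assumptions, not the DBreeze implementation: `ToyBlockFile` is a hypothetical name, the pointer is modelled as an `(offset, capacity)` tuple, and the 4-byte length header is an illustrative choice rather than the actual on-disk format.

```python
import struct
from typing import Optional, Tuple

Ptr = Tuple[int, int]          # (offset, capacity); real pointer layout is an assumption
HEADER = struct.Struct(">I")   # 4-byte big-endian payload length (illustrative)


class ToyBlockFile:
    """Append-only block file: rewrite in place if the block fits, else relocate."""

    def __init__(self) -> None:
        self.data = bytearray()
        self.abandoned = 0     # bytes left behind by relocated blocks

    def insert(self, ptr: Optional[Ptr], payload: bytes) -> Ptr:
        record = HEADER.pack(len(payload)) + payload
        if ptr is not None:
            offset, capacity = ptr
            if len(record) <= capacity:
                # New block is not bigger than the old one: overwrite in place.
                self.data[offset:offset + len(record)] = record
                return ptr
            # New block is bigger: the old slot is counted as lost.
            self.abandoned += capacity
        offset = len(self.data)
        self.data += record
        return (offset, len(record))

    def read(self, ptr: Ptr) -> bytes:
        offset, _ = ptr
        (length,) = HEADER.unpack_from(self.data, offset)
        start = offset + HEADER.size
        return bytes(self.data[start:start + length])


f = ToyBlockFile()
p1 = f.insert(None, b"hello")                  # first insert, null pointer
p2 = f.insert(p1, b"hi")                       # smaller: same pointer comes back
p3 = f.insert(p2, b"a much longer payload")    # bigger: new pointer, old space lost
```

Here `p2` equals `p1`, while `p3` points past the first block and `f.abandoned` records the bytes of the slot that was left behind.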

Actually, db insert is implemented this way because it fulfills the necessary minimal requirements, given that updates of an existing db that grow in size on each iteration are quite rare. If you expect, for example, to insert many times into one db with each block taking more space than the previous one, you can use a trick: on the first insert, create the db with a larger db.size than that first insert needs, and it can then be filled up on subsequent inserts.

If you like, you can create a clone of db.Insert and play with it as much as you like; let the idea show some visible benefits.
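
One way to apply that trick on the caller's side is to wrap every payload in a fixed-size envelope, so that each write has the same block size and can stay at the same pointer until the reserve is exhausted. This is a sketch under assumptions, in Python rather than against the DBreeze API: `pack_reserved` and `unpack_reserved` are hypothetical helpers, and the 4-byte length prefix is an illustrative choice.

```python
import struct

_LEN = struct.Struct(">I")  # 4-byte big-endian length prefix (illustrative)


def pack_reserved(payload: bytes, reserve: int) -> bytes:
    """Return a block of exactly `reserve` bytes: length prefix, payload, zero padding."""
    room = reserve - _LEN.size
    if len(payload) > room:
        raise ValueError("payload no longer fits the reserved block size")
    return _LEN.pack(len(payload)) + payload.ljust(room, b"\x00")


def unpack_reserved(block: bytes) -> bytes:
    """Recover the original payload from a block built by pack_reserved."""
    (length,) = _LEN.unpack_from(block)
    return block[_LEN.size:_LEN.size + length]


first = pack_reserved(b"abc", 64)            # small first insert, 64 bytes reserved
later = pack_reserved(b"abcdefgh" * 5, 64)   # grown payload, still 64 bytes
```

Because `first` and `later` have the same length, a store that keeps a block in place whenever the new size does not exceed the old one would not need to relocate it.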

Drofzz commented 2 years ago

That is what I could already understand from the code. It seems the pointer value can be split into two parts: 1. the address, 2. the size.

And as long as the data fits the pointer's size segment, the old address can be used again.

But does it also open up the possibility to "merge" and "split" "not used" DataBlock addresses by changing the pointer's size segment, if they are tracked?

hhblaze commented 2 years ago

Splitting and merging are currently, and obviously, not supported for data blocks. For that, the writing protocol would have to be enhanced while maintaining backward compatibility. As I said, if you are interested, try this experiment.

Drofzz commented 2 years ago

Thanks, I might try diving into it over the weekend. I have some ideas for an extension I might want to use in the future. Thanks for making this library, by the way.

hhblaze / DBreeze

What happens to Old DataBlocks? #71