After learning about indexes, I understood their basic structure, but I wanted to dig deeper — to explore the data structure, understand the algorithm, and learn how the index data is stored on disk.
The theory and actual implementation can differ, so I decided to explore this topic further.
I wanted to see how a database management system (DBMS) stores an Index on disk and in memory, and how it searches through an Index.
I chose SQLite for my experiments:
it’s a widely used DBMS, found in browsers, mobile apps, and operating systems;
it's easier to debug: no separate server, just a client-side application;
its codebase is smaller than MySQL or PostgreSQL but uses similar data structures for Indexes.
According to SQLite documentation, Indexes are stored in a B-Tree structure, which is a balanced tree where each node has multiple children.
It typically looks like this:
To understand how SQLite stores Nodes, let’s look at the Page and Cell structures.
A Page (the SQLite analog of a Node) stores Cell data and has a link to its right child Page.
A Cell contains Index data, a rowId, and a link to its left child Page.
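To make the Page and Cell relationship concrete, here is a minimal sketch in Python. It is a simplified in-memory model, not SQLite's actual on-disk layout: the `Page` and `Cell` classes, the `find_row_id` helper, and the sample tree are all hypothetical names and data made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cell:
    payload: int                      # indexed value (the Index data)
    row_id: int                       # rowId of the table row
    left_child: Optional[int] = None  # Page number holding smaller values


@dataclass
class Page:
    number: int
    cells: list = field(default_factory=list)  # kept sorted by payload
    right_child: Optional[int] = None          # Page holding values larger than every Cell


def find_row_id(pages: dict, root: int, key: int) -> Optional[int]:
    """Walk from the root Page down until a Cell with the given payload is found."""
    page_number = root
    while page_number is not None:
        page = pages[page_number]
        next_page = page.right_child  # used when key is larger than all Cells
        for cell in page.cells:
            if key == cell.payload:
                return cell.row_id
            if key < cell.payload:
                next_page = cell.left_child
                break
        page_number = next_page
    return None


# A hypothetical two-level tree: root Page 1 with leaf Pages 2, 3 and 4.
pages = {
    1: Page(1, [Cell(20, 2, left_child=2), Cell(40, 4, left_child=3)], right_child=4),
    2: Page(2, [Cell(10, 1)]),
    3: Page(3, [Cell(30, 3)]),
    4: Page(4, [Cell(50, 5)]),
}

print(find_row_id(pages, 1, 30))
print(find_row_id(pages, 1, 35))
```

Each Cell points left to smaller values, and the Page itself points right to everything larger, which is why a Page needs one more child link than it has Cells.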
By default, each SQLite table row has a unique rowId, which works like a primary key if one isn’t explicitly defined.
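A quick way to see the implicit rowId is to select it explicitly. This is a small sketch using Python's built-in `sqlite3` module with an in-memory database; the three sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_test (column1 INT NOT NULL)")
conn.executemany(
    "INSERT INTO table_test (column1) VALUES (?)",
    [(10,), (20,), (30,)],
)

# No primary key was declared, yet every row has a rowid.
rows = conn.execute(
    "SELECT rowid, column1 FROM table_test ORDER BY rowid"
).fetchall()
print(rows)
conn.close()
```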
Here’s a visual example of a B-Tree Index in SQLite:
Index data is stored on disk in this structure:
Each Page has a fixed size, ranging from 512 to 65,536 bytes. Page and Cell headers use 4 bytes to store child links.
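The Page size is recorded in the database file header, so it can be checked without SQLite at all. The sketch below assumes the documented file format (a 2-byte big-endian value at byte offset 16, where the value 1 stands for 65,536) and compares it with `PRAGMA page_size`. The file name, the `header_page_size` helper, and the 8192 setting are arbitrary choices for the example.

```python
import os
import sqlite3
import tempfile


def header_page_size(path: str) -> int:
    """Read the page size stored at byte offset 16 of the database header."""
    with open(path, "rb") as f:
        header = f.read(100)
    size = int.from_bytes(header[16:18], "big")
    # 65,536 does not fit in two bytes, so the file format stores it as 1.
    return 65536 if size == 1 else size


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "database.sqlite")
    conn = sqlite3.connect(path)
    # The page size has to be chosen before the first table is created.
    conn.execute("PRAGMA page_size = 8192")
    conn.execute("CREATE TABLE table_test (column1 INT NOT NULL)")
    conn.commit()
    pragma_size = conn.execute("PRAGMA page_size").fetchone()[0]
    conn.close()
    file_size_from_header = header_page_size(path)

print(pragma_size, file_size_from_header)
```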
To find a child Page number, we need to read the header separately with this function:
get4byte(...)
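As a rough Python equivalent, reading a child link comes down to decoding a 4-byte big-endian integer. The sketch below is not the author's code. It assumes the documented page layout (byte 0 of a b-tree Page is its type, and an interior Page keeps its right child at offset 8), and the file name and row count are arbitrary.

```python
import os
import sqlite3
import tempfile


def get4byte(buf: bytes, offset: int = 0) -> int:
    """Decode a 4-byte big-endian unsigned integer, like SQLite's get4byte."""
    return int.from_bytes(buf[offset:offset + 4], "big")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "database.sqlite")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE table_test (column1 INT NOT NULL)")
    conn.executemany(
        "INSERT INTO table_test (column1) VALUES (?)",
        ((i,) for i in range(1, 20001)),
    )
    conn.execute("CREATE INDEX idx ON table_test (column1 ASC)")
    conn.commit()
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    root_page = conn.execute(
        "SELECT rootpage FROM sqlite_master WHERE name = 'idx'"
    ).fetchone()[0]
    conn.close()

    # Pages are numbered from 1 and stored back to back in the file.
    with open(path, "rb") as f:
        f.seek((root_page - 1) * page_size)
        page = f.read(page_size)

page_type = page[0]  # 2 = interior index Page, 10 = leaf index Page
right_child = get4byte(page, 8) if page_type == 2 else None
print(page_type, right_child)
```

With this many rows the Index no longer fits in one Page, so its root is an interior Page and carries a right child link.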
For other Page and Cell data, we can use the C structures defined in sqlite/src/btreeInt.h.
To view index data, we can use sqlite3_analyzer:
sqlite3_analyzer database.sqlite
...
Page size in bytes................................ 4096
...
*** Index IDX of table TABLE_TEST *********************************************
Number of entries................................. 1000
B-tree depth...................................... 2
Total pages used.................................. 4
...
This tool provides only general information about the index.
Great!
The next step was to display everything visually — an easy part of the process.
I found a library called d3-org-tree for visualizing index structures.
Here’s how it looked in the early stages:
However, there was a problem: I couldn’t adjust the spacing between Pages, so as the tree became deeper and more Pages were added at each level, the image became too large and hard to read.
I tried adjusting it with JavaScript and CSS, but it didn’t work well.
After a few tries with d3-org-tree, I decided that using text to display the structure would be simpler.
Not bad, but I could go further.
PHP's ImageMagick extension lets us create images with more control over design and spacing than text alone. After about a dozen tries, here's the final version I came up with:
The image now includes all the needed data and is easy to read.
In the top-left corner, there’s general information about the Index.
Each level shows the total number of Pages and Cells.
Each Page shows its Page number, the link to its right child, and details about the first and last Cell.
Only a few Pages are shown per level, including the first and last Pages for each level.
The root Page is located at the first level.
Use this command to generate an image from the dump
We can create different data for the Indexes and explore what's inside them.
To start, it would be interesting to see how the Index size grows from 1 to 1,000,000 records.
Before each Index image, I'll show the table's data structure, the way the Index was made, and how the table was filled with data.
CREATE TABLE table_test (column1 INT NOT NULL);
INSERT INTO table_test (column1) VALUES (1),(2),(3),...,(999998),(999999),(1000000);
CREATE INDEX idx ON table_test (column1 ASC);
Now we’ve reached the image I used earlier as an example.
This Index has 3 levels, 2,930 Pages, and 1,000,000 Cells. The data was added in order, so for rowId = 1, column1 = 1.
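The growth can also be estimated without the visualization. The sketch below is an approximation, not the method used for the images: it assumes that every Page added to the database by `CREATE INDEX` belongs to the new Index, and measures that with `PRAGMA page_count`. The `index_pages` helper is a hypothetical name, and the largest size is capped at 100,000 records to keep the run short.

```python
import sqlite3


def index_pages(records: int) -> int:
    """Estimate Index Pages as the growth of PRAGMA page_count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_test (column1 INT NOT NULL)")
    conn.executemany(
        "INSERT INTO table_test (column1) VALUES (?)",
        ((i,) for i in range(1, records + 1)),
    )
    before = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.execute("CREATE INDEX idx ON table_test (column1 ASC)")
    after = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.close()
    return after - before


sizes = {n: index_pages(n) for n in (1, 1_000, 100_000)}
print(sizes)
```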
Now, let's add two Indexes with different sort directions.
CREATE TABLE table_test (column1 INT NOT NULL);
INSERT INTO table_test (column1) VALUES (1),(2),(3),...,(999998),(999999),(1000000);
CREATE INDEX idx_asc ON table_test (column1 ASC);
CREATE INDEX idx_desc ON table_test (column1 DESC);
The ASC Index is the same as above, as ASC sorting is used by default.
The table's first entry, rowId=1, column1=1, payload=1, is in the first Cell of the leftmost Page.
The table's last entry, rowId=1,000,000, column1=1,000,000, payload=1,000,000, is in the last Cell of the rightmost Page.
The DESC Index is reversed.
The table's first entry, rowId=1, column1=1, payload=1, is in the last Cell of the rightmost Page.
The table's last entry, rowId=1,000,000, column1=1,000,000, payload=1,000,000, is in the first Cell of the leftmost Page.
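Both Indexes hold the same entries in opposite order, so either one can serve an ordered query. The sketch below uses a smaller table of 1,000 rows and prints the query plans. Which Index the planner picks may differ between SQLite versions, so the plans are printed rather than assumed; the `plan` helper is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_test (column1 INT NOT NULL)")
conn.executemany(
    "INSERT INTO table_test (column1) VALUES (?)",
    ((i,) for i in range(1, 1001)),
)
conn.execute("CREATE INDEX idx_asc ON table_test (column1 ASC)")
conn.execute("CREATE INDEX idx_desc ON table_test (column1 DESC)")


def plan(sql: str) -> list:
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


asc_sql = "SELECT column1 FROM table_test ORDER BY column1 ASC LIMIT 3"
desc_sql = "SELECT column1 FROM table_test ORDER BY column1 DESC LIMIT 3"

print(plan(asc_sql))
print(plan(desc_sql))

smallest = [row[0] for row in conn.execute(asc_sql)]
largest = [row[0] for row in conn.execute(desc_sql)]
print(smallest, largest)
conn.close()
```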
When an Index exists before the data is inserted, the tree must rebalance itself as new data is added. Creating an Index on existing data should be much more efficient.
Both Indexes look similar, but the second Index (created after the data was inserted), with fewer Pages, should be faster.
+--------+-------------+-------------+
|        | Total Pages | Total Cells |
+--------+-------------+-------------+
| Before |        3342 |     1000000 |
| After  |        2930 |     1000000 |
+--------+-------------+-------------+
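The same comparison can be approximated in a few lines. This sketch assumes sequential values like the earlier examples and a smaller table of 100,000 rows, so the numbers will not match the table above. It compares the total database size, which works because the table itself is filled identically in both runs. The `total_pages` helper is a hypothetical name.

```python
import sqlite3


def total_pages(records: int, index_first: bool) -> int:
    """Return PRAGMA page_count with the Index created before or after the inserts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_test (column1 INT NOT NULL)")
    if index_first:
        conn.execute("CREATE INDEX idx ON table_test (column1 ASC)")
    conn.executemany(
        "INSERT INTO table_test (column1) VALUES (?)",
        ((i,) for i in range(1, records + 1)),
    )
    if not index_first:
        conn.execute("CREATE INDEX idx ON table_test (column1 ASC)")
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.close()
    return pages


before = total_pages(100_000, index_first=True)
after = total_pages(100_000, index_first=False)
print(before, after)
```

The size of the gap depends on the insert order, so other data will likely give a different ratio.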
Based on the work done, we saw how Indexes in SQLite are structured.
We looked at how record data is stored in memory and how the B-Tree organizes and accesses this data.
The visualization helped analyze and compare different Indexes.
To reproduce all of these examples, you can run the following:
docker run -it --rm -v "$PWD":/app/data --platform linux/x86_64 mrsuh/sqlite-index bash
sh bin/test-index.sh
SQLite Index Visualization: Structure
https://ift.tt/XDwyeVF
Anton Sukhachev
Node and Page Structure
For other Page and Cell data, we can use these C structures:
Page: sqlite/src/btreeInt.h
Cell: sqlite/src/btreeInt.h
To view index data, we can use sqlite3_analyzer.
This tool provides only general information about the index.
Analyzing SQLite Source Code
After a few weeks of experimenting, I wrote my own functions for index analysis.
You can view the code here:
The function reads the content of the selected index and outputs the data to STDOUT:
Here’s an example output:
I packed everything into a Docker image if you want to test it:
You can use the script like this:
dump.txt
Now it's time to experiment!
Index with 1 record
One level, one Page, one Cell. Simple!
Index with 1,000 records
Index with 1,000,000 records
Comparing ASC and DESC Indexes
Index with expression-based data
The Index now stores a string generated by the expression.
You can use more complex expressions, and the Index will save only the result.
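Here is a small sketch of an expression-based Index. The expression `'prefix_' || column1` and the Index name `idx_expr` are made up for illustration and are not necessarily what was used for the image. A query can only use such an Index when it repeats the same expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_test (column1 INT NOT NULL)")
conn.executemany(
    "INSERT INTO table_test (column1) VALUES (?)",
    ((i,) for i in range(1, 1001)),
)
# The Index stores the computed string, not the original integer.
conn.execute("CREATE INDEX idx_expr ON table_test ('prefix_' || column1)")

sql = "SELECT rowid FROM table_test WHERE 'prefix_' || column1 = 'prefix_42'"
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
rows = conn.execute(sql).fetchall()

print(plan)
print(rows)
conn.close()
```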
Unique Index with NULL values
SQLite supports unique Indexes with NULL values.
This index looks like we are storing only non-NULL values.
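The behavior is easy to check: several NULLs are accepted, while a repeated non-NULL value is rejected. A minimal sketch, with an assumed nullable column and a hypothetical Index name `idx_unique`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_test (column1 INT)")
conn.execute("CREATE UNIQUE INDEX idx_unique ON table_test (column1)")

# Two NULLs and one value: all three inserts succeed.
conn.executemany(
    "INSERT INTO table_test (column1) VALUES (?)",
    [(None,), (None,), (1,)],
)

try:
    conn.execute("INSERT INTO table_test (column1) VALUES (1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

null_rows = conn.execute(
    "SELECT count(*) FROM table_test WHERE column1 IS NULL"
).fetchone()[0]
print(duplicate_rejected, null_rows)
conn.close()
```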
Filtering NULL Values with Partial Indexes
The Index now contains just one Page, leading to faster searches than the previous example.
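To compare a full and a partial Index on the same data, the sketch below again uses the growth of `PRAGMA page_count` as an estimate of the Index size. The data set is hypothetical: 20,000 rows where only every 1,000th row has a value and the rest are NULL. The `index_pages` helper is a made-up name.

```python
import sqlite3


def index_pages(create_index_sql: str, rows: int = 20_000) -> int:
    """Estimate Index Pages as the growth of PRAGMA page_count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_test (column1 INT)")
    # Every 1,000th row gets a value, all other rows stay NULL.
    conn.executemany(
        "INSERT INTO table_test (column1) VALUES (?)",
        ((i if i % 1000 == 0 else None,) for i in range(1, rows + 1)),
    )
    before = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.execute(create_index_sql)
    after = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.close()
    return after - before


full = index_pages("CREATE INDEX idx ON table_test (column1)")
partial = index_pages(
    "CREATE INDEX idx ON table_test (column1) WHERE column1 IS NOT NULL"
)
print(full, partial)
```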
Multi-Column Index
As we can see, the data for all fields in a cell are stored one after another.
The fields are separated visually with a colon (:).
Comparing Indexes Created Before and After Data Population
Before
After
VACUUM and REINDEX
To achieve similar optimization, we can rebuild an existing Index with these commands:
VACUUM recreates Indexes and tables with data:
REINDEX recreates Indexes only:
After running VACUUM/REINDEX, the number of Pages in the Index decreased a lot.
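A sketch of that effect, under assumptions of my own: the Index is created first and filled with 50,000 shuffled values so that it ends up loosely packed, and the database lives in a temporary file. After REINDEX the file keeps its size but the freed Pages show up in `PRAGMA freelist_count`, and VACUUM then removes them from the file.

```python
import os
import random
import sqlite3
import tempfile


def pragma(conn: sqlite3.Connection, name: str) -> int:
    """Run a PRAGMA that returns a single integer."""
    return conn.execute(f"PRAGMA {name}").fetchone()[0]


with tempfile.TemporaryDirectory() as tmp:
    conn = sqlite3.connect(os.path.join(tmp, "database.sqlite"))
    conn.execute("CREATE TABLE table_test (column1 INT NOT NULL)")
    conn.execute("CREATE INDEX idx ON table_test (column1 ASC)")

    values = list(range(1, 50_001))
    random.Random(0).shuffle(values)  # fixed seed keeps the run repeatable
    conn.executemany(
        "INSERT INTO table_test (column1) VALUES (?)",
        ((v,) for v in values),
    )
    conn.commit()
    pages_before = pragma(conn, "page_count")

    conn.execute("REINDEX idx")
    conn.commit()
    free_after_reindex = pragma(conn, "freelist_count")

    conn.execute("VACUUM")  # must run outside a transaction
    pages_after_vacuum = pragma(conn, "page_count")
    conn.close()

print(pages_before, free_after_reindex, pages_after_vacuum)
```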
Text Data in Indexes
Let's look at how text is stored. Short strings are saved directly in the Index Cells, but longer text must be stored separately.
You can easily see the actual string stored directly in the Index.
Floating-Point Data in Indexes
Combining integer and text in a single Index:
The integer and string are stored together in the Cell, just as we specified when creating the Index.
Conclusion
Code and examples are available here
Next, I'll focus on visualizing Index-based searches and explore some interesting SQL queries.
via mrsuh.com https://mrsuh.com
November 15, 2024 at 06:36PM