Closed hustjieke closed 1 year ago
In core/column_share.h, defines DPN_SIZE
:
/*
The size of the current DPN structure object is used to prevent the structure of
the DPN from being changed arbitrarily.
If modification is required,
please consider the size of PAGT_CNT and COL_DN_FILE_SIZE to
prevent space waste and give consideration to IO efficiency
*/
static constexpr size_t DPN_SIZE = 88;
// make sure the struct is not modified by mistake
static_assert(sizeof(DPN) == DPN_SIZE, "Bad struct size of DPN");
I am investigating this matter and have a question: While I understand that the metadata DPN size 88 can be modified, I would like to know if the maximum number of rows per pack (64K) and the maximum number of rows for storage limit (8G) are also tunable variables. The reason for my inquiry is that I am exploring ways to maximize the reuse of pre-defined constant variables.
Good question!
Why the pack nums limits set to 64K? It comes from table meta, the struct named TABLE_META
:
struct TABLE_META {
uint64_t magic = common::FILE_MAGIC;
uint32_t ver = common::TABLE_DATA_VERSION;
uint32_t id;
uint32_t pss = 16;
};
The member named pss
is a short for pack size shift
, defaut we set value 16
, magic number, not friendly too, we can define a macro var here.
The pss
will transfer to column meta:
struct COL_META {
uint32_t magic;
uint32_t ver; // file version
uint8_t pss; // pack size shift
common::ColumnType type; // type
common::PackFmt fmt; // data format: LZ4, snappy, lookup, raw, etc
uint8_t flag;
uint32_t precision;
uint32_t scale;
};
pss
used by many places, you can search it in tianmu engine
:
core/tianmu_table.h:52: uint8_t pss;
core/table_share.cpp:39: if (meta.pss > common::MAX_PSS)
core/table_share.cpp:41: ": invalid pack size shift " + std::to_string(meta.pss));
core/tianmu_table.cpp:362: TABLE_META meta{common::FILE_MAGIC, common::TABLE_DATA_VERSION, tid, opt->pss};
core/tianmu_table.cpp:393: TianmuAttr::Create(lnk, opt->atis[idx], opt->pss, 0, auto_inc_value);
core/tianmu_table.cpp:454: TianmuAttr::Create(column_dir, Engine::GetAttrTypeInfo(*new_cols[i]), meta.pss, no_objs);
core/column_share.cpp:112: pss = meta.pss;
core/tianmu_attr_exqp.cpp:1185: static MIIterator const mit(nullptr, pss);
core/tianmu_attr_exqp.cpp:1318: static MIIterator const mit(nullptr, pss);
core/tianmu_attr_exqp.cpp:1392: static MIIterator const mit(nullptr, pss);
core/pack_str.cpp:49: data_.index = (char **)alloc(sizeof(char *) * (1 << col_share->pss), mm::BLOCK_TYPE::BLOCK_UNCOMPRESSED);
core/pack_str.cpp:50: data_.lens = alloc((data_.len_mode * (1 << col_share->pss)), mm::BLOCK_TYPE::BLOCK_UNCOMPRESSED);
core/pack_str.cpp:51: std::memset(data_.lens, 0, data_.len_mode * (1 << col_share->pss));
core/pack_str.cpp:68: data_.lens = alloc((data_.len_mode * (1 << col_share_->pss)), mm::BLOCK_TYPE::BLOCK_UNCOMPRESSED);
core/pack_str.cpp:69: std::memset(data_.lens, 0, data_.len_mode * (1 << col_share_->pss));
core/pack_str.cpp:71: reinterpret_cast<char **>(alloc(sizeof(char *) * (1 << col_share_->pss), mm::BLOCK_TYPE::BLOCK_UNCOMPRESSED));
core/pack_str.cpp:158: data_.lens = alloc((data_.len_mode * (1 << col_share_->pss)), mm::BLOCK_TYPE::BLOCK_UNCOMPRESSED);
core/pack_str.cpp:159: std::memset(data_.lens, 0, data_.len_mode * (1 << col_share_->pss));
core/pack_str.cpp:161: reinterpret_cast<char **>(alloc(sizeof(char *) * (1 << col_share_->pss), mm::BLOCK_TYPE::BLOCK_UNCOMPRESSED));
core/pack_str.cpp:348: mm::MMGuard<uint> nc_buffer((uint *)alloc((1 << col_share_->pss) * sizeof(uint32_t), mm::BLOCK_TYPE::BLOCK_TEMPORARY),
core/pack_str.cpp:523: dpn_->dataLength = bitmap_size_ * 2 + data_.sum_len + (data_.len_mode * (1 << col_share_->pss));
core/pack_str.cpp:533: dpn_->dataLength = bitmap_size_ * 2 + data_.sum_len + (data_.len_mode * (1 << col_share_->pss));
core/pack_str.cpp:557: f->WriteExact(data_.lens, (data_.len_mode * (1 << col_share_->pss)));
core/pack_str.cpp:633: mm::MMGuard<uint> cn_ptr((uint *)alloc((1 << col_share_->pss) * sizeof(uint), mm::BLOCK_TYPE::BLOCK_TEMPORARY),
core/pack_str.cpp:750: f->ReadExact(data_.lens, (data_.len_mode * (1 << col_share_->pss)));
core/pack_str.cpp:751: sz -= (data_.len_mode * (1 << col_share_->pss));
core/rsi_histogram.h:33: void Init(int64_t _no_obj, bool _fixed, int pss);
core/tianmu_attr.cpp:48: pss = m_share->pss;
core/tianmu_attr.cpp:70:void TianmuAttr::Create(const fs::path &dir, const AttributeTypeInfo &ati, uint8_t pss, size_t no_rows,
core/tianmu_attr.cpp:72: uint32_t no_pack = common::rows2packs(no_rows, pss);
core/tianmu_attr.cpp:78: pss, // pack size shift
core/tianmu_attr.cpp:137: dpn.numOfNulls = 1 << pss;
core/tianmu_attr.cpp:138: dpn.numOfRecords = 1 << pss;
core/tianmu_attr.cpp:147: auto left = no_rows % (1 << pss);
core/tianmu_attr.cpp:880: if (SizeOfPack() == 0 || get_last_dpn().numOfRecords == (1U << pss)) {
core/tianmu_attr.cpp:1321: FilterOnesIterator it(f, pss);
core/tianmu_attr.cpp:1356: FilterOnesIterator it(f, pss);
core/pack.cpp:32: bitmap_size_ = (1 << col_share->pss) / 8;
core/pack.cpp:76: return (dpn_->numOfRecords < (1U << col_share_->pss)) ||
core/column_share.h:37: uint8_t pss; // pack size shift
core/column_share.h:95: uint8_t pss;
core/tianmu_attr_exeq_rs.cpp:68: static MIIterator const mit(nullptr, pss);
core/tianmu_attr_exeq_rs.cpp:749: Filter values_present(span, pss);
core/rsi_cmap.cpp:48:void RSIndex_CMap::Create([[maybe_unused]] int64_t _no_obj, int no_pos, [[maybe_unused]] int pss) {
core/engine.cpp:775: opt->pss = power;
core/tianmu_attr.h:90: static void Create(const fs::path &path, const AttributeTypeInfo &ati, uint8_t pss, size_t no_rows,
core/tianmu_attr.h:221: uint32_t ValueOfPackPower() const { return pss; }
core/tianmu_attr.h:422: common::PACK_INDEX row2pack(size_t row) const { return row >> pss; }
core/tianmu_attr.h:423: int row2offset(size_t row) const { return row % (1L << pss); }
core/tianmu_attr.h:449: uint8_t pss;
core/rsi_cmap.h:43: void Create(int64_t _no_obj, int no_pos, int pss);
core/table_share.h:38: uint32_t pss = 16;
core/table_share.h:54: size_t PackSize() const { return 1 << meta.pss; }
core/table_share.h:55: uint8_t PackSizeShift() const { return meta.pss; }
64K = 1 << 16;
Currently we have a limit to MAX_PSS
= 16, defined in def.h
TableShare::TableShare(const fs::path &table_path, const TABLE_SHARE *table_share)
: no_cols(table_share->fields), table_path(table_path) {
system::TianmuFile ftbl;
ftbl.OpenReadOnly(table_path / common::TABLE_DESC_FILE);
ftbl.ReadExact(&meta, sizeof(meta));
if (meta.magic != common::FILE_MAGIC)
throw common::DatabaseException("Bad format of table definition in " + table_path.string() + ": bad signature!");
if (meta.ver != common::TABLE_DATA_VERSION)
throw common::DatabaseException("Bad format of table definition in " + table_path.string() + ": invalid version!");
if (meta.pss > common::MAX_PSS)
throw common::DatabaseException("Bad format of table definition in " + table_path.string() +
": invalid pack size shift " + std::to_string(meta.pss));
Thank you for explanation. It is very helpful. I am working on it, feel free to assign this issue to me.
One following up question. I understand your motivation of using sizeof(DPN)
to enhance readability and maintainability. However, there are two potential issues:
error: ‘TX_ID’ in namespace ‘Tianmu::common’ does not name a type
. My understanding is that is caused by a circled including. In here it is: "core/dpn.h" -> "common/common_definitions.h" -> "common/defs.h" -> "core/dpn.h". Even we can fix it for this file, it is still not ideal. So if we really want to have a single place for DPN's size, we may need to consider move static constexpr size_t DPN_SIZE = 88
from column_share.cpp to common. And the logic will be changed from "we ask dpn.h to give us DPN size value" to "we force dpn.cpp implementation to align with the value we set in common".Thank you for explanation. It is very helpful. I am working on it, feel free to assign this issue to me.
One following up question. I understand your motivation of using
sizeof(DPN)
to enhance readability and maintainability. However, there are two potential issues:
- Conceptually, common file should be all code shared by all other components. This means it should have minimum dependency on other classes. Having common file include a header file from core seems against pattern.
- In implementation, currently if I included "core/dpn.h", there was an error like
error: ‘TX_ID’ in namespace ‘Tianmu::common’ does not name a type
. My understanding is that is caused by a circled including. In here it is: "core/dpn.h" -> "common/common_definitions.h" -> "common/defs.h" -> "core/dpn.h". Even we can fix it for this file, it is still not ideal. So if we really want to have a single place for DPN's size, we may need to consider movestatic constexpr size_t DPN_SIZE = 88
from column_share.cpp to common. And the logic will be changed from "we ask dpn.h to give us DPN size value" to "we force dpn.cpp implementation to align with the value we set in common".
Make sense!
You are right, dpn.h
should not be included in common/defs.h
, I think it's better to keep it COL_DN_FILE_SIZE
into column_share.h or dpn.h ?
COL_DN_FILE_SIZE
used in three files:
$ grep -nrw "COL_DN_FILE_SIZE" *
core/column_share.cpp:35: please consider the size of PAGT_CNT and COL_DN_FILE_SIZE to
core/column_share.cpp:57: if (::munmap(start, common::COL_DN_FILE_SIZE) != 0) {
core/column_share.cpp:88: auto addr = ::mmap(0, common::COL_DN_FILE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, dn_fd, 0);
core/column_share.cpp:93: // TODO: should we mlock(addr, common::COL_DN_FILE_SIZE)?
core/column_share.cpp:233: ASSERT((capacity + DPN_INC_CNT) <= (common::COL_DN_FILE_SIZE / sizeof(DPN)),
core/column_share.cpp:263: int ret = ::msync(start, common::COL_DN_FILE_SIZE, MS_SYNC);
core/tianmu_attr.cpp:154: fs::resize_file(dir / common::COL_DN_FILE, common::COL_DN_FILE_SIZE);
core/column_share.h:80: ASSERT(i < common::COL_DN_FILE_SIZE / sizeof(DPN), "bad dpn index: " + std::to_string(i));
core/column_share.h:84: ASSERT(i < common::COL_DN_FILE_SIZE / sizeof(DPN), "bad dpn index: " + std::to_string(i));
Already assign to you @zzzz-vincent Looking forward to your pr!
Is your feature request related to a problem? Please describe.
In defs.h, I see some hard code that make me confused:
After analysis, I finally understand how it comes.
So the reasonable way to calculate
COL_DN_FILE_SIZE
should be:Struct
DPN
defines incore/dpn.h
.Describe the solution you'd like
Describe alternatives you've considered
Additional context