Open broccoliSpicy opened 3 days ago
Attention: Patch coverage is 19.14894% with 266 lines in your changes missing coverage. Please review.
Project coverage is 77.69%. Comparing base (1d3b204) to head (0a6f6c9).
After making the `append` method in `DataBlockBuilderImpl` use an immutable borrow:

Table Name | Parquet Read Time | Lance Read Time | Parquet File Size | Lance File Size |
---|---|---|---|---|
customer | 0.45s | 0.42s | 120 MiB | 152 MiB |
lineitem | 11.55s | 10.46s | 2,076 MiB | 3,374 MiB |
orders | 2.93s | 2.78s | 559 MiB | 894 MiB |
part | 0.34s | 0.40s | 61 MiB | 127 MiB |
partsupp | 3.29s | 1.89s | 401 MiB | 470 MiB |

Column Name | DataType | Parquet Read Time | Lance Read Time | Parquet File Size | Lance File Size | Cardinality |
---|---|---|---|---|---|---|
p_partkey | int32 | 0.05s | 0.02s | 8 MiB | 4 MiB | 2000000 |
p_name | string | 0.45s | 0.09s | 25 MiB | 30 MiB | 1999828 |
p_mfgr | string | 0.07s | 0.03s | 0.7 MiB | 0.7 MiB | 5 |
p_brand | string | 0.06s | 0.03s | 1 MiB | 1 MiB | 25 |
p_type | string | 0.10s | 0.23s | 1 MiB | 38 MiB | 150 |
p_size | int32 | 0.02s | 0.02s | 1 MiB | 1 MiB | 50 |
p_container | string | 0.08s | 0.03s | 1 MiB | 1 MiB | 40 |
p_retailprice | decimal128(15, 2) | 0.11s | 0.20s | 3 MiB | 30 MiB | 31681 |
p_comment | string | 0.32s | 0.19s | 16 MiB | 27 MiB | 754704 |
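
The table suggests why low-cardinality fixed-width columns such as `p_retailprice` are natural candidates for dictionary encoding. The following back-of-the-envelope estimate is only a sketch under stated assumptions: 2,000,000 rows (taken from the `p_partkey` cardinality), 16 bytes per `decimal128` value, bit-packed indices, and no further compression of either the dictionary or the indices. The function name is hypothetical and is not part of Lance.

```python
def dictionary_size_estimate(num_rows, cardinality, value_bytes):
    """Return (plain_bytes, dictionary_encoded_bytes) for one column.

    Assumes bit-packed indices and an uncompressed dictionary; real
    encoders add metadata and may compress both parts further.
    """
    plain = num_rows * value_bytes
    # Bits needed to address `cardinality` distinct dictionary entries.
    index_bits = max(1, (cardinality - 1).bit_length())
    dictionary = cardinality * value_bytes
    indices = (num_rows * index_bits + 7) // 8  # round up to whole bytes
    return plain, dictionary + indices


MIB = 1024 * 1024
plain, encoded = dictionary_size_estimate(
    num_rows=2_000_000,   # assumed row count of the part table
    cardinality=31_681,   # p_retailprice cardinality from the table
    value_bytes=16,       # decimal128 is 16 bytes wide
)
print(f"plain: {plain / MIB:.1f} MiB, dictionary-encoded: {encoded / MIB:.1f} MiB")
```

Under these assumptions the plain size comes out close to the 30 MiB Lance file size reported for `p_retailprice`, while the dictionary plus 15-bit indices would need roughly 4 MiB. This is an estimate only, not a measurement of this PR.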
First 100 rows: l_extendedprice [[Decimal('33078.94')] [Decimal('38306.16')] [Decimal('15479.68')] [Decimal('34616.68')] [Decimal('28974.00')] [Decimal('44842.88')] [Decimal('63066.32')] [Decimal('86083.65')] [Decimal('70822.15')] [Decimal('39620.34')] [Decimal('3581.56')] [Decimal('52411.80')] [Decimal('35032.14')] [Decimal('39819.00')] [Decimal('25179.60')] [Decimal('31387.20')] [Decimal('68864.50')] [Decimal('53697.73')] [Decimal('17273.04')] [Decimal('12423.15')] [Decimal('84904.50')] [Decimal('46245.92')] [Decimal('74398.68')] [Decimal('55806.45')] [Decimal('7216.50')] [Decimal('26963.72')] [Decimal('40995.52')] [Decimal('3091.16')] [Decimal('5393.68')] [Decimal('46642.64')] [Decimal('6978.84')] [Decimal('39224.92')] [Decimal('34948.80')] [Decimal('8803.10')] [Decimal('49780.56')] [Decimal('20768.41')] [Decimal('24817.98')] [Decimal('8558.10')] [Decimal('33708.00')] [Decimal('44788.54')] [Decimal('13026.23')] [Decimal('42317.50')] [Decimal('42877.74')] [Decimal('45516.80')] [Decimal('74029.62')] [Decimal('48691.20')] [Decimal('69449.25')] [Decimal('45538.29')] [Decimal('63681.20')] [Decimal('49288.36')] [Decimal('46194.72')] [Decimal('58892.42')] [Decimal('57788.48')] [Decimal('52982.88')] [Decimal('68665.20')] [Decimal('30837.66')] [Decimal('52933.66')] [Decimal('26050.42')] [Decimal('37545.27')] [Decimal('37916.72')] [Decimal('78670.80')] [Decimal('5069.36')] [Decimal('21910.92')] [Decimal('10159.55')] [Decimal('48887.96')] [Decimal('23784.30')] [Decimal('33001.13')] [Decimal('4925.01')] [Decimal('84764.66')] [Decimal('84721.88')] [Decimal('26424.60')] [Decimal('40541.31')] [Decimal('46006.50')] [Decimal('63853.40')] [Decimal('54433.44')] [Decimal('55447.68')] [Decimal('29539.20')] [Decimal('3279.00')] [Decimal('72225.30')] [Decimal('25852.69')] [Decimal('9761.92')] [Decimal('20974.98')] [Decimal('1186.00')] [Decimal('14182.41')] [Decimal('50996.73')] [Decimal('30371.88')] [Decimal('30631.75')] [Decimal('3330.36')] [Decimal('61348.50')] [Decimal('49876.20')] 
[Decimal('57583.11')] [Decimal('47574.50')] [Decimal('38862.87')] [Decimal('58554.90')] [Decimal('24241.36')] [Decimal('61777.05')] [Decimal('39272.24')] [Decimal('29739.92')] [Decimal('1424.37')] [Decimal('14056.42')]]
first 100 rows p_retailprice [[Decimal('901.00')] [Decimal('902.00')] [Decimal('903.00')] [Decimal('904.00')] [Decimal('905.00')] [Decimal('906.00')] [Decimal('907.00')] [Decimal('908.00')] [Decimal('909.00')] [Decimal('910.01')] [Decimal('911.01')] [Decimal('912.01')] [Decimal('913.01')] [Decimal('914.01')] [Decimal('915.01')] [Decimal('916.01')] [Decimal('917.01')] [Decimal('918.01')] [Decimal('919.01')] [Decimal('920.02')] [Decimal('921.02')] [Decimal('922.02')] [Decimal('923.02')] [Decimal('924.02')] [Decimal('925.02')] [Decimal('926.02')] [Decimal('927.02')] [Decimal('928.02')] [Decimal('929.02')] [Decimal('930.03')] [Decimal('931.03')] [Decimal('932.03')] [Decimal('933.03')] [Decimal('934.03')] [Decimal('935.03')] [Decimal('936.03')] [Decimal('937.03')] [Decimal('938.03')] [Decimal('939.03')] [Decimal('940.04')] [Decimal('941.04')] [Decimal('942.04')] [Decimal('943.04')] [Decimal('944.04')] [Decimal('945.04')] [Decimal('946.04')] [Decimal('947.04')] [Decimal('948.04')] [Decimal('949.04')] [Decimal('950.05')] [Decimal('951.05')] [Decimal('952.05')] [Decimal('953.05')] [Decimal('954.05')] [Decimal('955.05')] [Decimal('956.05')] [Decimal('957.05')] [Decimal('958.05')] [Decimal('959.05')] [Decimal('960.06')] [Decimal('961.06')] [Decimal('962.06')] [Decimal('963.06')] [Decimal('964.06')] [Decimal('965.06')] [Decimal('966.06')] [Decimal('967.06')] [Decimal('968.06')] [Decimal('969.06')] [Decimal('970.07')] [Decimal('971.07')] [Decimal('972.07')] [Decimal('973.07')] [Decimal('974.07')] [Decimal('975.07')] [Decimal('976.07')] [Decimal('977.07')] [Decimal('978.07')] [Decimal('979.07')] [Decimal('980.08')] [Decimal('981.08')] [Decimal('982.08')] [Decimal('983.08')] [Decimal('984.08')] [Decimal('985.08')] [Decimal('986.08')] [Decimal('987.08')] [Decimal('988.08')] [Decimal('989.08')] [Decimal('990.09')] [Decimal('991.09')] [Decimal('992.09')] [Decimal('993.09')] [Decimal('994.09')] [Decimal('995.09')] [Decimal('996.09')] [Decimal('997.09')] [Decimal('998.09')] 
[Decimal('999.09')] [Decimal('1000.10')]]
This PR tries to support dictionary encoding by integrating it with `MiniBlock PageLayout`.

The general approach is: in a `MiniBlock PageLayout`, there is an optional `dictionary` field that stores a dictionary encoding if this `miniblock` has a dictionary.

The rationale is that if we dictionary-encode something, its indices will definitely fall into a `MiniBlockLayout`. By doing this, we don't need a specific `DictionaryEncoding`; it can be any `ArrayEncoding`. The `Dictionary` and the `indices` are cascaded into another encoding automatically.

Currently, the dictionary is stored inside the page along with the `chunk meta data` and `chunk data`. This is not ideal and is a `TODO` task.

This is a draft for discussing the above idea, so I only supported `FixedWidthDataBlock` with this encoding; the effort to add support for `VariableWidthData` is trivial.

3123
`cache_bytes_per_column = 8 * 1024 * 1024`:

`cache_bytes_per_column = 32 * 1024 * 1024`:

To reproduce, here are the test scripts: