Closed lucernae closed 4 years ago
Query used:
SELECT
relname AS "relation",
pg_size_pretty(
pg_relation_size(C .oid)
) as "rel_size",
pg_size_pretty (
pg_total_relation_size (C .oid)
) AS "total_size"
FROM
pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C .relnamespace)
WHERE
nspname NOT IN (
'pg_catalog',
'information_schema'
)
AND C .relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY
pg_total_relation_size (C .oid) DESC
LIMIT 5;
It should return top 5 tables with the most space.
osm_buildings
have 22GB
Query used:
SELECT l.metric, l.nr AS "bytes/ct"
, CASE WHEN is_size THEN pg_size_pretty(nr) END AS bytes_pretty
, CASE WHEN is_size THEN nr / NULLIF(x.ct, 0) END AS bytes_per_row
FROM (
SELECT min(tableoid) AS tbl -- = 'public.tbl'::regclass::oid
, count(*) AS ct
, sum(length(t::text)) AS txt_len -- length in characters
FROM public.osm_buildings t -- provide table name *once*
) x
, LATERAL (
VALUES
(true , 'core_relation_size' , pg_relation_size(tbl))
, (true , 'visibility_map' , pg_relation_size(tbl, 'vm'))
, (true , 'free_space_map' , pg_relation_size(tbl, 'fsm'))
, (true , 'table_size_incl_toast' , pg_table_size(tbl))
, (true , 'indexes_size' , pg_indexes_size(tbl))
, (true , 'total_size_incl_toast_and_indexes', pg_total_relation_size(tbl))
, (true , 'live_rows_in_text_representation' , txt_len)
, (false, '------------------------------' , NULL)
, (false, 'row_count' , ct)
, (false, 'live_tuples' , pg_stat_get_live_tuples(tbl))
, (false, 'dead_tuples' , pg_stat_get_dead_tuples(tbl))
) l(is_size, metric, nr);
Result:
metric | bytes/ct | bytes_pretty | bytes_per_row |
---|---|---|---|
core_relation_size | 17544536064 | 16 GB | 3094 |
visibility_map | 540672 | 528 kB | 0 |
free_space_map | 4333568 | 4232 kB | 0 |
table_size_incl_toast | 17549819904 | 16 GB | 3095 |
indexes_size | 5588377600 | 5329 MB | 985 |
total_size_incl_toast_and_indexes | 23138197504 | 22 GB | 4080 |
live_rows_in_text_representation | 2193423403 | 2092 MB | 386 |
------------------------------ | NULL | NULL | NULL |
row_count | 5669884 | NULL | NULL |
live_tuples | 56898224 | NULL | NULL |
dead_tuples | 0 | NULL | NULL |
in csv:
metric,bytes/ct,bytes_pretty,bytes_per_row
core_relation_size,17544536064,16 GB,3094
visibility_map,540672,528 kB,0
free_space_map,4333568,4232 kB,0
table_size_incl_toast,17549819904,16 GB,3095
indexes_size,5588377600,5329 MB,985
total_size_incl_toast_and_indexes,23138197504,22 GB,4080
live_rows_in_text_representation,2193423403,2092 MB,386
------------------------------,,,
row_count,5669884,,
live_tuples,56898224,,
dead_tuples,0,,
Query used for total size in row (using certain row id):
SELECT sum(pg_column_size(t.*)) as filesize FROM (select * from osm_buildings where id = 4352) as t;
Result: | filesize |
---|---|
8416 |
Query used to get all columns name:
select column_name from information_schema.columns where table_name = 'osm_buildings'
Result: | column_name |
---|---|
id | |
osm_id | |
name | |
leisure | |
height | |
building:levels | |
building:height | |
building:min_level | |
roof:height | |
roof:material | |
building:material | |
use | |
religion | |
type | |
amenity | |
landuse | |
geometry | |
building_type | |
building_type_score | |
building_area | |
building_area_score | |
building_material_score | |
building_road_length | |
building_road_density_score | |
total_vulnerability | |
village_id | |
sub_district_id | |
district_id | |
building_road_density | |
building_id |
Query used to get all column size (for a certain row id):
select
pg_column_size(id) as id,
pg_column_size(osm_id) as osm_id,
pg_column_size(name) as name,
pg_column_size(leisure) as leisure,
pg_column_size(height) as height,
pg_column_size("building:levels") as "building:levels",
pg_column_size("building:height") as "building:height",
pg_column_size("building:min_level") as "building:min_level",
pg_column_size("roof:height") as "roof:height",
pg_column_size("roof:material") as "roof:material",
pg_column_size("building:material") as "building:material",
pg_column_size(use) as use,
pg_column_size(religion) as religion,
pg_column_size(type) as type,
pg_column_size(amenity) as amenity,
pg_column_size(landuse) as landuse,
pg_column_size(geometry) as geometry,
pg_column_size(building_type) as building_type,
pg_column_size(building_type_score) as building_type_score,
pg_column_size(building_area) as building_area,
pg_column_size(building_area_score) as building_area_score,
pg_column_size(building_material_score) as building_material_score,
pg_column_size(building_road_length) as building_road_length,
pg_column_size(building_road_density_score) as building_road_density_score,
pg_column_size(total_vulnerability) as total_vulnerability,
pg_column_size(village_id) as village_id,
pg_column_size(sub_district_id) as sub_district_id,
pg_column_size(district_id) as district_id,
pg_column_size(building_road_density) as building_road_density,
pg_column_size(building_id) as building_id
from osm_buildings where id = 4352;
Result: | id | osm_id | name | leisure | height | building:levels | building:height | building:min_level | roof:height | roof:material | building:material | use | religion | type | amenity | landuse | geometry | building_type | building_type_score | building_area | building_area_score | building_material_score | building_road_length | building_road_density_score | total_vulnerability | village_id | sub_district_id | district_id | building_road_density | building_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
4 | 8 | 17 | 1 | NULL | 1 | NULL | NULL | NULL | 1 | 1 | 1 | 1 | 12 | 12 | 12 | 8244 | 12 | 8 | 8 | 8 | 8 | NULL | NULL | 8 | NULL | NULL | NULL | NULL | NULL |
In this particular case, we know that the bulk size comes from geometry. How much of it in total?
Query total geometry size:
select
total_geometry_size, total_row_size - total_geometry_size as total_non_geometry_size, total_row_size,
pg_size_pretty(total_geometry_size) as pretty_total_geometry_size,
pg_size_pretty(total_row_size - total_geometry_size) as pretty_total_non_geometry_size,
pg_size_pretty(total_row_size) as pretty_total_row_size
from
(select sum(pg_column_size(t.*)) as total_row_size, sum(pg_column_size(geometry)) as total_geometry_size from osm_buildings t) as a;
Result:
total_geometry_size | total_non_geometry_size | total_row_size | pretty_total_geometry_size | pretty_total_non_geometry_size | pretty_total_row_size |
---|---|---|---|---|---|
773983117 | 708148139 | 1482131256 | 738 MB | 675 MB | 1413 MB |
The total of only column sizes does not match with total row size if we use pg_relation_size
. There must be some columns or indices that I missed. I don't know where else to look
@lucernae Where is the sql for the vulnerability calculations. I can try to run on my local machie and see if the problem persist
@NyakudyaA we can run it, it's just that the size blew up after vulnerability calculations
Fresh dump then restore the same db will output this:
Query used:
SELECT
relname AS "relation",
pg_size_pretty(
pg_relation_size(C .oid)
) as "rel_size",
pg_size_pretty (
pg_total_relation_size (C .oid)
) AS "total_size"
FROM
pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C .relnamespace)
WHERE
nspname NOT IN (
'pg_catalog',
'information_schema'
)
AND C .relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY
pg_total_relation_size (C .oid) DESC
LIMIT 5;
Result:
relation | rel_size | total_size
-----------------+----------+------------
osm_buildings | 1491 MB | 1995 MB
osm_roads | 448 MB | 591 MB
osm_admin | 10 MB | 72 MB
osm_waterways | 54 MB | 69 MB
spatial_ref_sys | 4584 kB | 4808 kB
(5 rows)
After vulnerability calculations, same query, same data (the vulnerability will have the same output as before):
osm_buildings
beocmes 22GB.
So, total_row_size is accurate after all, I just don't know the extra size is for what?
I need to write some procedure on how to reproduce the problem first @NyakudyaA
Concluded that the cause is mostly pg_wal
directory.
When we backup and restore the same data, the overall size becomes much smaller.
Concluded that the cause is mostly
pg_wal
directory. When we backup and restore the same data, the overall size becomes much smaller.
This is caused by the env variable --wal-segsize=${WAL_SEGSIZE}
It determines how much detail each wal log contains.
You also need to tune PG_WAL_KEEP_SEGMENTS
to a low value for the wal directory to not store much data
Question
Why central-america.osm.pbf data ends up around 25 GB after running vulnerability calculations?