icenet-ai / icenet-etl

Infrastructure for storing IceNet predictions and importing them into a database
MIT License

Database size #14

Open jemrobinson opened 2 years ago

jemrobinson commented 2 years ago

Records

In the forecast tables we expect a single record to take:

Disk size

However, from recent measurements:

| Number of records | Size in bytes |
| --- | --- |
| 9100000 | 1032159232 |
| 9200000 | 1040728064 |

So 100000 additional records take 1040728064 - 1032159232 = 8568832 bytes, which means each record takes about 85.68 bytes.
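The per-record figure can be checked with a short calculation. This is only a sketch of the arithmetic: the two measurements are the ones quoted above, while the function and variable names are made up for illustration.

```python
# Two (record count, size in bytes) measurements quoted in this issue.
measurements = [
    (9_100_000, 1_032_159_232),
    (9_200_000, 1_040_728_064),
]


def marginal_bytes_per_record(first, second):
    """Extra bytes consumed per extra record between two measurements."""
    (n1, size1), (n2, size2) = first, second
    return (size2 - size1) / (n2 - n1)


bytes_per_record = marginal_bytes_per_record(*measurements)
print(f"{bytes_per_record:.2f} bytes per record")
```

Using the difference between two measurements, rather than dividing the total size by the total record count, should exclude any fixed overhead that does not grow with the number of records.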

Summary

There are around 23M records each day for the northern and southern hemispheres combined. This means: 23,000,000 * 85.68 / 1024 / 1024 / 1024 ≈ 1.84 GB per day.
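The daily estimate can be reproduced as below. This is a minimal sketch using the figures from this comment; the constant and function names are illustrative, and it assumes "GB" here means binary gigabytes (1024^3 bytes), which takes three divisions by 1024 starting from bytes.

```python
BYTES_PER_RECORD = 85.68      # measured size per record, from above
RECORDS_PER_DAY = 23_000_000  # approximate, both hemispheres combined


def bytes_to_gib(n_bytes):
    """Convert bytes to binary gigabytes (1 GiB = 1024**3 bytes)."""
    return n_bytes / 1024**3


daily_bytes = RECORDS_PER_DAY * BYTES_PER_RECORD
daily_gib = bytes_to_gib(daily_bytes)
print(f"{daily_gib:.2f} GiB per day")
```

Dividing by 1024 only twice would give the same quantity in megabytes (roughly 1879 MB).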

jemrobinson commented 2 years ago

The following (from Slack) might explain why the current number of records per day is so high.

James Robinson 2022-02-09 21:36

I’ve noticed that the latest predictions show non-zero sea ice (sic_mean > 0) in every cell of the 432 x 432 grid for every date and leadtime. This feels incorrect, as I’m pretty sure that some of that space is land. Can you confirm, @James Byrne?

James Byrne 2022-02-09 22:21

I've not yet been applying the land mask to the outputs, which I do need to do. The predictions in the south are vaguely sensible, the north might be very ropey! Good spot though, I'll sort that out tomorrow! :wink:
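To illustrate why applying a land mask should reduce the number of records, here is a toy sketch in plain Python. It is not the IceNet pipeline: the 3 x 4 grid, the mask values, the `sic_mean` values and the `masked_records` helper are all hypothetical stand-ins for the real 432 x 432 data.

```python
# Hypothetical 3 x 4 grid standing in for the 432 x 432 grid.
# True marks a land cell; these mask values are made up for illustration.
land_mask = [
    [True,  True,  False, False],
    [True,  False, False, False],
    [False, False, False, True],
]

# Hypothetical predicted sea ice concentration for one date and leadtime.
sic_mean = [
    [0.2, 0.1, 0.9, 0.8],
    [0.3, 0.7, 0.6, 0.5],
    [0.4, 0.2, 0.1, 0.3],
]


def masked_records(sic, mask):
    """Yield (row, col, value) only for cells that are not land."""
    for i, (sic_row, mask_row) in enumerate(zip(sic, mask)):
        for j, (value, is_land) in enumerate(zip(sic_row, mask_row)):
            if not is_land:
                yield (i, j, value)


records = list(masked_records(sic_mean, land_mask))
total_cells = sum(len(row) for row in sic_mean)
print(f"{len(records)} of {total_cells} cells kept")
```

Without the mask, every cell produces a record for every date and leadtime, so the record count scales with the full grid size rather than with the ocean area.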

jemrobinson commented 2 years ago

Recent files show 8286021 records per day for the northern hemisphere and 3094203 for the southern. It looks like there's still an issue with the masking for the northern hemisphere though (see below) so these numbers may come down further.

[Image: Screenshot 2022-02-14 at 00 03 45]

jemrobinson commented 2 years ago

Since 2022-02-16 the sizes are:

| hemisphere | n_records | est size (MB) |
| --- | --- | --- |
| north | 9070011 | 741.1 |
| south | 14261829 | 1165.5 |