mikeizbicki / cmc-csci143

big data course materials

File sizes not exactly matching instructions #509

Closed by ains-arch 7 months ago

ains-arch commented 7 months ago

How concerned should I be if my pg_denormalized $PGDATA disk usage is 74G instead of 75G? I think this might be because the rm -rf commands did not delete everything, since the data folder was still in use by postgres. Here is the output after the pg_denormalized section ran:

$ nohup ./load_tweets_parallel.sh > output.log 2>&1 &
$ cat output.log
================================================================================
load pg_denormalized
================================================================================
COPY 2979992
COPY 3044365
COPY 3038917
COPY 3143286
COPY 3189325
COPY 3129896
COPY 3157691
COPY 3148130
COPY 3306556
COPY 3376266
1749.46user 438.53system 19:37.84elapsed 185%CPU (0avgtext+0avgdata 17848maxresident)k
24inputs+29440outputs (0major+65005minor)pagefaults 0swaps
...
$ docker-compose exec pg_denormalized sh -c 'du -hd0 $PGDATA'
...
  from cryptography.hazmat.backends import default_backend
74G     /var/lib/postgresql/data

abizermamnoon commented 7 months ago

I had 74G too! Mike told me it is fine.
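
A check that does not depend on disk usage is to total the rows reported by the COPY lines in the log. The following is a minimal sketch, assuming the log keeps the "COPY <n>" format shown above; the function name sum_copy_rows is hypothetical and not part of the course materials.

```shell
# Sum the row counts from lines of the form "COPY <n>" in a log file.
# sum_copy_rows is a hypothetical helper name, used here for illustration only.
sum_copy_rows() {
    # Only lines whose first field is exactly "COPY" are counted;
    # "total + 0" makes the result print as 0 when no such lines exist.
    awk '$1 == "COPY" { total += $2 } END { print total + 0 }' "$1"
}

# Usage: sum_copy_rows output.log
```

For the ten COPY lines shown in the log above, this totals 31514424 rows. Two loads that report the same row total contain the same number of tweets even when du reports slightly different sizes.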