mikeizbicki / cmc-csci143

big data course materials

Incorrect file sizes #531


nati-azmera commented 7 months ago

Hello,

After running nohup sh load_tweets_parallel.sh for about 3 hours, I get strange file sizes. I have also already fixed the schema issues.

docker-compose exec pg_denormalized sh -c 'du -hd0 $PGDATA'
49G

docker-compose exec pg_normalized_batch sh -c 'du -hd0 $PGDATA'
49G
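One way to check whether data really was inserted twice (rather than just trusting the on-disk size) is to compare row counts against the expected number of tweets. This is only a sketch: the table name tweets and the postgres user are assumptions about your schema and setup, so adjust them to match your docker-compose.yml.

```shell
# Hedged check: if the data was loaded twice, the row count will be roughly
# double the number of tweets in the source files.
# NOTE: "tweets" and "-U postgres" are assumptions about your schema/setup.
docker-compose exec pg_denormalized psql -U postgres -c 'SELECT count(*) FROM tweets;'
```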

Does anyone know why or how I could fix it?

mikeizbicki commented 6 months ago

It looks like you have probably inserted data twice into the pg_denormalized database, and your insert into pg_normalized_batch was interrupted for some reason.

The most correct thing to do is to restart from scratch: delete your existing databases and reinsert the data. Since that will take a long time, however, I will waive for you the requirement that the pg_normalized_batch test cases pass, so you can begin working on the CREATE INDEX commands. I can't do that for pg_denormalized, however, because you'll need the test cases to know whether the SQL SELECT statements you've written are correct. For that database, you will have to delete everything and start over.
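The restart-from-scratch procedure might look like the following outline. The exact data-directory path depends on the volumes: section of your docker-compose.yml, so treat this as a sketch rather than a recipe:

```shell
# Sketch of a from-scratch reload (paths and flags are assumptions):
docker-compose down                  # stop and remove the containers
# Next, delete the host directory that docker-compose.yml mounts as $PGDATA
# for pg_denormalized -- check the "volumes:" section for the real path.
docker-compose up -d                 # recreate the containers with empty databases
nohup sh load_tweets_parallel.sh &   # re-run the load script from this thread
```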