mikeizbicki / cmc-csci143

big data course materials

Denormalize Table Upload #485

Closed · baron-zeng closed this issue 7 months ago

baron-zeng commented 7 months ago

Hello,

I'm using the COPY command below to load the denormalized JSON twitter data into postgres, but unfortunately it doesn't seem to work. My best guess is that my unzip command is incorrect or incomplete. Does anyone have tips on what to change, or on how I can test for errors?

echo 'load denormalized'
for file in $files; do
    unzip -p 'test-data.zip' | sed 's/\\u0000//g' | psql postgresql://postgres:pass@localhost:1333/postgres -c "COPY tweets_jsonb (data) FROM STDIN;"
done
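(For context, the sed step is there because Postgres jsonb cannot store the literal \u0000 escape sequence, so it has to be stripped before COPY. That filter can be checked in isolation, without psql; a minimal sketch, using a made-up one-line sample in place of real tweet data:)

```shell
# Sanity-check the sed filter alone: the \u0000 escape must be
# removed before COPY, since Postgres jsonb rejects it.
# (the sample line below is a placeholder, not real tweet data)
printf '{"text":"hi\\u0000there"}\n' | sed 's/\\u0000//g'
# prints {"text":"hithere"}
```

If this prints the line with \u0000 intact, the sed pattern (not the unzip) is the problem.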

I have changed my ports in docker-compose already:

    ports:
      - 1333:5432

Best, Baron

epaisano commented 7 months ago

Hello! The first thing that stands out to me is that you want to iterate over all the files, so replace 'test-data.zip' with "$file". I also kept the csv options at the very end of the COPY command. So, my code looks like this:

unzip -p "$file" | sed 's/\\u0000//g' | psql postgresql://postgres:pass@localhost:62725 -c "COPY tweets_jsonb (data) FROM STDIN csv quote e'\x01' delimiter e'\x02';"
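(Why the odd quote and delimiter? In COPY's csv mode the defaults are double quote and comma, both of which appear throughout tweet JSON and would be misparsed; the \x01/\x02 control bytes never appear in JSON text, so each input line passes through as one opaque value. A quick sketch checking that on a made-up sample line:)

```shell
# Count \x01/\x02 bytes in a sample JSON line: there are none,
# which is why they are safe choices for COPY's quote/delimiter.
# (the sample line is a placeholder, not real tweet data)
printf '{"text":"a,b and a \\"quote\\""}\n' | tr -cd '\001\002' | wc -c
# prints 0
```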

Hope this helps!

baron-zeng commented 7 months ago

Thank you! Unfortunately, there still seem to be some issues.

Here is my updated file (my port was updated in the docker-compose.yml file too):

files='
test-data.zip
'

echo 'load normalized'
for file in $files; do
    # call the load_tweets.py file to load data into pg_normalized
done

echo 'load denormalized'
for file in $files; do
    unzip -p "$file" | sed 's/\\u0000//g' | psql postgresql://postgres:pass@localhost:13335 -c "COPY tweets_jsonb (data) FROM STDIN csv quote e'\x01' delimiter e'\x02';"
done
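(A side note on the loop itself: the unquoted $files works because the shell word-splits the newline-separated list, so each iteration sees one filename. A minimal sketch of that behavior, using the placeholder filename from above:)

```shell
# Demonstrate how `for file in $files` iterates over the
# newline-separated list (filename here is a placeholder).
files='
test-data.zip
'
for file in $files; do
    echo "loading $file"
done
# prints: loading test-data.zip
```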

To test if the denormalized data was loaded, I did the following:

docker-compose up -d --build
docker-compose exec pg_denormalized ./run_tests.sh

Did I miss any steps? Specifically, should I run the load_tweets.sh file?

Thank you! Baron