mikeizbicki / cmc-csci143

big data course materials

Step 1 and 2 Postgres Indexes Building Container Question #520

Closed sjanefullerton closed 7 months ago

sjanefullerton commented 7 months ago

Hi, I am having issues with getting my pg_normalized_batch test cases to pass when I start step 2 after copying over the files from the previous homework. I was wondering if it had anything to do with my current docker containers running?

When I build and bring up my containers, everything seems to work. This is what I get:

Building the containers:

lambda-server:~/twitter_postgres_indexes (master *=) $ docker-compose build
/home/sfullerton24/.local/lib/python3.6/site-packages/paramiko/transport.py:32: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography. The next release of cryptography will remove support for Python 3.6.
  from cryptography.hazmat.backends import default_backend
Building pg_denormalized
[+] Building 0.1s (10/10) FINISHED                                  
 => [internal] load build definition from Dockerfile           0.0s
 => => transferring dockerfile: 342B                           0.0s
 => [internal] load metadata for docker.io/library/postgres:1  0.0s
 => [internal] load .dockerignore                              0.0s
 => => transferring context: 2B                                0.0s
 => [1/5] FROM docker.io/library/postgres:13                   0.0s
 => [internal] load build context                              0.0s
 => => transferring context: 32B                               0.0s
 => CACHED [2/5] RUN apt-get update && apt-get install -y      0.0s
 => CACHED [3/5] WORKDIR /tmp/db                               0.0s
 => CACHED [4/5] RUN mkdir /data && chown postgres /data       0.0s
 => CACHED [5/5] COPY schema.sql /docker-entrypoint-initdb.d/  0.0s
 => exporting to image                                         0.0s
 => => exporting layers                                        0.0s
 => => writing image sha256:27725050f7162bab5e16ac3d30d7909cc  0.0s
 => => naming to docker.io/library/twitter_postgres_indexes_p  0.0s
Building pg_normalized_batch
[+] Building 0.8s (10/10) FINISHED                                  
 => [internal] load build definition from Dockerfile           0.0s
 => => transferring dockerfile: 346B                           0.0s
 => [internal] load metadata for docker.io/postgis/postgis:la  0.6s
 => [internal] load .dockerignore                              0.0s
 => => transferring context: 2B                                0.0s
 => [1/5] FROM docker.io/postgis/postgis:latest@sha256:01f46e  0.0s
 => [internal] load build context                              0.0s
 => => transferring context: 32B                               0.0s
 => CACHED [2/5] RUN apt-get update && apt-get install -y      0.0s
 => CACHED [3/5] WORKDIR /tmp/db                               0.0s
 => CACHED [4/5] RUN mkdir /data && chown postgres /data       0.0s
 => CACHED [5/5] COPY schema.sql /docker-entrypoint-initdb.d/  0.0s
 => exporting to image                                         0.0s
 => => exporting layers                                        0.0s
 => => writing image sha256:caf3e4fc928b49c972dd3d3bf704098f8  0.0s
 => => naming to docker.io/library/twitter_postgres_indexes_p  0.0s

Bringing up the containers:

lambda-server:~/twitter_postgres_indexes (master *=) $ docker-compose up -d
/home/sfullerton24/.local/lib/python3.6/site-packages/paramiko/transport.py:32: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography. The next release of cryptography will remove support for Python 3.6.
  from cryptography.hazmat.backends import default_backend
Creating network "twitter_postgres_indexes_default" with the default driver
Creating twitter_postgres_indexes_pg_denormalized_1     ... done
Creating twitter_postgres_indexes_pg_normalized_batch_1 ... done

However, when I run docker ps, I do not see the twitter_postgres_indexes_pg_normalized_batch_1 container. Is that container, or anything else, supposed to show up here?

lambda-server:~/twitter_postgres_indexes (master>) $ docker ps
CONTAINER ID   IMAGE                                      COMMAND                  CREATED         STATUS         PORTS                                       NAMES
9c368f644457   twitter_postgres_indexes_pg_denormalized   "docker-entrypoint.s…"   4 minutes ago   Up 4 minutes   0.0.0.0:7447->5432/tcp, :::7447->5432/tcp   twitter_postgres_indexes_pg_denormalized_1

mikeizbicki commented 7 months ago

Yes, that container should be visible with docker ps if everything ran correctly. You can use docker-compose logs pg_normalized_batch to view the logs for that container and find out why it shut down.
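
As a small sketch of the check described above: plain docker ps lists only running containers, while docker ps -a also lists ones that have exited. The helper name show_all_containers and the DOCKER override variable are hypothetical and exist only for illustration; the name filter assumes the project prefix seen in the output above.

```shell
# Hypothetical helper (not part of the course materials).
# `docker ps` shows running containers only; `-a` also shows exited ones,
# and `--filter name=...` narrows the list to names containing the given text.
show_all_containers() {
    # DOCKER is an assumed override so the command can be swapped out; defaults to docker
    ${DOCKER:-docker} ps -a --filter "name=$1"
}
```

Calling show_all_containers twitter_postgres_indexes should then list the pg_normalized_batch container along with its exited status, after which docker-compose logs pg_normalized_batch shows why it stopped.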

sjanefullerton commented 7 months ago

That is good to know, thank you! These are the outputs when I run that command. Would it be safe for me to remove the /var/lib/postgresql/data directory or is there a better approach to solve this?

lambda-server:~/twitter_postgres_indexes (master>) $ docker-compose logs pg_normalized_batch
/home/sfullerton24/.local/lib/python3.6/site-packages/paramiko/transport.py:32: CryptographyDeprecationWarning: Python 3.6 is no longer supported by the Python core team. Therefore, support for it is deprecated in cryptography. The next release of cryptography will remove support for Python 3.6.
  from cryptography.hazmat.backends import default_backend
Attaching to twitter_postgres_indexes_pg_normalized_batch_1
pg_normalized_batch_1  | The files belonging to this database system will be owned by user "postgres".
pg_normalized_batch_1  | This user must also own the server process.
pg_normalized_batch_1  | 
pg_normalized_batch_1  | The database cluster will be initialized with locale "en_US.utf8".
pg_normalized_batch_1  | The default database encoding has accordingly been set to "UTF8".
pg_normalized_batch_1  | The default text search configuration will be set to "english".
pg_normalized_batch_1  | 
pg_normalized_batch_1  | Data page checksums are disabled.
pg_normalized_batch_1  | initdb: error: directory "/var/lib/postgresql/data" exists but is not empty
pg_normalized_batch_1  | initdb: hint: If you want to create a new database system, either remove or empty the directory "/var/lib/postgresql/data" or run initdb with an argument other than "/var/lib/postgresql/data".
pg_normalized_batch_1  | 

mikeizbicki commented 7 months ago

Yes, it is safe to remove that directory, and that is what you should do if the database gets corrupted like this. The following command (from the part 0 instructions) should achieve that:

$ docker-compose exec pg_normalized_batch bash -c 'rm -rf $PGDATA'

sjanefullerton commented 7 months ago

Thank you for your help. I am sorry, but I am confused about what to do to fix this. Is there a step I have missed? These are the steps I took:

  1. I brought down/stopped all the containers and pruned the volumes.

  2. I then ran the following to make sure that the directory was removed:

lambda-server:~/twitter_postgres_indexes (master>) $ docker-compose exec pg_normalized_batch bash -c 'rm -rf $PGDATA'
lambda-server:~/twitter_postgres_indexes (master>) $ docker-compose exec pg_denormalized bash -c 'rm -rf $PGDATA'
lambda-server:~/twitter_postgres_indexes (master>) $ rm -rf /var/lib/postgresql/data

  3. I checked to see if the directory was removed:

lambda-server:~/twitter_postgres_indexes (master>) $ ls -l /var/lib/postgresql
total 16
drwxr-xr-x  3 postgres postgres 4096 Nov 22  2019 10
drwx------  2 postgres postgres 4096 Jul  2  2020 pythainlp-data
drwxrwxr-x 10 postgres postgres 4096 Jul  2  2020 spacy
drwxrwxr-x  6 postgres postgres 4096 Jul  2  2020 venv


I do not see it here, so I assume it was removed.

However, when I built the containers and brought them back up, I still got the same error:

pg_normalized_batch_1 | initdb: error: directory "/var/lib/postgresql/data" exists but is not empty
pg_normalized_batch_1 | initdb: hint: If you want to create a new database system, either remove or empty the directory "/var/lib/postgresql/data" or run initdb with an argument other than "/var/lib/postgresql/data".

mikeizbicki commented 7 months ago

Whenever you run commands without docker-compose, they are being run on the lambda server and not inside the container. In particular,

$ ls -l /var/lib/postgresql

is being run on the lambda server, so it can't tell you whether the data directories inside the container were successfully removed.
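
To make the host-versus-container distinction concrete, here is a minimal sketch of a wrapper that runs a command inside a service's container rather than on the lambda server. The helper name in_container and the COMPOSE override variable are hypothetical. It uses docker-compose run --rm instead of docker-compose exec because exec needs a running container, and this sketch assumes the data directory is backed by a volume or bind mount declared in docker-compose.yml, so that a throwaway container sees the same files.

```shell
# Hypothetical helper (not part of the course materials).
# Runs the given command in a one-off container for the named service,
# so paths such as /var/lib/postgresql/data refer to the container's
# filesystem (with the service's volumes mounted), not the host's.
in_container() {
    service="$1"
    shift
    # COMPOSE is an assumed override; defaults to docker-compose.
    # --rm removes the one-off container once the command finishes.
    ${COMPOSE:-docker-compose} run --rm "$service" "$@"
}
```

For example, in_container pg_normalized_batch ls -la /var/lib/postgresql/data would list the directory that initdb is complaining about, whereas the same ls typed directly at the lambda-server prompt inspects the host's own /var/lib/postgresql.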

In any event, I have deleted these folders for you manually (as root on the lambda server) to get you unstuck on this step.

sjanefullerton commented 7 months ago

Thank you!!