giffiecode closed this issue 7 months ago.
Is the container showing up when you run `docker ps`?
After bringing it down, no.
But like, after you bring it down, delete the volumes, rebuild, and bring it back up, is it there when you run `docker ps`?
After build and up, yes.
I have the same exact error. I'm pretty sure my ports match up between the compose.yml file and the .sh files.
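One way to double-check that the ports really match up is a small script that pulls the published host ports out of the compose file and the `postgresql://` URLs out of a .sh file, then reports any port the script connects to that the compose file does not publish. This is a hypothetical sketch (`compose_host_ports`, `script_ports`, and `unpublished_ports` are made-up names). It assumes the compose file uses the short `HOST:CONTAINER` port syntax and that the scripts connect to `localhost`; the regexes are a rough stand-in for a real YAML parser.

```python
import re

# Short-syntax port entries such as "- 23451:5432" or '- "23451:5432"'
# (an optional leading IP like "127.0.0.1:" is skipped). Assumes HOST:CONTAINER form.
COMPOSE_PORT = re.compile(r'^\s*-\s*["\']?(?:[\d.]+:)?(\d+):(\d+)["\']?\s*$', re.MULTILINE)

# Host port in a URL like postgresql://postgres:pass@localhost:7272/
URL_PORT = re.compile(r'postgresql://[^@\s]*@localhost:(\d+)')


def compose_host_ports(compose_text: str) -> set[int]:
    """Return the host-side ports published in a compose file."""
    return {int(host) for host, _container in COMPOSE_PORT.findall(compose_text)}


def script_ports(script_text: str) -> set[int]:
    """Return the localhost ports referenced by postgresql:// URLs in a script."""
    return {int(port) for port in URL_PORT.findall(script_text)}


def unpublished_ports(compose_text: str, script_text: str) -> set[int]:
    """Ports the script connects to that the compose file does not publish."""
    return script_ports(script_text) - compose_host_ports(compose_text)
```

Pass it the text of your compose file and of each .sh file; an empty set means every port the script uses is published by some service (it does not check that it is the *right* service).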
@ypei23 Did you change the port in `load_denormalized.sh`?
I have updated the port in `load_denormalized.sh`. My denormalized load seems to work now, but my normalized batch load is printing out all the tweets in JSON format.
ERROR: invalid input syntax for type json
DETAIL: Token "mage_url" is invalid.
CONTEXT: JSON data, line 1: ...d":false,"retweeted":false,"filter_level"mage_url...
COPY tweets_jsonb, line 1410717, column data: "{"created_at":"Tue Jan 05 12:55:30 +0000 2021","id":1346440186884812803,"id_str":"134644018688481280..."
ERROR: invalid input syntax for type json
DETAIL: Token "w" is invalid.
CONTEXT: JSON data, line 1: ...sKite\/status\/1347836811540889601\/photo\/1",""w...
COPY tweets_jsonb, line 1404426, column data: "{"created_at":"Sat Jan 09 12:12:08 +0000 2021","id":1347878823619141632,"id_str":"134787882361914163..."
ERROR: invalid input syntax for type json
DETAIL: Expected ":", but found "}".
CONTEXT: JSON data, line 1: ...rmal.jpg","profile_image_url_xzWT9v7.mp4?tag=10"}...
COPY tweets_jsonb, line 1422985, column data: "{"created_at":"Mon Jan 04 12:53:39 +0000 2021","id":1346077330704277505,"id_str":"134607733070427750..."
Are you getting something like this?
And then after that, this:
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 1457?
(Background on this error at: https://sqlalche.me/e/14/e3q8)
Command exited with non-zero status 10
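The `invalid input syntax for type json` errors mean COPY hit input lines that are not valid JSON (note the spliced fragments like `"filter_level"mage_url`). To look at those lines directly instead of scrolling through COPY output, one option is to run every line of an input file through Python's `json.loads`. This is a hypothetical sketch (`find_invalid_json_lines` is a made-up name) that assumes the input holds one tweet per line; its line numbers only line up with COPY's `line N` if COPY is reading that same file from the start.

```python
import json
from typing import Iterable, Iterator, Tuple


def find_invalid_json_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (1-based line number, parser message) for each line that is not valid JSON."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines rather than reporting them
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            yield lineno, exc.msg
```

It accepts any iterable of strings, so an open file handle works directly, e.g. `list(find_invalid_json_lines(open(path, encoding="utf-8")))`.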
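The `Connection refused` error means nothing was accepting TCP connections on `localhost:1457` at that moment, so before rerunning the load it can help to confirm the port is actually open. A minimal sketch using only the standard library (`port_is_open` is a hypothetical helper; it only checks that *something* accepts a TCP connection, not that it is Postgres or that the credentials work):

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, timeouts, and name-resolution errors are all OSError subclasses
        return False
```

For example, `port_is_open("localhost", 1457)` should be `True` once the container publishing that port is up.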
Update: I got those errors and checked `docker ps`, and sure enough the container was not on the port I thought it was. I brought everything down and ran the `rm -rf` stuff he gave us, then rebuilt and brought everything back up. When I checked `docker ps` again, the containers were listening on the ports I expected.
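Since `docker ps` is what revealed the wrong port, here is a sketch that reads the same information from a script: it asks `docker ps` for only the container names and the Ports column (via `--format`) and extracts the host-to-container mappings. `parse_ports` and `running_container_ports` are hypothetical names, and the regex assumes the usual `0.0.0.0:HOST->CONTAINER/tcp` shape of that column (port ranges are ignored).

```python
import re
import subprocess

# A published port in the Ports column looks like "0.0.0.0:1457->5432/tcp";
# IPv6 bindings appear as ":::1457->5432/tcp". Unpublished ports ("5432/tcp") are skipped.
PUBLISHED = re.compile(r':(\d+)->(\d+)/(?:tcp|udp)')


def parse_ports(ports_field: str) -> set[tuple[int, int]]:
    """Return {(host_port, container_port)} from a docker ps Ports string."""
    return {(int(h), int(c)) for h, c in PUBLISHED.findall(ports_field)}


def running_container_ports() -> dict[str, set[tuple[int, int]]]:
    """Map each running container name to its published ports (needs the docker CLI)."""
    out = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Ports}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    result = {}
    for line in out.splitlines():
        name, _, ports = line.partition("\t")
        result[name] = parse_ports(ports)
    return result
```

The set collapses the duplicate IPv4 and IPv6 entries, so each mapping shows up once per container.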
Then I reran `load_tweets_parallel.sh` and got different errors, seemingly related to the `urls` column of the user table. If that's you, I'd double-check that the columns in the schema he gives us for this homework match the columns you used in the previous homework (and therefore reference in `load_tweets_batch.py`) and, if not, change your .py.
I think the error is gonna look like a huge mess regardless, because we're loading so much at once and I think it'll show us all the SQL queries for the errors, so... a lot to sift through. Personally, I've been running `load_tweets_parallel.sh` in the background with `nohup` so that if I close my laptop it doesn't explode:
nohup ./load_tweets_parallel.sh > output.log 2>&1 &
I had to fiddle with the permissions a little to get it to work, but it seems worth it. This has the added benefit of putting all the errors in an output.log file that I can then look at with vim rather than trying to read it all in the terminal.
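With everything going to output.log, a rough way to sift through it is to count the distinct `ERROR:` and `DETAIL:` messages, so the handful of underlying problems stands out from thousands of repeated lines. A sketch assuming Postgres-style log lines that start with `ERROR:` or `DETAIL:` (`summarize_errors` is a hypothetical name):

```python
import re
from collections import Counter
from typing import Iterable

# Postgres error lines look like "ERROR:  invalid input syntax for type json"
LOG_LINE = re.compile(r'(ERROR|DETAIL):\s*(.*)')


def summarize_errors(lines: Iterable[str]) -> Counter:
    """Count ERROR:/DETAIL: log lines by their message text; other lines are ignored."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            counts[f"{m.group(1)}: {m.group(2).strip()}"] += 1
    return counts
```

For example, `for msg, n in summarize_errors(open("output.log", errors="replace")).most_common(10): print(n, msg)` prints the ten most frequent messages.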
I hope something here is helpful. I'm also working through the homework right now, so idk if what I'm doing here is actually gonna work lol
I added a `sizes=1` constraint to my pg_normalized_batch command:
echo "$files" | time parallel --jobs 1 sizes=1 python3 -u load_tweets_batch.py --db=postgresql://postgres:pass@localhost:7272/ --inputs
But it's still printing a ton of tweets in JSON format.
Can I see a little of the JSON you said is printing? I think that's just what happens when the SQL commands throw errors, but I'm not sure if we're looking at the same things.
Maybe these files are just so big that even loading one at a time produces errors that are hard to work with. Have you tried redirecting the output to a file and then looking at it?
Or if the error isn't huge you could post the whole thing here?
I've updated the port number to 23451 for the denormalized database in both the .yml and the .sh file. However, when I run `sh load_tweets_parallel`, I still get an error connecting to port 54321, which is the port I used for the last homework. I've run these commands to bring the container down and delete the volumes:
docker-compose down
docker rm -f $(docker ps -aq)
docker volume rm $(docker volume ls -q)
docker-compose build
docker-compose up -d
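If the error still mentions 54321 after the rebuild, one possibility is that the old port is still hardcoded somewhere else the loader reads (another .sh file, or a default in a .py file), since `docker-compose down` and removing volumes do not touch your scripts. A quick check is to search the repo for the old number. A sketch (`find_port_references` is a hypothetical helper, and the default file patterns are an assumption about where ports live):

```python
import re
from pathlib import Path


def find_port_references(root, port, patterns=("*.sh", "*.py", "*.yml", "*.yaml")):
    """Return (path, line number, line) for every line under root that mentions port."""
    # Word boundaries so 54321 does not match inside a longer number like 543210
    needle = re.compile(rf"\b{port}\b")
    hits = []
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            if not path.is_file():
                continue
            with path.open(encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, start=1):
                    if needle.search(line):
                        hits.append((str(path), lineno, line.rstrip("\n")))
    return hits
```

Running `find_port_references(".", 54321)` from the repo root lists each file and line that still mentions the old port.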