mikeizbicki / cmc-csci143

big data course materials
40 stars 76 forks source link

ERROR: relation "tweets_jsonb" does not exist on twiter_postgres hw #484

Closed rachelHoman closed 7 months ago

rachelHoman commented 7 months ago

Hi,

I'm currently running into a issue where none of my tests are passing but when I push to GitHub to see where it's failing in the actions I see this output error:

File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/engine/default.py", line 919, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedFunction) operator does not exist: text = bigint
LINE 4:         WHERE id_tweets = 1245138808045264898
                                ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

[SQL: 
        SELECT id_tweets 
        FROM tweets
        WHERE id_tweets = %(id_tweets)s
        ]
[parameters: {'id_tweets': 124513880804[526](https://github.com/rachelHoman/twitter_postgres/actions/runs/8527959352/job/23360525648#step:4:527)4898}]
(Background on this error at: https://sqlalche.me/e/20/f405)
load denormalized
ERROR:  relation "tweets_jsonb" does not exist

I have updated the ports so I'm not sure why this issue is happening on the Docker run, but I think it is causing the failed tests. Any advice would be great. Thanks

ben-smith23 commented 7 months ago

Hi @rachelHoman,

I am unable to explain exactly why this occurs (perhaps someone else can), but I can provide a fix. For some reason, Postgres expects a text type, but id_tweets is a bigint. Therefore, you have to type cast the left side like so:

    with connection.begin() as trans:
        # skip tweet if it's already inserted
        sql=sqlalchemy.sql.text('''
        SELECT id_tweets 
        FROM tweets
        WHERE CAST(id_tweets AS BIGINT) = :id_tweets
        ''')
        res = connection.execute(sql,{
            'id_tweets':tweet['id'],
            })
        if res.first() is not None:
            return

Hope this helps!

mikeizbicki commented 7 months ago

@ben-smith23 Your fix makes the error go away, but will result in bigger errors down the line.

@rachelHoman

Recall that in the denormalized database, tweets is a VIEW over the tweets_jsonb table. In the normalized database, there is no tweets_jsonb table. When working with the normalized database, you shouldn't get any mentions of tweets_jsonb.

Also recall that the denormalized database solution involves only using a COPY command. One of the advantages of the denormalized database is that it is much easier/faster to add data to because you don't have to muck about in python.

Now let's look at the last line of your error

ERROR:  relation "tweets_jsonb" does not exist

This tells me that you're trying to insert into the denormalized database from python. So wherever you are calling the python code, you are probably calling it with the wrong url (i.e. the url for the denormalized database instead of the normalized database).

rachelHoman commented 7 months ago

Ah I see. Thank you for explaining @mikeizbicki and for the workaround @ben-smith23. The issue was my ports were flipped