mikeizbicki / cmc-csci143

big data course materials
40 stars 76 forks source link

normalized tests failing in GitHub Actions, InvalidRequestError #311

Open hfmandell opened 1 year ago

hfmandell commented 1 year ago

My normalized test is failing in GitHub actions with this error:

Traceback (most recent call last):
  File "/home/runner/work/twitter_postgres/twitter_postgres/load_tweets.py", line 411, in <module>
    insert_tweet(connection,tweet)
  File "/home/runner/work/twitter_postgres/twitter_postgres/load_tweets.py", line 106, in insert_tweet
    with connection.begin() as trans:
  File "/home/runner/.local/lib/python3.10/site-packages/sqlalchemy/engine/base.py", line 853, in begin
    raise exc.InvalidRequestError(
sqlalchemy.exc.InvalidRequestError: This connection has already initialized a SQLAlchemy Transaction() object via begin() or autobegin; can't call begin() here unless rollback() or commit() is called first.

It is thus not reaching the database and simply failing all the SQL tests. Does anyone have tips on how to work with this? I see it is related to the transactions we've discussed in class. I don't see more than one connection.begin() in load_tweets.py

jaymaliye commented 1 year ago

+1

I am getting the same error.

nickwilson3 commented 1 year ago

+1

tennisoctocat commented 1 year ago

I was able to solve this by moving the following lines of code inside the with connection.begin() as trans: block:

# skip tweet if it's already inserted
    sql=sqlalchemy.sql.text('''
    SELECT id_tweets 
    FROM tweets
    WHERE id_tweets = :id_tweets
    ''')
    res = connection.execute(sql,{
        'id_tweets':tweet['id'],
        })  
    if res.first() is not None:
        return

I think it could also be solved by adding connection.commit() before the with connection.begin() as trans: line but I didn't try that.

It seems that there two ways to create transactions according to the sqlalchemy source code:

  1. Inside a with connection.begin() as trans: block
  2. By simply calling commands like connection.execute(...) and then later calling connection.commit().

If this is the case, then issue we have is likely because we called connection.execute(...) to check whether the tweet exists but never called connection.commit() before the with connection.begin() as trans: line.

I'm not sure why this only fails in the github actions and not on the lambda server though.

hfmandell commented 1 year ago

Thank you for this helpful response! That was most certainly the issue. I tried your first suggestion of " adding connection.commit() before the with connection.begin() as trans: line" and that was successful.