hackingthemarkets / ark-funds-tracker

Track the holdings of ARK ETF funds in PostgreSQL
87 stars 45 forks source link

duplicate key value violates unique constraint #1

Open Katsumoto1984 opened 3 years ago

Katsumoto1984 commented 3 years ago

Hi Larry, Thanks a lot for your fantastic videos. I just run your 'populate_etf.py'-script and face some issues with dublicate keys in the DB:

PRNT ARKF ARKQ ARKG Traceback (most recent call last): File "c:\Users\katsu\VC Code\Ark track\populate_etf.py", line 35, in cursor.execute(""" File "C:\Users\katsu\AppData\Local\Programs\Python\Python39\lib\site-packages\psycopg2\extras.py", line 146, in execute return super(DictCursor, self).execute(query, vars) psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "etf_holding_pkey" DETAIL: Key (etf_id, holding_id, dt)=(36, 10124, 2021-02-05) already exists.

Have you seen similar issues and is there an easy way to work around it?

rvost commented 3 years ago

@Katsumoto1984, I run into the same error although I use JSON files from arktrack.com (ARK Invest blocks IP from my country). Basically error caused by multiple rows with the same holding on a given date. This seems logical because I am using a questionable data source, but have no idea why this is happening with the official data. My solution simply ignores duplicates:

try:
    cursor.execute("""
        INSERT INTO etf_holdings (etf_id, holding_id, dt, shares, weight)
        VALUES (%s, %s, %s, %s, %s)
     """, (etf['id'], stock['id'], date, shares, weight))
except psycopg2.IntegrityError as ex:
    print('Duplicate ignored: ', ex)
    connection.rollback()
else:
    connection.commit()

Note: we need connection.rollback() after error because subsequent executes will fail in the failed transaction therefore we commit every individual insert so it's a suboptimal solution.

P.S. I'm new to psycopg2 and will be glad to see more optimal solutions.