droidlyx opened this issue 1 week ago
Thank you for trying out BELB.
From the error, it looks like the problem is in ctd_diseases_kb:
sqlite3.IntegrityError: UNIQUE constraint failed: ctd_diseases_kb.identifier, ctd_diseases_kb.name
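To see why the insert fails, here is a minimal sketch that reproduces the same error with Python's built-in sqlite3 module. It assumes a stripped-down ctd_diseases_kb table with only the two columns named in the error and a composite UNIQUE (identifier, name) constraint. This is not BELB's actual schema, and the row values are made up.

```python
import sqlite3

# Illustrative table only: just the two columns from the error message,
# with a UNIQUE constraint on the (identifier, name) pair.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ctd_diseases_kb ("
    "identifier TEXT NOT NULL, name TEXT NOT NULL, "
    "UNIQUE (identifier, name))"
)

rows = [
    ("D000001", "disease a"),
    ("D000001", "disease a"),  # same (identifier, name) pair as the row above
]

error = None
try:
    conn.executemany(
        "INSERT INTO ctd_diseases_kb (identifier, name) VALUES (?, ?)", rows
    )
except sqlite3.IntegrityError as exc:
    error = str(exc)

print(error)
```

A row with the same identifier but a different name would be accepted; only an exact repeat of both values triggers the error.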
Is it possible that this is not the first time you have run the script?
IIRC there's no mechanism to skip creating a KB if it's already there, which would explain the error: the script is trying to add duplicate data.
Can you try changing --dir . to a different directory?
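A skip-if-already-built check of the kind described above could look roughly like the sketch below. It assumes an SQLite backend (the traceback comes from sqlite3) and uses hypothetical helpers, kb_table_is_populated and build_kb_if_missing, which are not part of BELB; the table layout is illustrative too.

```python
import sqlite3
from typing import Iterable, Tuple

Row = Tuple[str, str]  # (identifier, name)


def quote_identifier(name: str) -> str:
    """Quote an SQL identifier; table names cannot be bound as ? parameters."""
    return '"' + name.replace('"', '""') + '"'


def kb_table_is_populated(conn: sqlite3.Connection, table: str) -> bool:
    """Return True if `table` exists and already holds at least one row."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table,),
    ).fetchone()
    if exists is None:
        return False
    first_row = conn.execute(
        f"SELECT 1 FROM {quote_identifier(table)} LIMIT 1"
    ).fetchone()
    return first_row is not None


def build_kb_if_missing(
    conn: sqlite3.Connection, table: str, rows: Iterable[Row]
) -> bool:
    """Create and fill `table` unless a previous run already did; True if built."""
    if kb_table_is_populated(conn, table):
        return False  # skip: the KB is already there from an earlier run
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {quote_identifier(table)} ("
        "identifier TEXT NOT NULL, name TEXT NOT NULL, "
        "UNIQUE (identifier, name))"
    )
    conn.executemany(
        f"INSERT INTO {quote_identifier(table)} (identifier, name) VALUES (?, ?)",
        rows,
    )
    conn.commit()
    return True


conn = sqlite3.connect(":memory:")
rows = [("D000001", "disease a"), ("D000002", "disease b")]
first = build_kb_if_missing(conn, "ctd_diseases_kb", rows)
second = build_kb_if_missing(conn, "ctd_diseases_kb", rows)  # re-run is skipped
print(first, second)
```

Note that this only guards against re-running the script; it does not help if a single run already contains duplicate pairs.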
TODO: Add a check for existing KB here
But even after I change the folder, or delete and reinstall belb, the error still persists. It seems that the error is in the populate_table step.
The schema sets a UNIQUE constraint on the (identifier, name) pair, but the entries passed to the populate_table function include rows with the same name and identifier. I see there's a drop_duplicates function in kb.py, but it is never actually executed. Maybe it should be?
Yes, I can run the script successfully after setting the dedup parameter of the to_database function in kbs.py to True. I don't know if this is intended, but it's set to False by default.
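Deduplicating on the same (identifier, name) pair that the constraint enforces, before the rows reach the database, is what lets the insert go through. A minimal order-preserving sketch in plain Python follows; drop_duplicate_pairs is a hypothetical name and is not the drop_duplicates function in kb.py. If the rows are held in a pandas DataFrame, df.drop_duplicates(subset=["identifier", "name"]) does the same thing.

```python
from typing import Iterable, List, Set, Tuple

Row = Tuple[str, str]  # (identifier, name)


def drop_duplicate_pairs(rows: Iterable[Row]) -> List[Row]:
    """Keep the first occurrence of each (identifier, name) pair, preserving order."""
    seen: Set[Row] = set()
    unique: List[Row] = []
    for identifier, name in rows:
        key = (identifier, name)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


rows = [
    ("D000001", "disease a"),
    ("D000001", "disease a"),          # repeated pair: would violate the constraint
    ("D000001", "disease a synonym"),  # same identifier, different name: allowed
]
print(drop_duplicate_pairs(rows))
```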
Wait, when it comes to NCBI Gene, the code says it cannot perform deduplication when reading data in chunks (i.e., chunksize > 0), so duplicates remain and the UNIQUE constraint failed error is raised again.
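With chunked reading, deduplicating each chunk on its own is not enough, because the same pair can appear in two different chunks. One way around this, assuming an SQLite backend, is to let the database skip conflicting rows with INSERT OR IGNORE. The sketch below uses hypothetical helper names and a made-up ncbi_gene_kb table; on PostgreSQL the equivalent clause is INSERT ... ON CONFLICT DO NOTHING.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[str, str]  # (identifier, name)


def iter_chunks(rows: Iterable[Row], chunksize: int) -> Iterator[List[Row]]:
    """Yield successive lists of at most `chunksize` rows."""
    chunk: List[Row] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunksize:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def insert_chunks_ignoring_duplicates(
    conn: sqlite3.Connection, table: str, rows: Iterable[Row], chunksize: int
) -> None:
    """Insert rows chunk by chunk; SQLite silently skips pairs that already exist."""
    quoted = '"' + table.replace('"', '""') + '"'
    sql = f"INSERT OR IGNORE INTO {quoted} (identifier, name) VALUES (?, ?)"
    for chunk in iter_chunks(rows, chunksize):
        conn.executemany(sql, chunk)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "ncbi_gene_kb" ('
    "identifier TEXT NOT NULL, name TEXT NOT NULL, UNIQUE (identifier, name))"
)
# Duplicates deliberately straddle chunk boundaries (chunksize=2).
rows = [("1", "A"), ("2", "B"), ("1", "A"), ("3", "C"), ("2", "B")]
insert_chunks_ignoring_duplicates(conn, "ncbi_gene_kb", rows, chunksize=2)
count = conn.execute('SELECT COUNT(*) FROM "ncbi_gene_kb"').fetchone()[0]
print(count)
```

The trade-off is that OR IGNORE also silently skips rows that violate other constraints such as NOT NULL. The alternative, keeping a set of already-seen pairs across chunks, stays backend-agnostic but uses memory proportional to the number of unique pairs, which may be large for NCBI Gene.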
Hello, I encountered an SQL error when running python -m belb.scripts.build_kbs --dir . --cores 20 --umls ../2017AA-full/2017AA/META --db ./db.yaml: