typedb-osi / typedb-loader

TypeDB Loader - Data Migration Tool for TypeDB
https://github.com/typedb-osi/typedb-loader
Apache License 2.0

Problematic .tsv processing #62

Open suciokhan opened 2 years ago

suciokhan commented 2 years ago

When trying to ingest from .tsv files using Loader 1.4.1 on Ubuntu 20.04, I receive the following error:

[open_alex_0::5] ERROR com.vaticle.typedb.osi.loader.loader - async-writer-4: [THW07] Invalid Thing Write: Attempted to assign a key ',' of type 'id' that had been taken by another 'researcher'.

However, I've reviewed the .tsv and confirmed that this column contains no commas; all values are OpenAlex identifiers, which are URLs starting with https.

In my TypeDB config.json file, I set the separator to a tab, and the loader successfully ingests hundreds of thousands of rows before the error occurs.

"separator": "\t",

Below is a screenshot confirming, with Python and pandas, that there are no commas in the id column.

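A comparable check can be sketched with only the Python standard library. This is a minimal sketch, not part of TypeDB Loader: it assumes the key column is literally named `id` (taken from the error message) and splits each line on raw tabs with no quote handling. Because pandas applies its own quoting rules, a raw split may surface rows that a quote-aware reader (pandas, or possibly the loader) interprets differently. The double-quote check is an assumption about one possible cause, not something established in this thread.

```python
from typing import Iterable, List, Tuple


def scan_tsv(lines: Iterable[str], id_column: str = "id") -> List[Tuple[int, str]]:
    """Return (line_number, reason) pairs for suspicious rows in TSV text.

    Lines are split on raw tabs with no quote handling, so a stray quote
    character cannot silently merge physical lines during this check.
    """
    problems: List[Tuple[int, str]] = []
    it = iter(lines)
    header = next(it).rstrip("\r\n").split("\t")
    id_idx = header.index(id_column)  # raises ValueError if the column is missing
    for lineno, raw in enumerate(it, start=2):  # line 1 is the header
        fields = raw.rstrip("\r\n").split("\t")
        if len(fields) != len(header):
            problems.append((lineno, f"expected {len(header)} fields, got {len(fields)}"))
            continue
        value = fields[id_idx]
        if "," in value:
            problems.append((lineno, f"comma in {id_column}: {value!r}"))
        # Assumption: a quote-aware parser might treat '"' as opening a quoted field.
        if '"' in raw:
            problems.append((lineno, "contains a double quote"))
    return problems
```

An open text file can be passed directly, for example `scan_tsv(open(path, encoding="utf-8"))`, since iterating a file yields its lines.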

I suspected the header might be the problem, since the load fails on the second .tsv file it processes, and there is one record in the database with a comma as its id.
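To test the header hypothesis, the header rows of all input files can be compared. This is a hedged sketch: the byte-order-mark check is an assumption about one way a second file's header could differ invisibly (Python keeps a leading `\ufeff` when a file is opened with `encoding="utf-8"`, and strips it with `"utf-8-sig"`); whether TypeDB Loader handles a BOM is not established here, and the file names in the test are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def compare_headers(files: Dict[str, Iterable[str]]) -> List[str]:
    """Return warnings about header rows that differ between input files."""
    warnings: List[str] = []
    reference: Optional[Tuple[str, str]] = None  # (file name, header text)
    for name, lines in files.items():
        header = next(iter(lines)).rstrip("\r\n")
        if header.startswith("\ufeff"):
            warnings.append(f"{name}: header starts with a byte-order mark")
            header = header[1:]
        if reference is None:
            reference = (name, header)  # the first file is the baseline
        elif header != reference[1]:
            warnings.append(f"{name}: header differs from {reference[0]}")
    return warnings
```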

However, according to the TypeDB processing updates, it doesn't fail until more than 600,000 rows have been processed, which seems late for a header problem.

flyingsilverfin commented 2 years ago

So it does sound like the data is corrupt somehow. Have you managed to track down the duplicate ','?
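Because the error says the key ',' had already been taken by another 'researcher', one way to track down the duplicate is to scan every input file and record where each id value appears. This is a minimal sketch, not a TypeDB Loader feature: it assumes the key column is named `id` in each file (its position may differ per file) and splits on raw tabs, so it shows the values as written on disk rather than as any particular parser reads them.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_duplicate_ids(
    files: Dict[str, Iterable[str]], id_column: str = "id"
) -> Dict[str, List[Tuple[str, int]]]:
    """Map each id value seen more than once to its (file, line) locations."""
    seen: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for name, lines in files.items():
        it = iter(lines)
        header = next(it).rstrip("\r\n").split("\t")
        id_idx = header.index(id_column)  # column position can vary per file
        for lineno, raw in enumerate(it, start=2):  # line 1 is the header
            fields = raw.rstrip("\r\n").split("\t")
            if id_idx < len(fields):  # short rows are skipped, not counted
                seen[fields[id_idx]].append((name, lineno))
    return {key: locs for key, locs in seen.items() if len(locs) > 1}
```

Open file objects can be passed as the dictionary values, since iterating a text file yields its lines.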

suciokhan commented 2 years ago

There are no comma values for the id column in the source data.

hkuich commented 2 years ago

Any chance that you could share the data file? Would be fine to obfuscate it as long as it reproduces the error...

On Wed, Aug 31, 2022, 16:54 suciokhan @.***> wrote:

There are no comma values for the id column in the source data.


suciokhan commented 2 years ago

Sure, I'll send you a link to the two files I was having trouble with.