typedb-osi / typedb-bio

TypeDB Bio: Biomedical Knowledge Graph

Unicode error when installing DGIdb #24

Closed: jackn11 closed this issue 12 months ago

jackn11 commented 1 year ago

When I run python migrator.py -n 4 --force True, I get the following error:

Opening DGIdb...

  Downloading dataset
100% [..........................................................................] 2809387 / 2809387  Finished downloading
  Starting with drugs.
  Drugs inserted! (57497 entries)
  Downloading drug-gene interactions dataset
100% [..........................................................................] 9512574 / 9512574  Finished downloading
Traceback (most recent call last):
  File "C:\Users\jackn\TypeDBBio\typedb-bio\migrator.py", line 60, in <module>
    migrate_dgibd(session, NUM_DR, NUM_INT, args.num_threads, args.commit_batch)
  File "C:\Users\jackn\TypeDBBio\typedb-bio\Migrators\DGIdb\DGIdbMigrator.py", line 18, in migrate_dgibd
    insert_interactions(session, num_int, num_threads, batch_size)
  File "C:\Users\jackn\TypeDBBio\typedb-bio\Migrators\DGIdb\DGIdbMigrator.py", line 68, in insert_interactions
    raw_file = openFile(file, num_int)
  File "C:\Users\jackn\TypeDBBio\typedb-bio\Migrators\Helpers\open_file.py", line 9, in openFile
    for row in csvreader:
  File "C:\Users\jackn\AppData\Local\Programs\Python\Python310\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 4295: character maps to <undefined>
(.venv) PS C:\Users\jackn\TypeDBBio\typedb-bio>
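The traceback shows the CSV rows being decoded by the cp1252 codec. On Windows, open() falls back to that locale default when no encoding is given, and byte 0x90 has no mapping in cp1252. A minimal sketch of one common workaround is to pass encoding="utf-8" explicitly. This assumes the DGIdb interactions download is UTF-8-encoded, tab-separated text. read_tsv_utf8 is a hypothetical stand-in for openFile in Migrators/Helpers/open_file.py, whose source is not shown here, and max_rows is only an assumed analogue of its num_int argument.

```python
import csv
import os
import tempfile

# Illustrative bytes: UTF-8 for "\u00d0" (U+00D0) is C3 90. The 0x90 byte is a valid
# UTF-8 continuation byte but is undefined in cp1252, as in the traceback above.
sample = b"\xc3\x90"


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes cleanly with `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def read_tsv_utf8(path, max_rows=None):
    """Hypothetical replacement for openFile: read a tab-separated file as UTF-8
    instead of the platform default, optionally stopping after max_rows rows
    (an assumption about what num_int means)."""
    rows = []
    # Explicit encoding avoids the cp1252 default on Windows; newline="" is what
    # the csv module expects for files it reads.
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        for i, row in enumerate(reader):
            if max_rows is not None and i >= max_rows:
                break
            rows.append(row)
    return rows


if __name__ == "__main__":
    print(decodes_as(sample, "cp1252"))  # False: 0x90 is unmapped in cp1252
    print(decodes_as(sample, "utf-8"))   # True
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "interactions.tsv")
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write("drug\tgene\n\u00d0\tx\n")
        print(read_tsv_utf8(path))
```

If editing the helper is not an option, running Python in UTF-8 mode (setting PYTHONUTF8=1, or running python -X utf8 migrator.py ...) changes the default encoding that open() uses without any code change.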
james-whiteside commented 12 months ago

Should be fixed following review.