ncats / pharos-graphql-server

3 stars 4 forks source link

Info for transforming the database #83

Closed juju75001 closed 2 years ago

juju75001 commented 2 years ago

Hello, I need to do a lot of request on the pharos API, but it is too slow, so I tried to clone this repo to get a local instance of the API. I downloaded the TCRD.sql and now I must transform the database (using the folder pharos_database_transforms), the README.md mentions a "pharos-etl repo" but I find zero information on this repo, could you please give some details on where to find it or how to transform the database ? Thanks in advance for your help

KeithKelleher commented 2 years ago

Hi, There is another repo, but it is currently internal to NCATS. I can probably share it with you if that's the way we decide to do it. Currently, we are porting the migrations to another ETL framework for integration into TCRD directly (https://github.com/unmtransinfo/TCRD), instead of as a series of migrations to do after downloading TCRD. That means there are two repos of migrations now. So, it might be easier to do one of these instead:

juju75001 commented 2 years ago

Thank you for your reply. I don't want to bother you with SQL query (and my code is already designed for graphQL query), so I think the easiest way would be your first proposal: sql-dump a migrated version of TCRD if that's possible (and could help other people waiting for the public release of the new migration).

KeithKelleher commented 2 years ago

Cool, here's a dump of our pharos-ready TCRD. Hope that works out. https://opendata.ncats.nih.gov/public/pharos/

juju75001 commented 2 years ago

Hi, I finished loading the database in a local MySQL instance. I had a small weird error in the file databaseTable.ts: all the TableLink variable where undefined, I corrected it by editing some key written in lower case to upper case. E.g. in the function getKeys():

this.links.push(new TableLink(
  rows[rowKey]['COLUMN_NAME'],// instead of rows[rowKey]['column_name'],
  rows[rowKey]['REFERENCED_TABLE_NAME'],// instead of rows[rowKey]['referenced_table_name'],
  rows[rowKey]['REFERENCED_COLUMN_NAME']));// instead of rows[rowKey]['referenced_column_name']));

Now everything seems to work fine ! Thanks again for your help.