monarch-initiative / monarch-ingest

Data ingest application for Monarch Initiative knowledge graph using Koza
https://monarchinitiative.org

Upload to kghub S3 bucket #424

Closed by kevinschaper 1 year ago

kevinschaper commented 1 year ago

We only have a few hand-moved kg archives hosted at https://kg-hub.berkeleybop.io/kg-monarch/index.html, which also means (I think) that we don't get kghub dashboard updates as we release.

We should set up whatever authentication & tools we need so that our Jenkins job can copy from the GCP bucket to the S3 bucket at the end of the ingest process.
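As a rough illustration of what that final copy step could look like, here is a minimal sketch that only builds the two commands (download from GCS to a local staging directory, then upload to S3) without running them. The GCS URL and staging path are hypothetical placeholders, the S3 prefix is the one mentioned later in this thread, and the choice of `gsutil` and `s3cmd` is an assumption, not a description of what the Jenkins job actually uses.

```python
import shlex
from typing import List


def build_copy_commands(gcs_url: str, s3_url: str, staging_dir: str) -> List[List[str]]:
    """Build (but do not run) the commands for a GCS -> local -> S3 copy."""
    if not gcs_url.startswith("gs://"):
        raise ValueError(f"expected a gs:// URL, got {gcs_url!r}")
    if not s3_url.startswith("s3://"):
        raise ValueError(f"expected an s3:// URL, got {s3_url!r}")

    src = gcs_url.rstrip("/")
    stage = staging_dir.rstrip("/")
    # Trailing slashes matter to s3cmd sync: "dir/" uploads the directory's
    # contents rather than the directory itself.
    dst = s3_url.rstrip("/") + "/"

    return [
        # Step 1: pull the release from GCS into the staging directory
        # (the staging directory is expected to exist already).
        ["gsutil", "-m", "rsync", "-r", src, stage],
        # Step 2: push the staged files to the S3 prefix.
        ["s3cmd", "sync", stage + "/", dst],
    ]


if __name__ == "__main__":
    # Hypothetical source bucket and staging path, for illustration only.
    for cmd in build_copy_commands(
        "gs://example-monarch-bucket/2023-03-16",
        "s3://kg-hub-public-data/kg-monarch/2023-03-16",
        "/tmp/monarch-stage",
    ):
        print(shlex.join(cmd))
```

In a real job each command list would be passed to something like `subprocess.run(cmd, check=True)` once credentials for both clouds are configured on the build node.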

glass-ships commented 1 year ago

@caufieldjh I gave this a shot a couple of builds ago but forgot to ask if the data made it over! I think we may still need to trigger a re-index or something. Any thoughts?

caufieldjh commented 1 year ago

The good news is that I see the directories there: four fresh builds, from March 10, 12, 14, and 16. The bad news is that four empty directories were written to s3://kg-hub-public-data/kg-monarch/, while identically named directories exist in s3://kg-hub-public-data/kg-monarch/current/, and only the ones in current/ contain data. But I do see a bunch of uploads there:

$ s3cmd ls s3://kg-hub-public-data/kg-monarch/current/2023-03-16/
                          DIR  s3://kg-hub-public-data/kg-monarch/current/2023-03-16/qc/
                          DIR  s3://kg-hub-public-data/kg-monarch/current/2023-03-16/rdf/
                          DIR  s3://kg-hub-public-data/kg-monarch/current/2023-03-16/transform_output/
2023-03-16 01:01            0  s3://kg-hub-public-data/kg-monarch/current/2023-03-16/2023-03-16
2023-03-16 01:01        42812  s3://kg-hub-public-data/kg-monarch/current/2023-03-16/merged_graph_stats.yaml
2023-03-16 01:08    316529982  s3://kg-hub-public-data/kg-monarch/current/2023-03-16/monarch-kg-denormalized-edges.tsv.gz
2023-03-16 01:13    696990156  s3://kg-hub-public-data/kg-monarch/current/2023-03-16/monarch-kg.db.gz
2023-03-16 01:13    752656124  s3://kg-hub-public-data/kg-monarch/current/2023-03-16/monarch-kg.neo4j.dump
2023-03-16 01:05     98025319  s3://kg-hub-public-data/kg-monarch/current/2023-03-16/monarch-kg.tar.gz
2023-03-16 01:01        38384  s3://kg-hub-public-data/kg-monarch/current/2023-03-16/qc_report.yaml
2023-03-16 01:18   1571614622  s3://kg-hub-public-data/kg-monarch/current/2023-03-16/solr.tar.gz
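A check like this could be automated. The following is a minimal sketch, assuming the `s3cmd ls` output keeps the column layout shown above (either `DIR <url>` or `<date> <time> <size> <url>`). The function and class names are made up for illustration. It parses a listing, reports zero-byte objects such as the stray `2023-03-16` entry, and totals the uploaded bytes.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    url: str
    size: Optional[int]  # None for DIR rows, byte count for objects


def parse_s3cmd_ls(text: str) -> List[Entry]:
    """Parse `s3cmd ls` output, skipping lines that match neither row shape."""
    entries = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("DIR "):
            entries.append(Entry(stripped[3:].strip(), None))
            continue
        # Object rows: date, time, size, url (the url keeps any inner spaces).
        parts = stripped.split(None, 3)
        if len(parts) == 4 and parts[2].isdigit():
            entries.append(Entry(parts[3], int(parts[2])))
    return entries


def zero_byte_objects(entries: List[Entry]) -> List[str]:
    """URLs of objects (not directories) whose size is exactly 0 bytes."""
    return [e.url for e in entries if e.size == 0]


def total_bytes(entries: List[Entry]) -> int:
    """Sum of object sizes; DIR rows are ignored."""
    return sum(e.size for e in entries if e.size is not None)
```

Feeding it the captured output of `s3cmd ls` for a release prefix gives a quick signal: an empty entry list or a total of 0 would match the "empty directory" symptom described above.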

You won't see those at https://kg-hub.berkeleybop.io/kg-monarch/ because I've been updating the index.html there by hand. The other KGs generally use multi-indexer to update the index as part of the build process: https://github.com/Knowledge-Graph-Hub/multi-indexer

glass-ships commented 1 year ago

Thanks again for your help, Harry!

I believe this has been resolved with #426.

I'll open a new issue if further problems arise.

caufieldjh commented 1 year ago

Great - I see the most recent version (as of 2023-04-16) on the bucket.