Closed by kevinschaper 1 year ago
@caufieldjh i gave this a shot a couple builds ago but forgot to ask if the data made it over! i think we may still need to trigger a re-index or something... any thoughts?
The good news is that I see the directories there - four fresh builds, from March 10, 12, 14, and 16.
The bad news is that four empty directories got written to s3://kg-hub-public-data/kg-monarch/, and identical directories are present in s3://kg-hub-public-data/kg-monarch/current/, with only the ones in current containing data. But I do see a bunch of uploads there:
$ s3cmd ls s3://kg-hub-public-data/kg-monarch/current/2023-03-16/
DIR s3://kg-hub-public-data/kg-monarch/current/2023-03-16/qc/
DIR s3://kg-hub-public-data/kg-monarch/current/2023-03-16/rdf/
DIR s3://kg-hub-public-data/kg-monarch/current/2023-03-16/transform_output/
2023-03-16 01:01 0 s3://kg-hub-public-data/kg-monarch/current/2023-03-16/2023-03-16
2023-03-16 01:01 42812 s3://kg-hub-public-data/kg-monarch/current/2023-03-16/merged_graph_stats.yaml
2023-03-16 01:08 316529982 s3://kg-hub-public-data/kg-monarch/current/2023-03-16/monarch-kg-denormalized-edges.tsv.gz
2023-03-16 01:13 696990156 s3://kg-hub-public-data/kg-monarch/current/2023-03-16/monarch-kg.db.gz
2023-03-16 01:13 752656124 s3://kg-hub-public-data/kg-monarch/current/2023-03-16/monarch-kg.neo4j.dump
2023-03-16 01:05 98025319 s3://kg-hub-public-data/kg-monarch/current/2023-03-16/monarch-kg.tar.gz
2023-03-16 01:01 38384 s3://kg-hub-public-data/kg-monarch/current/2023-03-16/qc_report.yaml
2023-03-16 01:18 1571614622 s3://kg-hub-public-data/kg-monarch/current/2023-03-16/solr.tar.gz
You won't see those at https://kg-hub.berkeleybop.io/kg-monarch/ - I've been updating the index.html there by hand. The other KGs generally use this to update the index as part of the build process: https://github.com/Knowledge-Graph-Hub/multi-indexer
Thanks again for your help, harry!
I believe this has been resolved by #426.
I'll open a new issue if further problems arise.
Great - I see the most recent version (as of 2023-04-16) on the bucket.
We only have a few hand-moved KG archives hosted at https://kg-hub.berkeleybop.io/kg-monarch/index.html, which also means (I think) that we don't get KG-Hub dashboard updates as we release.
We should set up whatever authentication & tools we need so that our Jenkins job can copy from the GCP bucket to the S3 bucket at the end of the ingest process.
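For discussion, here is a rough sketch of what that final Jenkins step could look like: stage one release from GCS on the build node, then sync it to S3. It assumes gsutil and s3cmd are installed and already have credentials on the node (s3cmd reads ~/.s3cfg by default). The GCS bucket name is a placeholder, since it isn't named in this thread. The `copy_release` and `run` helpers and the `DRY_RUN` switch are made up for illustration. Whether releases belong directly under kg-monarch/ or under kg-monarch/current/ is also left open here, so the destination is a variable.

```shell
#!/usr/bin/env bash
# Sketch only: stage one release from GCS locally, then sync it to S3.
set -eu

# Placeholder bucket: the real GCP bucket is not named in this thread.
GCS_RELEASES="${GCS_RELEASES:-gs://example-monarch-releases}"
# Destination prefix taken from the listing above; adjust if releases
# should land somewhere else (e.g. under current/).
S3_RELEASES="${S3_RELEASES:-s3://kg-hub-public-data/kg-monarch}"
# Default to printing commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  # Print the command in dry-run mode; otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

copy_release() {
  # $1: release name, e.g. 2023-03-16
  # $2: optional local staging root
  release="$1"
  staging="${2:-/tmp/kg-monarch-staging}/$release"
  run mkdir -p "$staging"
  # Pull the release from GCS into the staging directory.
  run gsutil -m rsync -r "$GCS_RELEASES/$release" "$staging"
  # Push the staged files to S3; the trailing slashes sync directory contents.
  run s3cmd sync "$staging/" "$S3_RELEASES/$release/"
}
```

After sourcing the file, `copy_release 2023-03-16` prints the three commands it would run; setting `DRY_RUN=0` in the Jenkins environment would execute them. Staging through local disk is just the simplest option with the two CLIs already in use here, and it does not update the index.html, which would still need the multi-indexer or a manual edit.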