nasa-jpl-cord-19 / covid19-knowledge-graph

Builds a knowledge graph from the [COVID-19 Open Research Dataset (CORD-19)](https://pages.semanticscholar.org/coronavirus-research).
Apache License 2.0

"fixing" a potential bug where file is written to disk after each json file is read causing extreme slowness #1

Closed philipsoutham closed 8 months ago

philipsoutham commented 4 years ago

When I was running:

$ sbt
> run /home/user/comm_use_subset

I noticed that it was generating a lot of disk write activity, which I tracked down to this: it was rewriting covid19_knowledge_graph.ttl after each JSON file was read. I moved the write operation outside the for loop, which speeds up processing by 1000%.
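To illustrate the change described here, the following is a minimal sketch, not the project's actual code. The names `WriteOnceSketch`, `toTriples`, `writeGraph`, `buildSlow`, and `buildFast` are hypothetical, and the graph is modelled as plain strings rather than whatever RDF library the repository uses. It contrasts writing the output file inside the loop with writing it once after the loop.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.collection.mutable.ArrayBuffer

object WriteOnceSketch {
  // Hypothetical stand-in for converting one CORD-19 JSON file into triples.
  def toTriples(jsonFile: String): Seq[String] =
    Seq(s"<urn:example:$jsonFile> <urn:example:type> <urn:example:Paper> .")

  // Serializes the whole in-memory graph to the output file, replacing its contents.
  def writeGraph(out: Path, triples: Seq[String]): Unit = {
    Files.write(out, triples.mkString("\n").getBytes(StandardCharsets.UTF_8))
  }

  // Slow pattern: the growing graph is rewritten after every input file,
  // so the total bytes written grow roughly quadratically with the file count.
  // Returns the number of write operations performed.
  def buildSlow(files: Seq[String], out: Path): Int = {
    val graph = ArrayBuffer.empty[String]
    var writes = 0
    for (f <- files) {
      graph ++= toTriples(f)
      writeGraph(out, graph.toSeq) // write inside the loop
      writes += 1
    }
    writes
  }

  // Fixed pattern: accumulate everything in memory, then write once.
  // Returns the number of write operations performed (always 1).
  def buildFast(files: Seq[String], out: Path): Int = {
    val graph = ArrayBuffer.empty[String]
    for (f <- files) graph ++= toTriples(f)
    writeGraph(out, graph.toSeq) // write moved outside the loop
    1
  }
}
```

Both variants leave the same final file on disk; only the number of intermediate rewrites differs. The trade-off is that a crash partway through the single-write version leaves no partial output.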

P.S. Sorry for the updates; I keep hitting Shift+Return for line breaks like you do in Slack.

lewismc commented 4 years ago

Hi @philipsoutham, it would be greatly appreciated if you could check master and let me know whether the issue you identified is fixed. Thanks. I'm still working on data modelling to improve queries of the resulting TTL artifact.

philipsoutham commented 4 years ago

> Hi @philipsoutham, it would be greatly appreciated if you could check master and let me know whether the issue you identified is fixed. Thanks. I'm still working on data modelling to improve queries of the resulting TTL artifact.

@lewismc see https://github.com/nasa-jpl-cord-19/covid19-knowledge-graph/issues/2

lewismc commented 4 years ago

Can you please rebase this against master? Really sorry to have neglected this.