mdorf opened this issue 3 years ago.
A temporary workaround involves duplicating the production 4store instance, running the Mapping Counts Generator script against the duplicate instance and then migrating the newly generated MappingCount graphs back to the production 4store instance.
1. Stop ncbo_cron in Prod.
2. Copy the Prod 4store data to a secondary 4store instance:
   $ 4s-dump http://<prod 4store>/sparql/ -f mapping_count_graph
   $ cat mapping_count_graph
   http://data.bioontology.org/metadata/MappingCount
   4s-dump creates a directory named 'data' containing the graphs listed in the file passed with the -f flag.
3. Change ncbo_cron's config.rb file to point to the secondary 4store instance.
4. Kick off the Mapping Counts Generator script within CRON.
5. After the Mapping Counts Generator script completes its run, export the MappingCount graph and import it into the Prod 4store instance:
   $ find data -type f | 4s-restore <kb_name>
   Make sure the 'data' directory is removed so that nothing else gets loaded.
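The export/import half of the workaround could be wrapped in a small bash function so that the cleanup of the 'data' directory is never forgotten. This is a minimal sketch: the function name migrate_mapping_counts and the DUMP_CMD/RESTORE_CMD overrides are hypothetical, the 4s-dump and 4s-restore invocations only mirror the commands shown above, and the graph list file is assumed to hold one graph URI per line.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the export/import steps of the workaround.
# The 4s-dump / 4s-restore calls mirror the commands in this issue;
# DUMP_CMD and RESTORE_CMD are assumed overrides (e.g. for dry runs).
set -euo pipefail

DUMP_CMD="${DUMP_CMD:-4s-dump}"
RESTORE_CMD="${RESTORE_CMD:-4s-restore}"
GRAPH_LIST="mapping_count_graph"

# Usage: migrate_mapping_counts <sparql endpoint url> <kb_name>
migrate_mapping_counts() {
  local sparql_url="$1" kb_name="$2"

  # Remove any leftover 'data' directory so that only the graphs
  # dumped below are restored.
  rm -rf data

  # Graph list for -f (assumed format: one graph URI per line).
  printf '%s\n' 'http://data.bioontology.org/metadata/MappingCount' > "$GRAPH_LIST"

  # Dump the listed graphs into ./data ...
  "$DUMP_CMD" "$sparql_url" -f "$GRAPH_LIST"

  # ... then feed every dumped file to the restore tool for the target KB.
  find data -type f | "$RESTORE_CMD" "$kb_name"
}
```

For example, once the generator has finished on the secondary instance: `migrate_mapping_counts 'http://<secondary 4store>/sparql/' '<kb_name>'`. Deleting 'data' before the dump, rather than after the restore, means a stale dump from an earlier run can never be picked up by `find`.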
There is a scheduled CRON job that runs weekly to regenerate the total mapping counts and the pair counts of mappings between classes in different ontologies. Unfortunately, this job has never run to completion in the production environment because of 4store crashes. This behavior is documented in ncbo/ncbo_cron#39.
We succeeded in completing this job in an isolated 4store environment populated with production data, and the newly generated mapping count graphs were then exported from it and imported into the production 4store instance. Unfortunately, this is not a permanent solution. We need an alternative, whether a code optimization or a separate process, that keeps the mapping counts (total counts from an ontology to all other ontologies, as well as pair counts of mappings between individual ontologies) refreshed regularly.
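As one illustration of the "separate process" option, the counts themselves are simple aggregations once the mappings are available as flat data. The sketch below assumes a hypothetical tab-separated export with one mapping per line (source ontology acronym, target ontology acronym); producing that export and writing the results back as MappingCount graphs are not shown. It counts each mapping toward both ontologies and skips mappings within a single ontology, which is an assumption about how the counts are defined.

```bash
#!/usr/bin/env bash
# Hypothetical helper: aggregate mapping counts from a TSV of mappings.
# Input (assumed format): <source ontology>\t<target ontology>, one mapping per line.
# Output (tab-separated, sorted):
#   pair   <ontology>  <other ontology>  <count>
#   total  <ontology>  <count>
mapping_counts() {
  awk -F'\t' -v OFS='\t' '
    # Only mappings between different ontologies are counted.
    $1 != $2 {
      # Record the mapping in both directions so each ontology sees it.
      pair[$1 FS $2]++
      pair[$2 FS $1]++
      total[$1]++
      total[$2]++
    }
    END {
      for (k in pair)  print "pair", k, pair[k]
      for (o in total) print "total", o, total[o]
    }' "$@" | LC_ALL=C sort
}
```

Usage: `mapping_counts mappings.tsv` (hypothetical file name), or pipe the export on stdin. Because the whole pass is a single streaming scan with in-memory counters, it never issues queries against 4store while counting.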