fridex opened this issue 2 years ago
CC @Gregory-Pereira @harshad16
Thank you @fridex for the issue with all the details.
/label sig-devsecops
@harshad16: The label(s) /label sig-devsecops
cannot be applied. These labels are supported: community/discussion, community/group-programming, community/maintenance, community/question, deployment_name/ocp4-stage, deployment_name/ocp4-test, deployment_name/moc-prod, hacktoberfest, hacktoberfest-accepted, kind/cleanup, kind/demo, kind/deprecation, kind/documentation, kind/question, sig/advisor, sig/build, sig/cyborgs, sig/devops, sig/documentation, sig/indicators, sig/investigator, sig/knowledge-graph, sig/slo, sig/solvers, thoth/group-programming, thoth/human-intervention-required, thoth/potential-observation, tide/merge-method-merge, tide/merge-method-rebase, tide/merge-method-squash, triage/accepted, triage/duplicate, triage/needs-information, triage/not-reproducible, triage/unresolved, lifecycle/submission-accepted, lifecycle/submission-rejected
/label sig/devops
/triage accepted
@harshad16: The label(s) sig/devops
cannot be applied, because the repository doesn't have them.
* [ ] make sure the sync-job uses the adjusted method, so we can turn the job into a CronWorkflow that periodically syncs data into the database (multiple sync jobs can be part of the CronWorkflow to support parallel syncs)
I think Maya's current PR should update everything in `storages` in such a way that the only changes needed in the `sync-job` are to bump the `storages` version once it goes through, and to construct a `CronWorkflow` from the `CronJob` template in `openshift.yaml`.
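For illustration, constructing a `CronWorkflow` from the existing `CronJob` template could look roughly like the sketch below. This is a hedged example, not the actual manifest: the schedule, template names, and image reference are assumptions, and the real `openshift.yaml` template would carry the deployment's own parameters. The key point is that items in the same step group run in parallel, which covers the "multiple sync jobs" requirement.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: graph-sync                 # illustrative name
spec:
  schedule: "0 */6 * * *"          # assumed schedule, adjust per deployment
  concurrencyPolicy: Forbid        # avoid overlapping sync runs
  workflowSpec:
    entrypoint: sync
    templates:
      - name: sync
        steps:
          - - name: sync-solver-docs       # steps in the same group
              template: graph-sync-job     # run in parallel
            - name: sync-image-analyses
              template: graph-sync-job
      - name: graph-sync-job
        container:
          image: quay.io/thoth-station/graph-sync-job  # assumed image reference
```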
Is your feature request related to a problem? Please describe.
As a Thoth operator, I would like to make sure data are properly managed across deployments. As of now, we use the staging environment to compute data (given the resources available) and propagate a database dump to the prod environments. This approach has proved not to be scalable in the long term: production can write to the database as well, so the two can easily get out of sync. Moreover, if we overwrite the prod database with staging data, we can lose information.
One of the proposed solutions discussed was to do updates per-table. This would introduce overhead and possible inconsistencies we should avoid (e.g., package entries created in the database by solvers could be overwritten by packages detected during container image analyses).
Another solution is to handle syncs at the application level. In other words, we keep running our background job that copies data from the staging environment to production (document-sync-job); it places documents on Ceph in the production environment. A subsequent graph-sync job then syncs these data into the database so that they are available in prod, even though they were computed in staging. This approach seems scalable and should require less maintenance.
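The two-stage flow described above (document-sync copies raw documents to the prod object store, graph-sync then ingests them into the database) can be sketched as follows. This is a minimal, hypothetical illustration: the stores are modelled as plain dicts, whereas the real jobs talk to Ceph and the PostgreSQL database, and the function names below are illustrative rather than the actual thoth-storages API. The sketch shows why the approach avoids the overwrite problem: both stages only add what is missing, so data written by prod itself is never clobbered.

```python
def document_sync(staging_store: dict, prod_store: dict) -> list:
    """Copy documents present in staging but not yet in prod (hypothetical)."""
    copied = []
    for doc_id, doc in staging_store.items():
        if doc_id not in prod_store:
            prod_store[doc_id] = doc
            copied.append(doc_id)
    return copied


def graph_sync(prod_store: dict, database: dict) -> list:
    """Idempotently sync prod documents into the database (hypothetical)."""
    synced = []
    for doc_id, doc in prod_store.items():
        if doc_id not in database:
            database[doc_id] = doc  # the real job parses and inserts rows
            synced.append(doc_id)
    return synced


# A document computed in staging ends up in the prod database without
# overwriting anything production already wrote itself.
staging = {"solver-doc-1": {"result": "..."}}
prod = {"image-analysis-1": {"result": "..."}}  # written by prod itself
db = {"image-analysis-1": {"result": "..."}}

document_sync(staging, prod)
graph_sync(prod, db)
print(sorted(db))  # ['image-analysis-1', 'solver-doc-1']
```

Because both stages are idempotent, re-running them (e.g., from a periodic CronWorkflow) is safe even if a previous run was interrupted.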
Additional Info: Epic: https://github.com/thoth-station/thoth-application/issues/2216