Closed suppathak closed 2 years ago
/assign @oindrillac /assign @chauhankaranraj /assign @suppathak
The thoth-station/support repo doesn't have enough PRs for us to train this model on.
Thus, to retrain the model, the 2 options that we are considering are:
@Gkrumbach07 can you suggest a list of repos from the thoth-station org which could be representative of the contributor or maintainer behaviors of the thoth-station/support repo? We can think of excluding some repos like:
Another question we had: since this service is trying to help the Thoth Guidance Service user, should we exclude bot PRs from the training data that we feed into the model being trained?
Currently the support repo isn't used that often, and the issues that do get resolved don't always have an attached PR. So maybe instead of time to merge a PR, we can also have time to close an issue. We can use the lifecycle labels to create a more accurate timeline too.
I agree that we can exclude bot-made PRs and issues, so only issues that don't have a bot label.
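A rough sketch of how bot-made PRs and issues could be dropped before training. The record layout (`author`, `labels` keys), the label name `bot`, and the sample accounts are assumptions for illustration, not the project's actual schema. The `[bot]` login suffix check covers GitHub App accounts.

```python
from typing import Iterable, List

BOT_LABEL = "bot"  # assumption: name of the label that marks bot-created items


def is_bot_item(item: dict) -> bool:
    """Return True if the item has the bot label or a GitHub App author."""
    labels = {label.lower() for label in item.get("labels", [])}
    author = item.get("author", "")
    # GitHub App accounts have logins that end in "[bot]"
    return BOT_LABEL in labels or author.endswith("[bot]")


def exclude_bot_items(items: Iterable[dict]) -> List[dict]:
    """Keep only the PRs/issues that were not created by a bot."""
    return [item for item in items if not is_bot_item(item)]


# Hypothetical sample records
sample = [
    {"number": 1, "author": "some-user", "labels": ["kind/bug"]},
    {"number": 2, "author": "some-automation", "labels": ["bot"]},
    {"number": 3, "author": "dependabot[bot]", "labels": []},
]
human_items = exclude_bot_items(sample)
print([item["number"] for item in human_items])
```

Checking both the label and the author login is a deliberate redundancy, since either signal alone may be missing on older items.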
As for which repos to train on, there is no definite list of the repos that external users use to get support. Many times they will open an issue in the repo that is related to their problem. I would choose the repos that have been updated in the last year maybe, or by number of stars, or by number of issues created by non-Thoth-org accounts.
Thanks for the information and the suggestion @Gkrumbach07.
We can explore a Time to Close an Issue model as a spike and see if that's something we can deliver. The approach would be similar to the Time to Merge model, but we would need to perform EDA, engineer features, and train a model particular to this use case. Opening an issue for this.
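For the spike, the target variable could be derived roughly as follows. This is only a sketch: it assumes issue records shaped like the GitHub REST API response (`created_at` and `closed_at` as ISO 8601 UTC strings) and does not yet use the lifecycle labels. The helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_ts(ts: str) -> datetime:
    """Parse a GitHub-style timestamp such as '2021-06-01T12:00:00Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def time_to_close_hours(issue: dict) -> Optional[float]:
    """Hours between issue creation and closing, or None if still open."""
    if not issue.get("closed_at"):
        return None
    delta = parse_ts(issue["closed_at"]) - parse_ts(issue["created_at"])
    return delta.total_seconds() / 3600.0


# Hypothetical sample issues
closed_issue = {"created_at": "2021-06-01T12:00:00Z", "closed_at": "2021-06-03T18:00:00Z"}
open_issue = {"created_at": "2021-06-01T12:00:00Z", "closed_at": None}

print(time_to_close_hours(closed_issue))  # 54.0
print(time_to_close_hours(open_issue))    # None
```

Open issues return `None` so they can be dropped or handled separately during EDA.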
I agree that we can exclude bot-made PRs and issues, so only issues that don't have a bot label.
We will exclude these from our model development. As for the repos, we will start by filtering a list of repos using the criteria you mentioned earlier.
I would choose the repos that have been updated in the last year maybe, or by number of stars, or by number of issues created by non-Thoth-org accounts.
We used all 3 suggested filtering criteria (https://github.com/aicoe-aiops/ocp-ci-analysis/pull/490). Here is the list of 105 repos, out of all the repos in the thoth-station org, that we will include in our training data.
{'thoth-station/.github',
'thoth-station/adviser',
'thoth-station/aicoe-ci-pulp-upload-example',
'thoth-station/amun-api',
'thoth-station/amun-client',
'thoth-station/analyzer',
'thoth-station/ansible-role-argo-workflows',
'thoth-station/build-watcher',
'thoth-station/buildlog-parser',
'thoth-station/cleanup-job',
'thoth-station/cli-examples',
'thoth-station/common',
'thoth-station/core',
'thoth-station/cve-update-job',
'thoth-station/datasets',
'thoth-station/dependency-monkey',
'thoth-station/dependency-monkey-zoo',
'thoth-station/document-sync-job',
'thoth-station/fext',
'thoth-station/glyph',
'thoth-station/graph-backup-job',
'thoth-station/graph-refresh-job',
'thoth-station/graph-sync-job',
'thoth-station/help',
'thoth-station/httpd-aicoe-container',
'thoth-station/image-pusher',
'thoth-station/init-job',
'thoth-station/integration-tests',
'thoth-station/invectio',
'thoth-station/investigator',
'thoth-station/jupyter-nbrequirements',
'thoth-station/jupyterlab-requirements',
'thoth-station/jupyternb-build-pipeline',
'thoth-station/kebechet',
'thoth-station/lab',
'thoth-station/license-solver',
'thoth-station/management-api',
'thoth-station/messaging',
'thoth-station/metrics-exporter',
'thoth-station/mi',
'thoth-station/mi-scheduler',
'thoth-station/micropipenv',
'thoth-station/moldavite-api',
'thoth-station/notebooks',
'thoth-station/osiris',
'thoth-station/osiris-build-observer',
'thoth-station/package-analyzer',
'thoth-station/package-extract',
'thoth-station/package-releases-job',
'thoth-station/package-update-job',
'thoth-station/prescriptions',
'thoth-station/ps-cv',
'thoth-station/ps-ip',
'thoth-station/ps-nlp',
'thoth-station/pulp-metrics-exporter',
'thoth-station/pulp-operate-first-web',
'thoth-station/python',
'thoth-station/python-ssdeep',
'thoth-station/qeb-hwt',
'thoth-station/ray-ml-notebook',
'thoth-station/ray-ml-worker',
'thoth-station/ray-operator',
'thoth-station/report-processing',
'thoth-station/reporter',
'thoth-station/revsolver',
'thoth-station/s2i',
'thoth-station/s2i-generic-data-science-notebook',
'thoth-station/s2i-minimal-notebook',
'thoth-station/s2i-pytorch-notebook',
'thoth-station/s2i-scipy-notebook',
'thoth-station/s2i-tensorflow-gpu-notebook',
'thoth-station/s2i-tensorflow-notebook',
'thoth-station/s2i-thoth',
'thoth-station/search',
'thoth-station/selinon-api',
'thoth-station/selinon-worker',
'thoth-station/si-aggregator',
'thoth-station/si-bandit',
'thoth-station/slo-reporter',
'thoth-station/solver',
'thoth-station/solver-error-classfier',
'thoth-station/solver-errors-reporter',
'thoth-station/solver-project-url-job',
'thoth-station/source-management',
'thoth-station/srcops-testing',
'thoth-station/storages',
'thoth-station/support',
'thoth-station/sync-job',
'thoth-station/template-project',
'thoth-station/tensorflow-build-s2i',
'thoth-station/tensorflow-release-api',
'thoth-station/tensorflow-release-job',
'thoth-station/tensorflow-serving-build',
'thoth-station/thamos',
'thoth-station/thoth',
'thoth-station/thoth-application',
'thoth-station/thoth-github-action',
'thoth-station/thoth-ops-infra',
'thoth-station/thoth-pybench',
'thoth-station/thoth-station.github.io',
'thoth-station/user-api',
'thoth-station/website',
'thoth-station/workflow-helpers',
'thoth-station/workflows',
'thoth-station/zuul-config'}
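A selection like the one above could be sketched as below. This is an illustration only: the record fields (`updated_at`, `stars`, `external_issues`), the thresholds, and the choice to keep a repo when any one criterion holds are all assumptions. The actual logic is in the linked pull request.

```python
from datetime import datetime, timedelta, timezone


def select_repos(repos, now, min_stars=1, min_external_issues=1):
    """Return the names of repos meeting at least one of the three criteria."""
    cutoff = now - timedelta(days=365)  # "updated in the last year"
    selected = set()
    for repo in repos:
        recently_updated = repo["updated_at"] >= cutoff
        has_stars = repo["stars"] >= min_stars
        # issues opened by accounts outside the org (assumed precomputed)
        has_external_issues = repo["external_issues"] >= min_external_issues
        if recently_updated or has_stars or has_external_issues:
            selected.add(repo["full_name"])
    return selected


# Hypothetical repo records
now = datetime(2022, 1, 1, tzinfo=timezone.utc)
old = datetime(2019, 1, 1, tzinfo=timezone.utc)
sample_repos = [
    {"full_name": "example-org/active", "updated_at": now, "stars": 0, "external_issues": 0},
    {"full_name": "example-org/starred", "updated_at": old, "stars": 5, "external_issues": 0},
    {"full_name": "example-org/external", "updated_at": old, "stars": 0, "external_issues": 2},
    {"full_name": "example-org/stale", "updated_at": old, "stars": 0, "external_issues": 0},
]
print(sorted(select_repos(sample_repos, now)))
```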
Related pull request: https://github.com/aicoe-aiops/ocp-ci-analysis/pull/495
PR data is uploaded to: "bucketname": "opf-datacatalog-morty"
/close
@Gkrumbach07: Closing this issue.
Persona / User
Thoth Guidance Service User
Reason
Related to #147
Define Done