isabelizimm / mlops-metaanalysis

MIT License

[model] what features seem useful #13

Closed isabelizimm closed 1 year ago

isabelizimm commented 1 year ago

Feature engineering

Numeric attributes from repo level data

- stargazers_count
- has_issues
- has_projects
- has_downloads
- has_wiki
- has_pages
- has_discussions
- open_issues_count
- allow_forking
- is_template
- age_days: (datetime.datetime(2023, 1, 21) - created_at).dt.components.days
- time_since_last_commit_days: (datetime.datetime(2023, 1, 21) - pushed_at).dt.components.days
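A minimal sketch of how the two derived day counts could be computed with pandas. The `repos` frame and its sample rows are hypothetical stand-ins for the real repo-level data, and casting the boolean flags to 0/1 is an assumption about how they would be fed to a model, not something stated in this issue.

```python
import datetime

import pandas as pd

# Hypothetical sample of repo-level data; the real frame would hold one row per repo.
repos = pd.DataFrame(
    {
        "stargazers_count": [120, 3],
        "has_wiki": [True, False],
        "created_at": pd.to_datetime(["2021-01-21", "2022-12-22"]),
        "pushed_at": pd.to_datetime(["2023-01-11", "2022-12-31"]),
    }
)

# Fixed reference date from the issue, so the features are reproducible.
SNAPSHOT = datetime.datetime(2023, 1, 21)

# Whole days between the snapshot and each timestamp.
repos["age_days"] = (SNAPSHOT - repos["created_at"]).dt.components.days
repos["time_since_last_commit_days"] = (SNAPSHOT - repos["pushed_at"]).dt.components.days

# Assumption: boolean flags are cast to 0/1 so every feature column is numeric.
bool_cols = repos.select_dtypes(include="bool").columns
repos = repos.astype({col: int for col in bool_cols})

print(repos[["age_days", "time_since_last_commit_days", "has_wiki"]])
```

If `created_at` and `pushed_at` are parsed as timezone-aware timestamps, the naive `SNAPSHOT` would need a matching timezone (or the columns would need `dt.tz_localize(None)`) before subtracting.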

Text attributes from repo level data

- language: category dtype
- topics
- owner
- license
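For the `language` column, a rough sketch of the category dtype conversion. The sample values and the extra `language_code` column are hypothetical, added only to show what the conversion produces.

```python
import pandas as pd

# Hypothetical sample; None stands for repos where no language was detected.
repos = pd.DataFrame(
    {"language": ["Python", "Jupyter Notebook", None, "Python"]}
)

# Store language as a pandas categorical instead of free-form strings.
repos["language"] = repos["language"].astype("category")

# Integer codes follow the sorted category order; missing values become -1.
repos["language_code"] = repos["language"].cat.codes

print(repos["language"].cat.categories.tolist())
print(repos["language_code"].tolist())
```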

LDA outputs

Top words per topic, with ten topics

Topic 0: ['learning', 'azure', 'example', 'mlops', 'machine', 'using', 'project']
Topic 1: ['list', 'curated', 'open', 'source', 'awesome', 'ml', 'tools']
Topic 2: ['end', 'model', 'kubeflow', 'cloud', 'learning', 'machine', 'mlops']
Topic 3: ['platform', 'mlops', 'project', 'machine', 'learning', 'data', 'development']
Topic 4: ['ci', 'cd', 'sagemaker', 'mlops', 'deploy', 'learning', 'model']
Topic 5: ['data', 'mlops', 'ml', 'resources', 'practices', 'best', 'pipelines']
Topic 6: ['feature', 'mlops', 'ml', 'data', 'collection', 'store', 'training']
Topic 7: ['ml', 'datatalksclub', 'solution', 'python', 'using', 'docker', 'b']
Topic 8: ['documentation', 'end', 'mlflow', 'mlops', 'learning', 'machine', 'examples']
Topic 9: ['learning', 'mlops', 'machine', 'azure', 'data', 'engineering', 'course']

We could use some of these topics as inspiration for clusters?

isabelizimm commented 1 year ago

Closing since I have no current plans to work on this issue; we can always reopen if needed!