iterative / dvc.org

📖 DVC website and documentation
https://dvc.org
Apache License 2.0

blog: cleanup tags? #2340

Closed jorgeorpinel closed 3 years ago

jorgeorpinel commented 3 years ago

There are lots of single-use labels. Many could be removed or changed to a better one (with more usage).

Full list (crude command that assumes no blog has more than 10 labels):

$ grep "tags:" content/blog/* -A 10 | awk '{print $2 " " $3}' | grep '^- ' | sort -f | uniq -c
      2 - Ambassador
      1 - Autocomplete
      1 - Azure
      1 - Benchmark
      3 - Best
      1 - Bitbucket
      1 - Blogging
      1 - Book
      2 - Cache
      9 - CI/CD
      1 - CLI
      1 - Cloud
     20 - CML
      1 - cml-send-comment
      1 - Completion
      1 - Conda
      2 - Conference
      3 - Continuous
      2 - DAGsHub
      2 - Data
      5 - DataOps
      3 - DevOps
     20 - Discord
      2 - DivOps
      2 - Docker
      2 - Documentation
     10 - DVC
      1 - echo
      1 - Engineering
      1 - External
      1 - GCP
     14 - Gems
      1 - Git
      2 - GitHub
      1 - GitLab
      8 - Google
      1 - gpu
      1 - GPUs
      3 - Hacktoberfest
     23 - Heartbeat
      1 - Homebrew
      2 - Hyperparameters
      2 - Import
      1 - Machine
      9 - Meetup
      2 - Mentoring
      1 - Metrics
      1 - MinIO
      1 - ML
     17 - MLOps
      1 - Model
      1 - Monorepo
      1 - New
      2 - Open
      1 - Optimization
      1 - Performance
      2 - Pipeline
      5 - Pipelines
      4 - Plots
      1 - Podcast
      1 - Productivity
      1 - Project
      1 - PTDC-18
      2 - PyCon
      1 - PyData
      5 - Python
      1 - PyTorch
      6 - R
      1 - Rclone
      1 - Reddit
      6 - Release
      2 - Reproducibility
      1 - RStats
      2 - SciPy
      1 - Self-hosted
      1 - shtab
      1 - spaCy
      1 - Spell
      1 - SSH
      1 - Students
      1 - Tab
      1 - Tags
      2 - Terraform
      5 - Tutorial
      2 - Udemy
      2 - Vega
      1 - Videos
      1 - Volunteer
      1 - YouTube

(That's the current output on master).
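
Note that the awk step above prints only the first word of each tag, so multi-word tags are truncated in the list (e.g. "Best" stands in for a longer tag such as "Best Practices"). A minimal Python sketch that counts whole tags instead, assuming each post lists its tags as `- ` items directly under a `tags:` line in the front matter. The function names `extract_tags` and `count_tags` are made up for illustration, and the `content/blog` default is taken from the command above:

```python
from collections import Counter
from pathlib import Path


def extract_tags(markdown_text):
    """Return the full tag strings listed under the first `tags:` line."""
    tags = []
    in_tags = False
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if not in_tags:
            if stripped == "tags:":
                in_tags = True
        elif stripped.startswith("- "):
            # Keep the whole tag, including any spaces inside it.
            tags.append(stripped[2:].strip())
        else:
            # First line that is not a list item ends the tag list.
            break
    return tags


def count_tags(blog_dir="content/blog"):
    """Count full tags across all Markdown posts in blog_dir (assumed layout)."""
    counts = Counter()
    for path in sorted(Path(blog_dir).glob("*.md")):
        counts.update(extract_tags(path.read_text(encoding="utf-8")))
    return counts
```

Calling `count_tags().most_common()` from the repo root would then list tags by usage. Quoted tags or inline lists like `tags: [a, b]` are not handled here.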

Obvious issues (examples):

jorgeorpinel commented 3 years ago

And should we have a tag cloud feature perhaps? Or any graphical way to navigate tags? Cc @rogermparent WDYT? (Should be a separate issue.)

rogermparent commented 3 years ago

This would be a significant task to implement, but is certainly doable. We can do some Gatsby querying to pull all tag data in a consumable form relatively easily.

Speaking of which, here's a sorted tag count from Gatsby's GraphQL:

[
  { totalCount: 23, fieldValue: 'Heartbeat' },
  { totalCount: 20, fieldValue: 'CML' },
  { totalCount: 17, fieldValue: 'MLOps' },
  { totalCount: 14, fieldValue: 'Discord' },
  { totalCount: 14, fieldValue: 'Gems' },
  { totalCount: 9, fieldValue: 'Meetup' },
  { totalCount: 8, fieldValue: 'DVC' },
  { totalCount: 6, fieldValue: 'CI/CD' },
  { totalCount: 6, fieldValue: 'Discord Gems' },
  { totalCount: 6, fieldValue: 'R' },
  { totalCount: 6, fieldValue: 'Release' },
  { totalCount: 5, fieldValue: 'DataOps' },
  { totalCount: 5, fieldValue: 'Pipelines' },
  { totalCount: 5, fieldValue: 'Tutorial' },
  { totalCount: 4, fieldValue: 'Google Season of Docs' },
  { totalCount: 4, fieldValue: 'Plots' },
  { totalCount: 4, fieldValue: 'Python' },
  { totalCount: 3, fieldValue: 'Best Practices' },
  { totalCount: 3, fieldValue: 'CI/CD for ML' },
  { totalCount: 3, fieldValue: 'DevOps' },
  { totalCount: 3, fieldValue: 'Hacktoberfest' },
  { totalCount: 2, fieldValue: 'Ambassador' },
  { totalCount: 2, fieldValue: 'Cache' },
  { totalCount: 2, fieldValue: 'Conference' },
  { totalCount: 2, fieldValue: 'DAGsHub' },
  { totalCount: 2, fieldValue: 'DVC 1.0' },
  { totalCount: 2, fieldValue: 'DivOps' },
  { totalCount: 2, fieldValue: 'Docker' },
  { totalCount: 2, fieldValue: 'Documentation' },
  { totalCount: 2, fieldValue: 'GitHub Actions' },
  { totalCount: 2, fieldValue: 'Google Drive' },
  { totalCount: 2, fieldValue: 'Hyperparameters' },
  { totalCount: 2, fieldValue: 'Import' },
  { totalCount: 2, fieldValue: 'Mentoring' },
  { totalCount: 2, fieldValue: 'Open Source Summit' },
  { totalCount: 2, fieldValue: 'Pipeline' },
  { totalCount: 2, fieldValue: 'PyCon' },
  { totalCount: 2, fieldValue: 'Reproducibility' },
  { totalCount: 2, fieldValue: 'SciPy' },
  { totalCount: 2, fieldValue: 'Terraform' },
  { totalCount: 2, fieldValue: 'Vega' },
  { totalCount: 1, fieldValue: 'Autocomplete' },
  { totalCount: 1, fieldValue: 'Azure' },
  { totalCount: 1, fieldValue: 'Benchmark' },
  { totalCount: 1, fieldValue: 'Bitbucket' },
  { totalCount: 1, fieldValue: 'Blogging' },
  { totalCount: 1, fieldValue: 'Book' },
  { totalCount: 1, fieldValue: 'CLI' },
  { totalCount: 1, fieldValue: 'Cloud' },
  { totalCount: 1, fieldValue: 'Completion' },
  { totalCount: 1, fieldValue: 'Conda' },
  { totalCount: 1, fieldValue: 'Continuous Integration' },
  { totalCount: 1, fieldValue: 'Continuous Machine Learning' },
  { totalCount: 1, fieldValue: 'Continuous integration' },
  { totalCount: 1, fieldValue: 'Data' },
  { totalCount: 1, fieldValue: 'Data registry' },
  { totalCount: 1, fieldValue: 'Engineering' },
  { totalCount: 1, fieldValue: 'External Data' },
  { totalCount: 1, fieldValue: 'GCP' },
  { totalCount: 1, fieldValue: 'GPUs' },
  { totalCount: 1, fieldValue: 'Git LFS' },
  { totalCount: 1, fieldValue: 'GitLab CI' },
  { totalCount: 1, fieldValue: 'Google Cloud Storage' },
  { totalCount: 1, fieldValue: 'Google Summer of Code' },
  { totalCount: 1, fieldValue: 'Homebrew' },
  { totalCount: 1, fieldValue: 'ML Summit 2021' },
  { totalCount: 1, fieldValue: 'Machine Learning' },
  { totalCount: 1, fieldValue: 'Metrics' },
  { totalCount: 1, fieldValue: 'MinIO' },
  { totalCount: 1, fieldValue: 'Model Ensembling' },
  { totalCount: 1, fieldValue: 'Monorepo' },
  { totalCount: 1, fieldValue: 'New feature' },
  { totalCount: 1, fieldValue: 'Optimization' },
  { totalCount: 1, fieldValue: 'PTDC-18' },
  { totalCount: 1, fieldValue: 'Performance' },
  { totalCount: 1, fieldValue: 'Podcast' },
  { totalCount: 1, fieldValue: 'Productivity' },
  { totalCount: 1, fieldValue: 'Project' },
  { totalCount: 1, fieldValue: 'PyData' },
  { totalCount: 1, fieldValue: 'PyTorch' },
  { totalCount: 1, fieldValue: 'Python API' },
  { totalCount: 1, fieldValue: 'RStats' },
  { totalCount: 1, fieldValue: 'Rclone' },
  { totalCount: 1, fieldValue: 'Reddit' },
  { totalCount: 1, fieldValue: 'SSH' },
  { totalCount: 1, fieldValue: 'Self-hosted Runner' },
  { totalCount: 1, fieldValue: 'Spell' },
  { totalCount: 1, fieldValue: 'Students' },
  { totalCount: 1, fieldValue: 'Tab' },
  { totalCount: 1, fieldValue: 'Tags' },
  { totalCount: 1, fieldValue: 'Udemy' },
  { totalCount: 1, fieldValue: 'Udemy Course' },
  { totalCount: 1, fieldValue: 'Videos' },
  { totalCount: 1, fieldValue: 'Volunteer' },
  { totalCount: 1, fieldValue: 'YouTube' },
  { totalCount: 1, fieldValue: 'shtab' },
  { totalCount: 1, fieldValue: 'spaCy' }
]

and, for reference, the query to get it:

{
  allBlogPost {
    group(field: tags) {
      totalCount
      fieldValue
    }
  }
}
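
For anyone post-processing that result outside Gatsby, here is a rough Python sketch of the ordering used in the list above: highest count first, ties broken by tag name in plain code point order (which is why 'DVC 1.0' comes before 'DivOps'). It assumes the group list has already been pulled out of the response (presumably under `data.allBlogPost.group`); `sort_tag_groups` is a hypothetical helper name.

```python
def sort_tag_groups(groups):
    """Sort tag groups by count (descending), then by name (code point order)."""
    return sorted(groups, key=lambda g: (-g["totalCount"], g["fieldValue"]))


groups = [
    {"totalCount": 1, "fieldValue": "shtab"},
    {"totalCount": 2, "fieldValue": "DivOps"},
    {"totalCount": 2, "fieldValue": "DVC 1.0"},
    {"totalCount": 23, "fieldValue": "Heartbeat"},
]
ordered = [g["fieldValue"] for g in sort_tag_groups(groups)]
```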

shcheklein commented 3 years ago

My 2cs - the only thing I would do is to try to merge some tags in case we see some problems. Also, I agree that we should have a list of tags online, to navigate and help editors pick from the existing ones.

No goal from the product perspective to get rid of any tags just for the sake of minimizing the number.
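
To spot tags that might be worth merging, a naive heuristic is to group names that only differ by letter case or a trailing "s" (like "Pipeline"/"Pipelines" or "Continuous Integration"/"Continuous integration" in the lists above). This is just a sketch in Python with made-up helper names; it will miss synonyms and may pair up tags that should stay separate, so a human still has to decide.

```python
from collections import defaultdict


def normalize(tag):
    """Crude comparison key: case-folded, with one trailing 's' dropped."""
    key = tag.casefold().strip()
    if len(key) > 1 and key.endswith("s"):
        key = key[:-1]
    return key


def merge_candidates(tags):
    """Map each comparison key to the distinct tags sharing it (2 or more)."""
    buckets = defaultdict(list)
    for tag in sorted(set(tags)):
        buckets[normalize(tag)].append(tag)
    return {key: names for key, names in buckets.items() if len(names) > 1}
```

For example, passing the tag names from the lists above would flag "GPUs" and "gpu" as one candidate pair, while "DevOps" and "DivOps" stay apart.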

jorgeorpinel commented 3 years ago

Ok, did some basic consolidations in #2350. I have the instinct to clean up this kind of content organization, but if it has no impact on SEO then there's probably no point in trying. But maybe the tag cloud should ignore single-use labels?

BTW should we repurpose this issue for that feature request? Thanks

rogermparent commented 3 years ago

> But maybe the tag cloud should ignore single-use labels?

I can see this in a cloud, but we should probably have a list somewhere that shows off all tags. They're there for a reason, and may help users navigate to relevant past articles.
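
As a rough illustration of that split (not how the site does or would do it), here is a small Python sketch: the cloud keeps only tags used at least `min_count` times, while the full list keeps everything. The helper names and the default threshold of 2 are assumptions; `counts` is any mapping of tag name to number of posts.

```python
def cloud_tags(counts, min_count=2):
    """Tags shown in the cloud: drop anything used fewer than min_count times."""
    return {tag: n for tag, n in counts.items() if n >= min_count}


def all_tags(counts):
    """Tags shown in the full list: every tag, sorted case-insensitively."""
    return sorted(counts, key=str.casefold)


counts = {"Heartbeat": 23, "CML": 20, "shtab": 1, "Azure": 1}
```

With the sample above, the cloud would hold only "Heartbeat" and "CML", and the full list would still include "Azure" and "shtab".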

> BTW should we repurpose this issue for that feature request? Thanks

I'm of the school of "close often and make new tickets when needed" but that's no more than my personal opinion.

jorgeorpinel commented 3 years ago

Agreed. Moved to #2356.

This one will be closed by #2350.