huggingface / dataset-viewer

Backend that powers the dataset viewer on Hugging Face dataset pages through a public API.
https://huggingface.co/docs/dataset-viewer
Apache License 2.0

Allow mnist and fashion mnist + remove canonical dataset logic #2880

Closed: lhoestq closed this pull request 4 months ago

github-actions[bot] commented 4 months ago

ArgoCD Diff for commit 81abe49

Updated at 6/4/2024, 2:14:15 PM CEST

App: datasets-server-prod YAML generation: Success 🟢 App sync status: Synced ✅

```diff
===== apps/Deployment datasets-server/prod-datasets-server-admin ======
--- /tmp/argocd-diff4171182829/prod-datasets-server-admin-live.yaml 2024-06-04 12:14:13.576043867 +0000
+++ /tmp/argocd-diff4171182829/prod-datasets-server-admin 2024-06-04 12:14:13.572043871 +0000
@@ -410,7 +410,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== apps/Deployment datasets-server/prod-datasets-server-api ======
--- /tmp/argocd-diff2447395365/prod-datasets-server-api-live.yaml 2024-06-04 12:14:13.596043847 +0000
+++ /tmp/argocd-diff2447395365/prod-datasets-server-api 2024-06-04 12:14:13.588043855 +0000
@@ -411,7 +411,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== apps/Deployment datasets-server/prod-datasets-server-rows ======
--- /tmp/argocd-diff2949680195/prod-datasets-server-rows-live.yaml 2024-06-04 12:14:13.612043832 +0000
+++ /tmp/argocd-diff2949680195/prod-datasets-server-rows 2024-06-04 12:14:13.612043832 +0000
@@ -453,7 +453,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== apps/Deployment datasets-server/prod-datasets-server-search ======
--- /tmp/argocd-diff3831454576/prod-datasets-server-search-live.yaml 2024-06-04 12:14:13.632043813 +0000
+++ /tmp/argocd-diff3831454576/prod-datasets-server-search 2024-06-04 12:14:13.632043813 +0000
@@ -421,7 +421,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== apps/Deployment datasets-server/prod-datasets-server-sse-api ======
--- /tmp/argocd-diff1198517079/prod-datasets-server-sse-api-live.yaml 2024-06-04 12:14:13.648043797 +0000
+++ /tmp/argocd-diff1198517079/prod-datasets-server-sse-api 2024-06-04 12:14:13.644043801 +0000
@@ -275,7 +275,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== apps/Deployment datasets-server/prod-datasets-server-webhook ======
--- /tmp/argocd-diff388884614/prod-datasets-server-webhook-live.yaml 2024-06-04 12:14:13.668043777 +0000
+++ /tmp/argocd-diff388884614/prod-datasets-server-webhook 2024-06-04 12:14:13.664043781 +0000
@@ -390,7 +390,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== apps/Deployment datasets-server/prod-datasets-server-worker-heavy ======
--- /tmp/argocd-diff1651703709/prod-datasets-server-worker-heavy-live.yaml 2024-06-04 12:14:13.700043746 +0000
+++ /tmp/argocd-diff1651703709/prod-datasets-server-worker-heavy 2024-06-04 12:14:13.688043758 +0000
@@ -544,7 +544,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== apps/Deployment datasets-server/prod-datasets-server-worker-light ======
--- /tmp/argocd-diff3192477495/prod-datasets-server-worker-light-live.yaml 2024-06-04 12:14:13.724043722 +0000
+++ /tmp/argocd-diff3192477495/prod-datasets-server-worker-light 2024-06-04 12:14:13.720043726 +0000
@@ -543,7 +543,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== apps/Deployment datasets-server/prod-datasets-server-worker-medium ======
--- /tmp/argocd-diff819540082/prod-datasets-server-worker-medium-live.yaml 2024-06-04 12:14:13.748043699 +0000
+++ /tmp/argocd-diff819540082/prod-datasets-server-worker-medium 2024-06-04 12:14:13.744043703 +0000
@@ -543,7 +543,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== batch/CronJob datasets-server/prod-datasets-server-job-backfill ======
--- /tmp/argocd-diff2090293280/prod-datasets-server-job-backfill-live.yaml 2024-06-04 12:14:13.776043672 +0000
+++ /tmp/argocd-diff2090293280/prod-datasets-server-job-backfill 2024-06-04 12:14:13.772043676 +0000
@@ -216,7 +216,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== batch/CronJob datasets-server/prod-datasets-server-job-backfill-retryable-errors ======
--- /tmp/argocd-diff610428271/prod-datasets-server-job-backfill-retryable-errors-live.yaml 2024-06-04 12:14:13.784043664 +0000
+++ /tmp/argocd-diff610428271/prod-datasets-server-job-backfill-retryable-errors 2024-06-04 12:14:13.784043664 +0000
@@ -216,7 +216,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== batch/CronJob datasets-server/prod-datasets-server-job-cache-metrics-collector ======
--- /tmp/argocd-diff2406304008/prod-datasets-server-job-cache-metrics-collector-live.yaml 2024-06-04 12:14:13.792043656 +0000
+++ /tmp/argocd-diff2406304008/prod-datasets-server-job-cache-metrics-collector 2024-06-04 12:14:13.792043656 +0000
@@ -176,7 +176,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== batch/CronJob datasets-server/prod-datasets-server-job-post-messages ======
--- /tmp/argocd-diff1400552623/prod-datasets-server-job-post-messages-live.yaml 2024-06-04 12:14:13.804043644 +0000
+++ /tmp/argocd-diff1400552623/prod-datasets-server-job-post-messages 2024-06-04 12:14:13.804043644 +0000
@@ -187,7 +187,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== batch/CronJob datasets-server/prod-datasets-server-job-queue-metrics-collector ======
--- /tmp/argocd-diff1109974285/prod-datasets-server-job-queue-metrics-collector-live.yaml 2024-06-04 12:14:13.812043637 +0000
+++ /tmp/argocd-diff1109974285/prod-datasets-server-job-queue-metrics-collector 2024-06-04 12:14:13.812043637 +0000
@@ -176,7 +176,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT
```
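Every resource in the diff receives the same change to `COMMON_DATASET_SCRIPTS_ALLOW_LIST`: the leading `{{ALL_DATASETS_WITH_NO_NAMESPACE}}` entry is dropped, and `ylecun/mnist` and `zalando-datasets/fashion_mnist` are listed explicitly. The surrounding single quotes also disappear. That follows from YAML syntax: a plain scalar starting with `{` would be parsed as a flow mapping, so the old value had to be quoted, while the new value starts with `h` and needs no quotes. The Python sketch below illustrates what the change means for matching. It is not the dataset-viewer implementation. It assumes that entries are shell-style globs (so `mozilla-foundation/common_voice_*` matches any Common Voice version) and that the placeholder, as its name suggests, matched any repo id without a `namespace/` prefix. `is_allowed` and `parse_allow_list` are hypothetical helpers.

```python
from fnmatch import fnmatchcase

# Assumption: the placeholder meant "any dataset id without a namespace" (canonical repos).
NO_NAMESPACE_PLACEHOLDER = "{{ALL_DATASETS_WITH_NO_NAMESPACE}}"

# Env var values as the container sees them (the YAML quotes are not part of the value).
OLD_VALUE = (
    "{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,"
    "togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,"
    "gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,"
    "google/fleurs,speechcolab/gigaspeech,espnet/yodas"
)
NEW_VALUE = (
    "hf-internal-testing/dataset_with_script,"
    "togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,"
    "gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,"
    "google/fleurs,speechcolab/gigaspeech,espnet/yodas,"
    "ylecun/mnist,zalando-datasets/fashion_mnist"
)


def parse_allow_list(raw: str) -> list[str]:
    """Hypothetical helper: split a comma-separated env value into non-empty patterns."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def is_allowed(dataset: str, patterns: list[str]) -> bool:
    """Hypothetical check: True if the dataset id matches any allow-list entry."""
    for pattern in patterns:
        if pattern == NO_NAMESPACE_PLACEHOLDER:
            # Canonical repo ids such as "mnist" contain no "/" separator.
            if "/" not in dataset:
                return True
        elif fnmatchcase(dataset, pattern):  # case-sensitive, OS-independent glob match
            return True
    return False


OLD_PATTERNS = parse_allow_list(OLD_VALUE)
NEW_PATTERNS = parse_allow_list(NEW_VALUE)

for ds in ["mnist", "ylecun/mnist", "mozilla-foundation/common_voice_16_1", "someone/other"]:
    print(ds, "old:", is_allowed(ds, OLD_PATTERNS), "new:", is_allowed(ds, NEW_PATTERNS))
```

Under these assumptions, a bare `mnist` id passes only the old list, while the namespaced `ylecun/mnist` passes only the new one. Listing the two repos explicitly keeps them script-enabled without allowing every no-namespace id.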

App: datasets-server-staging YAML generation: Success 🟢 App sync status: Synced ✅

```diff
===== apps/Deployment datasets-server/staging-datasets-server-admin ======
--- /tmp/argocd-diff2291524494/staging-datasets-server-admin-live.yaml 2024-06-04 12:14:15.320042165 +0000
+++ /tmp/argocd-diff2291524494/staging-datasets-server-admin 2024-06-04 12:14:15.320042165 +0000
@@ -402,7 +402,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== apps/Deployment datasets-server/staging-datasets-server-api ======
--- /tmp/argocd-diff3331785039/staging-datasets-server-api-live.yaml 2024-06-04 12:14:15.344042142 +0000
+++ /tmp/argocd-diff3331785039/staging-datasets-server-api 2024-06-04 12:14:15.340042146 +0000
@@ -399,7 +399,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== apps/Deployment datasets-server/staging-datasets-server-rows ======
--- /tmp/argocd-diff2540011939/staging-datasets-server-rows-live.yaml 2024-06-04 12:14:15.360042126 +0000
+++ /tmp/argocd-diff2540011939/staging-datasets-server-rows 2024-06-04 12:14:15.356042130 +0000
@@ -463,7 +463,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== apps/Deployment datasets-server/staging-datasets-server-search ======
--- /tmp/argocd-diff1481795220/staging-datasets-server-search-live.yaml 2024-06-04 12:14:15.380042107 +0000
+++ /tmp/argocd-diff1481795220/staging-datasets-server-search 2024-06-04 12:14:15.380042107 +0000
@@ -430,7 +430,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== apps/Deployment datasets-server/staging-datasets-server-sse-api ======
--- /tmp/argocd-diff2457585616/staging-datasets-server-sse-api-live.yaml 2024-06-04 12:14:15.392042095 +0000
+++ /tmp/argocd-diff2457585616/staging-datasets-server-sse-api 2024-06-04 12:14:15.392042095 +0000
@@ -283,7 +283,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== apps/Deployment datasets-server/staging-datasets-server-webhook ======
--- /tmp/argocd-diff2984131237/staging-datasets-server-webhook-live.yaml 2024-06-04 12:14:15.412042075 +0000
+++ /tmp/argocd-diff2984131237/staging-datasets-server-webhook 2024-06-04 12:14:15.412042075 +0000
@@ -385,7 +385,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== apps/Deployment datasets-server/staging-datasets-server-worker-all ======
--- /tmp/argocd-diff2421714224/staging-datasets-server-worker-all-live.yaml 2024-06-04 12:14:15.440042048 +0000
+++ /tmp/argocd-diff2421714224/staging-datasets-server-worker-all 2024-06-04 12:14:15.436042052 +0000
@@ -541,7 +541,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== apps/Deployment datasets-server/staging-datasets-server-worker-light ======
--- /tmp/argocd-diff2719228446/staging-datasets-server-worker-light-live.yaml 2024-06-04 12:14:15.468042020 +0000
+++ /tmp/argocd-diff2719228446/staging-datasets-server-worker-light 2024-06-04 12:14:15.460042028 +0000
@@ -541,7 +541,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== batch/CronJob datasets-server/staging-datasets-server-job-cache-metrics-collector ======
--- /tmp/argocd-diff4180596451/staging-datasets-server-job-cache-metrics-collector-live.yaml 2024-06-04 12:14:15.484042005 +0000
+++ /tmp/argocd-diff4180596451/staging-datasets-server-job-cache-metrics-collector 2024-06-04 12:14:15.484042005 +0000
@@ -173,7 +173,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== batch/CronJob datasets-server/staging-datasets-server-job-post-messages ======
--- /tmp/argocd-diff3311791557/staging-datasets-server-job-post-messages-live.yaml 2024-06-04 12:14:15.500041989 +0000
+++ /tmp/argocd-diff3311791557/staging-datasets-server-job-post-messages 2024-06-04 12:14:15.492041997 +0000
@@ -185,7 +185,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT

===== batch/CronJob datasets-server/staging-datasets-server-job-queue-metrics-collector ======
--- /tmp/argocd-diff1058340267/staging-datasets-server-job-queue-metrics-collector-live.yaml 2024-06-04 12:14:15.508041981 +0000
+++ /tmp/argocd-diff1058340267/staging-datasets-server-job-queue-metrics-collector 2024-06-04 12:14:15.508041981 +0000
@@ -174,7 +174,7 @@
 - name: COMMON_BLOCKED_DATASETS
   value: open-llm-leaderboard/details_*,lunaluan/*,atom-in-the-universe/*,cot-leaderboard/cot-eval-traces,mitermix/yt-links,mcding-org/*
 - name: COMMON_DATASET_SCRIPTS_ALLOW_LIST
-  value: '{{ALL_DATASETS_WITH_NO_NAMESPACE}},hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas'
+  value: hf-internal-testing/dataset_with_script,togethercomputer/RedPajama-Data-1T,togethercomputer/RedPajama-Data-V2,gaia-benchmark/GAIA,poloclub/diffusiondb,mozilla-foundation/common_voice_*,google/fleurs,speechcolab/gigaspeech,espnet/yodas,ylecun/mnist,zalando-datasets/fashion_mnist
 - name: COMMON_HF_ENDPOINT
   value: https://huggingface.co
 - name: HF_ENDPOINT
```

| Legend | Status |
| --- | --- |
| ✅ | The app is synced in ArgoCD, and diffs you see are solely from this PR. |
| ⚠️ | The app is out-of-sync in ArgoCD, and the diffs you see include those changes plus any from this PR. |
| 🛑 | There was an error generating the ArgoCD diffs due to changes in this PR. |

severo commented 4 months ago

About "remove canonical dataset logic": there are possibly other places in the code where we handle canonical repos. Maybe in another PR?
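One way to find the remaining places for a follow-up PR is a plain text search of a checkout. The sketch below is a hypothetical helper, not part of the repository. It assumes that leftover canonical-dataset handling mentions either the `ALL_DATASETS_WITH_NO_NAMESPACE` placeholder or the word "canonical". Hits are leads to review by hand, since the word can appear in unrelated contexts.

```python
from pathlib import Path

# Assumed markers of canonical-dataset handling; adjust to the actual code base.
MARKERS = ("ALL_DATASETS_WITH_NO_NAMESPACE", "canonical")


def find_markers(
    root: Path,
    suffixes: tuple[str, ...] = (".py", ".yaml", ".yml"),
) -> list[tuple[Path, int, str]]:
    """Return (file, 1-based line number, line) for each line mentioning a marker."""
    lowered_markers = [marker.lower() for marker in MARKERS]
    hits: list[tuple[Path, int, str]] = []
    for path in sorted(root.rglob("*")):
        # Only scan regular files with one of the requested extensions.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            lowered = line.lower()  # case-insensitive match
            if any(marker in lowered for marker in lowered_markers):
                hits.append((path, lineno, line))
    return hits
```

Calling `find_markers(Path("."))` from the repository root and printing each `path:lineno: line` gives a quick checklist. A search tool such as `git grep` would serve the same purpose.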