severo closed this issue 9 months ago
Checking whether other datasets are affected (i.e. have a single entry in the cache) with:
db.cachedResponsesBlue.aggregate([
{$group: {
_id: "$dataset",
count: {$sum: 1}
}},
{$match: {count: 1}}
])
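For readers less familiar with MongoDB aggregation: the pipeline above groups cache entries by dataset, counts them, and keeps only the groups whose count is 1. Here is a minimal in-memory Python sketch of the same logic. Only the `dataset` and `kind` field names come from the queries in this thread; the sample documents are made up for illustration.

```python
from collections import Counter


def datasets_with_single_entry(entries):
    """Equivalent of {$group: {_id: "$dataset", count: {$sum: 1}}} followed by {$match: {count: 1}}."""
    counts = Counter(entry["dataset"] for entry in entries)
    return {dataset: count for dataset, count in counts.items() if count == 1}


# Hypothetical sample: "ok/dataset" has several cache entries, "stuck/dataset" only one.
sample = [
    {"dataset": "ok/dataset", "kind": "dataset-config-names"},
    {"dataset": "ok/dataset", "kind": "dataset-split-names"},
    {"dataset": "stuck/dataset", "kind": "dataset-config-names"},
]

print(datasets_with_single_entry(sample))  # {'stuck/dataset': 1}
```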
Not that many (60 datasets):
{ _id: 'speed1/nattan', count: 1 }
{ _id: 'wikimedia/wikipedia', count: 1 }
{ _id: 'bbaw_egyptian', count: 1 }
{ _id: 'joey234/mmlu-electrical_engineering-neg-prepend-verbal', count: 1 }
{ _id: 'imdatta0/ultrachat_1k', count: 1 }
{ _id: 'CyberHarem/fang_arknights', count: 1 }
{ _id: 'focia/private_instagram', count: 1 }
{ _id: 'DeepFoldProtein/CATH_v4.3_S35_processed_512_test', count: 1 }
{ _id: 'Vivek1234321/multi-cloud-train', count: 1 }
{ _id: 'fu1995/shuimo-image-dataset', count: 1 }
{ _id: 'adi-kmt/airoboros-3.2_kn', count: 1 }
{ _id: 'Gbssreejith/death_type_42_dataset', count: 1 }
{ _id: 'GunA-SD/DataX', count: 1 }
{ _id: '0x7194633/persona-data-v1', count: 1 }
{ _id: 'Kalfrin/edset', count: 1 }
{ _id: 'Denilsonic/Samples', count: 1 }
{ _id: 'barolr/text_am-sum', count: 1 }
{ _id: 'sergei202/nexus-function-calling', count: 1 }
{ _id: 'xwjzds/pretrain_repeat_paraphrase', count: 1 }
{ _id: 'chunping-hf/my_audio', count: 1 }
{ _id: 'buzzcraft/ELI5-NO', count: 1 }
{ _id: 'racheltong/VA_test1', count: 1 }
{ _id: 'evanfrick/human_eval', count: 1 }
{ _id: 'Toastmachine/Pinescript-test', count: 1 }
{ _id: 'gowitheflowlab/parallel-pt-nl-pl', count: 1 }
{ _id: 'DeepFoldProtein/SCOP-1.65_processed_512', count: 1 }
{ _id: 'yn01/test_20240109_01', count: 1 }
{ _id: 'MysticMss/EUVOZ', count: 1 }
{ _id: 'Mashengshuaiqi/myfirstdataset', count: 1 }
{ _id: 'manishiitg/en-hi-raw', count: 1 }
{ _id: 'wjwow/FreeMan', count: 1 }
{ _id: 'thanhtlx/test-fix-cmg-time-split', count: 1 }
{ _id: 'yimingzhang/uf_safe_v1', count: 1 }
{ _id: 'WJYBUPT/law_item', count: 1 }
{ _id: 'iwasjohnlennon/JayAraeEssexArchive', count: 1 }
{ _id: 'causalnlp/corr2cause', count: 1 }
{ _id: 'htryj/instruction', count: 1 }
{ _id: 'xwjzds/paraphrase_collections_enhanced', count: 1 }
{ _id: 'cj-mills/cvat-instance-segmentation-toy-dataset', count: 1 }
{ _id: 'Shakib75/cpp-programs', count: 1 }
{ _id: 'Crysiss/lawdataset', count: 1 }
{ _id: 'ayushtues/instaflow_images', count: 1 }
{ _id: 'porkuaranha/joba', count: 1 }
{ _id: 'phamtungthuy/cauhoiphapluat', count: 1 }
{ _id: 'azrai99/data-scientist-jobstreet-dataset', count: 1 }
{ _id: 'mark434/combined', count: 1 }
{ _id: 'jksheth/r_j5', count: 1 }
{ _id: 'reciprocate/pku_safer_dpo_pairs', count: 1 }
{ _id: 'version-control/ds-lib-extract-1m', count: 1 }
{ _id: 'casecrit/2024-indonesian-election', count: 1 }
{ _id: 'SofiaVouzika/test-liver', count: 1 }
{ _id: 'senhorsapo/vanelope', count: 1 }
{ _id: 'AshanGimhana/Testingdata', count: 1 }
{ _id: 'focia/image_shot_dataset', count: 1 }
{ _id: 'VietAI/vi_mednli', count: 1 }
{ _id: 'philschmid/trl-test-instruction', count: 1 }
{ _id: 'feazer/nva-WRSPGR', count: 1 }
{ _id: 'Berzerker/gnhk_ocr_dataset', count: 1 }
{ _id: 'razent/vi_pubmed_small', count: 1 }
{ _id: 'Recag/Rp_C4_50', count: 1 }
All of them have a single entry, for the first step (dataset-config-names), except one, whose only entry is for dataset-split-names:
db.cachedResponsesBlue.aggregate([
{$match: {
dataset: {$in: ['speed1/nattan','wikimedia/wikipedia','bbaw_egyptian','joey234/mmlu-electrical_engineering-neg-prepend-verbal','imdatta0/ultrachat_1k','CyberHarem/fang_arknights','focia/private_instagram','DeepFoldProtein/CATH_v4.3_S35_processed_512_test','Vivek1234321/multi-cloud-train','fu1995/shuimo-image-dataset','adi-kmt/airoboros-3.2_kn','Gbssreejith/death_type_42_dataset','GunA-SD/DataX','0x7194633/persona-data-v1','Kalfrin/edset','Denilsonic/Samples','barolr/text_am-sum','sergei202/nexus-function-calling','xwjzds/pretrain_repeat_paraphrase','chunping-hf/my_audio','buzzcraft/ELI5-NO','racheltong/VA_test1','evanfrick/human_eval','Toastmachine/Pinescript-test','gowitheflowlab/parallel-pt-nl-pl','DeepFoldProtein/SCOP-1.65_processed_512','yn01/test_20240109_01','MysticMss/EUVOZ','Mashengshuaiqi/myfirstdataset','manishiitg/en-hi-raw','wjwow/FreeMan','thanhtlx/test-fix-cmg-time-split','yimingzhang/uf_safe_v1','WJYBUPT/law_item','iwasjohnlennon/JayAraeEssexArchive','causalnlp/corr2cause','htryj/instruction','xwjzds/paraphrase_collections_enhanced','cj-mills/cvat-instance-segmentation-toy-dataset','Shakib75/cpp-programs','Crysiss/lawdataset','ayushtues/instaflow_images','porkuaranha/joba','phamtungthuy/cauhoiphapluat','azrai99/data-scientist-jobstreet-dataset','mark434/combined','jksheth/r_j5','reciprocate/pku_safer_dpo_pairs','version-control/ds-lib-extract-1m','casecrit/2024-indonesian-election','SofiaVouzika/test-liver','senhorsapo/vanelope','AshanGimhana/Testingdata','focia/image_shot_dataset','VietAI/vi_mednli','philschmid/trl-test-instruction','feazer/nva-WRSPGR','Berzerker/gnhk_ocr_dataset','razent/vi_pubmed_small','Recag/Rp_C4_50']}
}},
{$group: {
_id: '$kind',
count: {$sum: 1}}
}
])
{ _id: 'dataset-config-names', count: 59 }
{ _id: 'dataset-split-names', count: 1 }
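The query above first restricts the cache to the 60 datasets, then counts entries per `kind`. A minimal Python sketch of that `$match` + `$group` combination over in-memory documents (the sample dataset names are hypothetical):

```python
from collections import Counter


def count_kinds(entries, datasets):
    """Equivalent of {$match: {dataset: {$in: datasets}}} then {$group: {_id: "$kind", count: {$sum: 1}}}."""
    selected = set(datasets)
    return dict(Counter(entry["kind"] for entry in entries if entry["dataset"] in selected))


entries = [
    {"dataset": "a/x", "kind": "dataset-config-names"},
    {"dataset": "b/y", "kind": "dataset-config-names"},
    {"dataset": "c/z", "kind": "dataset-split-names"},
    {"dataset": "d/w", "kind": "dataset-config-names"},  # not in the selected list, ignored
]

print(count_kinds(entries, ["a/x", "b/y", "c/z"]))
```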
The exception is feazer/nva-WRSPGR:
db.cachedResponsesBlue.find({
dataset: {$in: ['speed1/nattan','wikimedia/wikipedia','bbaw_egyptian','joey234/mmlu-electrical_engineering-neg-prepend-verbal','imdatta0/ultrachat_1k','CyberHarem/fang_arknights','focia/private_instagram','DeepFoldProtein/CATH_v4.3_S35_processed_512_test','Vivek1234321/multi-cloud-train','fu1995/shuimo-image-dataset','adi-kmt/airoboros-3.2_kn','Gbssreejith/death_type_42_dataset','GunA-SD/DataX','0x7194633/persona-data-v1','Kalfrin/edset','Denilsonic/Samples','barolr/text_am-sum','sergei202/nexus-function-calling','xwjzds/pretrain_repeat_paraphrase','chunping-hf/my_audio','buzzcraft/ELI5-NO','racheltong/VA_test1','evanfrick/human_eval','Toastmachine/Pinescript-test','gowitheflowlab/parallel-pt-nl-pl','DeepFoldProtein/SCOP-1.65_processed_512','yn01/test_20240109_01','MysticMss/EUVOZ','Mashengshuaiqi/myfirstdataset','manishiitg/en-hi-raw','wjwow/FreeMan','thanhtlx/test-fix-cmg-time-split','yimingzhang/uf_safe_v1','WJYBUPT/law_item','iwasjohnlennon/JayAraeEssexArchive','causalnlp/corr2cause','htryj/instruction','xwjzds/paraphrase_collections_enhanced','cj-mills/cvat-instance-segmentation-toy-dataset','Shakib75/cpp-programs','Crysiss/lawdataset','ayushtues/instaflow_images','porkuaranha/joba','phamtungthuy/cauhoiphapluat','azrai99/data-scientist-jobstreet-dataset','mark434/combined','jksheth/r_j5','reciprocate/pku_safer_dpo_pairs','version-control/ds-lib-extract-1m','casecrit/2024-indonesian-election','SofiaVouzika/test-liver','senhorsapo/vanelope','AshanGimhana/Testingdata','focia/image_shot_dataset','VietAI/vi_mednli','philschmid/trl-test-instruction','feazer/nva-WRSPGR','Berzerker/gnhk_ocr_dataset','razent/vi_pubmed_small','Recag/Rp_C4_50']},
kind: "dataset-split-names"
}, {dataset: 1, updated_at: 1})
{ _id: ObjectId("659d85dc137f88fd4461b89b"),
dataset: 'feazer/nva-WRSPGR',
updated_at: 2024-01-09T17:43:56.666Z }
The entries were last updated between 2024-01-09T17:43 and 2024-01-10T20:07, so they are somewhat old. Let's refresh all of them and check over the next few days whether the problem appears again.
db.cachedResponsesBlue.aggregate([
{$match: {
dataset: {$in: ['speed1/nattan','wikimedia/wikipedia','bbaw_egyptian','joey234/mmlu-electrical_engineering-neg-prepend-verbal','imdatta0/ultrachat_1k','CyberHarem/fang_arknights','focia/private_instagram','DeepFoldProtein/CATH_v4.3_S35_processed_512_test','Vivek1234321/multi-cloud-train','fu1995/shuimo-image-dataset','adi-kmt/airoboros-3.2_kn','Gbssreejith/death_type_42_dataset','GunA-SD/DataX','0x7194633/persona-data-v1','Kalfrin/edset','Denilsonic/Samples','barolr/text_am-sum','sergei202/nexus-function-calling','xwjzds/pretrain_repeat_paraphrase','chunping-hf/my_audio','buzzcraft/ELI5-NO','racheltong/VA_test1','evanfrick/human_eval','Toastmachine/Pinescript-test','gowitheflowlab/parallel-pt-nl-pl','DeepFoldProtein/SCOP-1.65_processed_512','yn01/test_20240109_01','MysticMss/EUVOZ','Mashengshuaiqi/myfirstdataset','manishiitg/en-hi-raw','wjwow/FreeMan','thanhtlx/test-fix-cmg-time-split','yimingzhang/uf_safe_v1','WJYBUPT/law_item','iwasjohnlennon/JayAraeEssexArchive','causalnlp/corr2cause','htryj/instruction','xwjzds/paraphrase_collections_enhanced','cj-mills/cvat-instance-segmentation-toy-dataset','Shakib75/cpp-programs','Crysiss/lawdataset','ayushtues/instaflow_images','porkuaranha/joba','phamtungthuy/cauhoiphapluat','azrai99/data-scientist-jobstreet-dataset','mark434/combined','jksheth/r_j5','reciprocate/pku_safer_dpo_pairs','version-control/ds-lib-extract-1m','casecrit/2024-indonesian-election','SofiaVouzika/test-liver','senhorsapo/vanelope','AshanGimhana/Testingdata','focia/image_shot_dataset','VietAI/vi_mednli','philschmid/trl-test-instruction','feazer/nva-WRSPGR','Berzerker/gnhk_ocr_dataset','razent/vi_pubmed_small','Recag/Rp_C4_50']}
}},
{$group: {
_id: 'dates',
first: {$min: '$updated_at'},
last: {$max: '$updated_at'},
}}
])
{ _id: 'dates',
first: 2024-01-09T17:43:56.666Z,
last: 2024-01-10T20:07:49.555Z }
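The `$min`/`$max` grouping simply takes the oldest and newest `updated_at` among the selected entries. A minimal Python sketch using the two timestamps from the output above; the dataset name attached to the second timestamp is hypothetical, since the query does not say which dataset it belongs to.

```python
from datetime import datetime, timezone


def updated_at_range(entries):
    """Equivalent of {$group: {_id: 'dates', first: {$min: '$updated_at'}, last: {$max: '$updated_at'}}}."""
    dates = [entry["updated_at"] for entry in entries]
    return {"_id": "dates", "first": min(dates), "last": max(dates)}


entries = [
    {"dataset": "feazer/nva-WRSPGR", "updated_at": datetime(2024, 1, 9, 17, 43, 56, 666000, tzinfo=timezone.utc)},
    # Hypothetical dataset name; only the timestamp comes from the output above.
    {"dataset": "some/dataset", "updated_at": datetime(2024, 1, 10, 20, 7, 49, 555000, tzinfo=timezone.utc)},
]

result = updated_at_range(entries)
print(result["first"].isoformat(), result["last"].isoformat())
```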
Refreshing with:
HF_TOKEN=...
DATASETS=(speed1/nattan wikimedia/wikipedia bbaw_egyptian joey234/mmlu-electrical_engineering-neg-prepend-verbal imdatta0/ultrachat_1k CyberHarem/fang_arknights focia/private_instagram DeepFoldProtein/CATH_v4.3_S35_processed_512_test Vivek1234321/multi-cloud-train fu1995/shuimo-image-dataset adi-kmt/airoboros-3.2_kn Gbssreejith/death_type_42_dataset GunA-SD/DataX 0x7194633/persona-data-v1 Kalfrin/edset Denilsonic/Samples barolr/text_am-sum sergei202/nexus-function-calling xwjzds/pretrain_repeat_paraphrase chunping-hf/my_audio buzzcraft/ELI5-NO racheltong/VA_test1 evanfrick/human_eval Toastmachine/Pinescript-test gowitheflowlab/parallel-pt-nl-pl DeepFoldProtein/SCOP-1.65_processed_512 yn01/test_20240109_01 MysticMss/EUVOZ Mashengshuaiqi/myfirstdataset manishiitg/en-hi-raw wjwow/FreeMan thanhtlx/test-fix-cmg-time-split yimingzhang/uf_safe_v1 WJYBUPT/law_item iwasjohnlennon/JayAraeEssexArchive causalnlp/corr2cause htryj/instruction xwjzds/paraphrase_collections_enhanced cj-mills/cvat-instance-segmentation-toy-dataset Shakib75/cpp-programs Crysiss/lawdataset ayushtues/instaflow_images porkuaranha/joba phamtungthuy/cauhoiphapluat azrai99/data-scientist-jobstreet-dataset mark434/combined jksheth/r_j5 reciprocate/pku_safer_dpo_pairs version-control/ds-lib-extract-1m casecrit/2024-indonesian-election SofiaVouzika/test-liver senhorsapo/vanelope AshanGimhana/Testingdata focia/image_shot_dataset VietAI/vi_mednli philschmid/trl-test-instruction feazer/nva-WRSPGR Berzerker/gnhk_ocr_dataset razent/vi_pubmed_small Recag/Rp_C4_50)
for dataset in "${DATASETS[@]}"; do curl -H "Authorization: Bearer $HF_TOKEN" -X POST "https://datasets-server.huggingface.co/admin/force-refresh/dataset-config-names?dataset=$dataset&priority=low"; done
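The same refresh loop can be sketched in Python. This is only an illustration: it assumes the admin endpoint and the `priority` parameter exactly as used in the curl command above, and it builds the requests without sending them. Letting `urlencode` escape the query string avoids the shell escaping of `?`, `=` and `&` that the curl version needs.

```python
import os
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the curl loop above.
BASE_URL = "https://datasets-server.huggingface.co/admin/force-refresh/dataset-config-names"


def force_refresh_url(dataset: str, priority: str = "low") -> str:
    """Build the force-refresh URL; urlencode escapes the query string (e.g. '/' becomes %2F)."""
    return f"{BASE_URL}?{urlencode({'dataset': dataset, 'priority': priority})}"


def build_request(dataset: str, token: str) -> Request:
    """Prepare (but do not send) the authenticated POST request for one dataset."""
    return Request(
        force_refresh_url(dataset),
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


if __name__ == "__main__":
    token = os.environ.get("HF_TOKEN", "")
    for dataset in ["causalnlp/corr2cause", "feazer/nva-WRSPGR"]:
        request = build_request(dataset, token)
        print(request.get_method(), request.full_url)
        # To actually send it: urllib.request.urlopen(request)
```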
It worked for https://huggingface.co/datasets/causalnlp/corr2cause.
I ran the first query again:
db.cachedResponsesBlue.aggregate([
{$group: {
_id: "$dataset",
count: {$sum: 1}
}},
{$match: {count: 1}}
])
{ _id: 'asakara/b', count: 1 }
{ _id: 'jbilcke-hf/ai-tube-index', count: 1 }
{ _id: 'feazer/nva-WRSPGR', count: 1 }
db.cachedResponsesBlue.aggregate([
{$match: {
dataset: {$in: ['asakara/b', 'jbilcke-hf/ai-tube-index', 'feazer/nva-WRSPGR']}
}},
{$group: {
_id: 'dates',
first: {$min: '$updated_at'},
last: {$max: '$updated_at'},
}}
])
{ _id: 'dates',
first: 2024-01-09T17:43:56.666Z,
last: 2024-01-11T11:31:35.885Z }
There is no job in the queue for these datasets:
db.jobsBlue.find({dataset: {$in: ['asakara/b', 'jbilcke-hf/ai-tube-index', 'feazer/nva-WRSPGR']}})
I tried to recreate them manually (admin UI):
- asakara/b and feazer/nva-WRSPGR: deleted the cache entry, because the dataset no longer exists.
- jbilcke-hf/ai-tube-index: recreated without an issue.
So: the following query reports no more cases at the moment:
db.cachedResponsesBlue.aggregate([
{$group: {
_id: "$dataset",
count: {$sum: 1}
}},
{$match: {count: 1}}
])
Today, still no occurrence:
db.cachedResponsesBlue.aggregate([
{$group: {
_id: "$dataset",
count: {$sum: 1}
}},
{$match: {count: 1}}
])
Also reported here: https://huggingface.co/datasets/ayymen/Weblate-Translations/discussions/1
Current occurrences:
db.cachedResponsesBlue.aggregate([
{$group: {
_id: "$dataset",
count: {$sum: 1}
}},
{$match: {count: 1}}
])
{ _id: 'CyberHarem/golden_hind_azurlane', count: 1 }
{ _id: 'kenhktsui/open-toolformer-retrieval-multi-neg-result-new-kw', count: 1 }
{ _id: 'CyberHarem/miyu_edelfelt_fgo', count: 1 }
{ _id: 'cutterd/gelgen_tar_29', count: 1 }
{ _id: 'Leogrin/real-toxicity-prompts_first_5K', count: 1 }
{ _id: 'CyberHarem/ak_47_girlsfrontline', count: 1 }
As of today:
db.cachedResponsesBlue.aggregate([
{$group: {
_id: "$dataset",
count: {$sum: 1}
}},
{$match: {count: 1}}
])
{ _id: 'cutterd/gelgen_tar_29', count: 1 }
{ _id: 'CyberHarem/miyu_edelfelt_fgo', count: 1 }
{ _id: 'CyberHarem/golden_hind_azurlane', count: 1 }
{ _id: 'kenhktsui/open-toolformer-retrieval-multi-neg-result-new-kw', count: 1 }
{ _id: 'Leogrin/real-toxicity-prompts_first_5K', count: 1 }
{ _id: 'Recag/Rg_CommonC_234', count: 1 }
{ _id: 'CyberHarem/roma_kantaicollection', count: 1 }
{ _id: 'CyberHarem/ak_47_girlsfrontline', count: 1 }
Two new ones (Recag/Rg_CommonC_234 and CyberHarem/roma_kantaicollection), and the existing ones have not been fixed by the backfill cron job.
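Comparing two snapshots of the query result is a plain set operation. A minimal Python sketch using the two lists reported above (previous snapshot vs. today's):

```python
def compare_snapshots(before, after):
    """Split datasets into newly affected, still affected, and no longer affected."""
    before, after = set(before), set(after)
    return {
        "new": after - before,
        "still_affected": after & before,
        "fixed": before - after,
    }


previous = [
    "CyberHarem/golden_hind_azurlane",
    "kenhktsui/open-toolformer-retrieval-multi-neg-result-new-kw",
    "CyberHarem/miyu_edelfelt_fgo",
    "cutterd/gelgen_tar_29",
    "Leogrin/real-toxicity-prompts_first_5K",
    "CyberHarem/ak_47_girlsfrontline",
]
current = [
    "cutterd/gelgen_tar_29",
    "CyberHarem/miyu_edelfelt_fgo",
    "CyberHarem/golden_hind_azurlane",
    "kenhktsui/open-toolformer-retrieval-multi-neg-result-new-kw",
    "Leogrin/real-toxicity-prompts_first_5K",
    "Recag/Rg_CommonC_234",
    "CyberHarem/roma_kantaicollection",
    "CyberHarem/ak_47_girlsfrontline",
]

result = compare_snapshots(previous, current)
print(sorted(result["new"]), len(result["still_affected"]), sorted(result["fixed"]))
```

An empty `fixed` set is what "the existing ones have not been fixed" means in practice.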
Today:
db.cachedResponsesBlue.aggregate([
{$group: {
_id: "$dataset",
count: {$sum: 1}
}},
{$match: {count: 1}}
])
{ _id: 'red_caps', count: 1 }
{ _id: 'Recag/Rp_CommonC_241', count: 1 }
{ _id: 'arbml/alpagasus_cleaned_ar_reviewed_v4', count: 1 }
{ _id: 'anandhuvasudev/guanaco-llama2-1k', count: 1 }
{ _id: 'CyberHarem/ak_47_girlsfrontline', count: 1 }
{ _id: 'hkust-nlp/agentboard', count: 1 }
{ _id: 'anandhuvasudev/southindiandish', count: 1 }
{ _id: '203427as321/articles', count: 1 }
{ _id: 'cdt', count: 1 }
{ _id: 'malucoelhaofc/NathanPortuguese', count: 1 }
{ _id: 'CyberHarem/roma_kantaicollection', count: 1 }
{ _id: 'asgaardlab/GamePhysicsDailyDump', count: 1 }
{ _id: 'GaJoPrograma/datasetVictoriaUNADGenericoDuplicados', count: 1 }
{ _id: 'YANG-Cheng/ab', count: 1 }
{ _id: 'oknerazan/english_sentences', count: 1 }
{ _id: 'Benchmbn/example1', count: 1 }
{ _id: 'Leogrin/real-toxicity-prompts_first_5K', count: 1 }
{ _id: 'DucHaiten/all-in', count: 1 }
{ _id: 'uyentk/thucuc_data', count: 1 }
{ _id: 'openclimatefix/dwd-icon-global', count: 1 }
{ _id: 'giux78/ultrafeedback-binarized-preferences-cleaned-ita-ready', count: 1 }
{ _id: 'jacobbieker/himawari9-kerchunk', count: 1 }
{ _id: 'openclimatefix/eumetsat-iodc', count: 1 }
{ _id: 'cutterd/gelgen_tar_29', count: 1 }
{ _id: 'Recag/Rp_CommonC_355', count: 1 }
{ _id: 'cedr', count: 1 }
{ _id: 'jacobbieker/eumetsat-iodc', count: 1 }
{ _id: 'Recag/Rp_CommonC_520', count: 1 }
{ _id: 'hf-doc-build/doc-build', count: 1 }
{ _id: 'CyberHarem/miyu_edelfelt_fgo', count: 1 }
{ _id: 'CyberHarem/golden_hind_azurlane', count: 1 }
{ _id: 'kenhktsui/open-toolformer-retrieval-multi-neg-result-new-kw', count: 1 }
{ _id: 'x_stance', count: 1 }
33 datasets
But we currently have a lot of pending jobs, which might explain it.
Checking whether some of them have no job (for those that have jobs, we only have to wait):
use datasets_server_queue
db.jobsBlue.aggregate([
{$match: {dataset: {$in: ['red_caps','Recag/Rp_CommonC_241','arbml/alpagasus_cleaned_ar_reviewed_v4','anandhuvasudev/guanaco-llama2-1k','CyberHarem/ak_47_girlsfrontline','hkust-nlp/agentboard','anandhuvasudev/southindiandish','203427as321/articles','cdt','malucoelhaofc/NathanPortuguese','CyberHarem/roma_kantaicollection','asgaardlab/GamePhysicsDailyDump','GaJoPrograma/datasetVictoriaUNADGenericoDuplicados','YANG-Cheng/ab','oknerazan/english_sentences','Benchmbn/example1','Leogrin/real-toxicity-prompts_first_5K','DucHaiten/all-in','uyentk/thucuc_data','openclimatefix/dwd-icon-global','giux78/ultrafeedback-binarized-preferences-cleaned-ita-ready','jacobbieker/himawari9-kerchunk','openclimatefix/eumetsat-iodc','cutterd/gelgen_tar_29','Recag/Rp_CommonC_355','cedr','jacobbieker/eumetsat-iodc','Recag/Rp_CommonC_520','hf-doc-build/doc-build','CyberHarem/miyu_edelfelt_fgo','CyberHarem/golden_hind_azurlane','kenhktsui/open-toolformer-retrieval-multi-neg-result-new-kw','x_stance']}}},
{$group: {
_id: "$dataset",
count: {$sum: 1}
}}
])
{ _id: 'giux78/ultrafeedback-binarized-preferences-cleaned-ita-ready',
count: 13 }
{ _id: 'hf-doc-build/doc-build', count: 6 }
{ _id: 'oknerazan/english_sentences', count: 8 }
{ _id: 'uyentk/thucuc_data', count: 41 }
{ _id: 'GaJoPrograma/datasetVictoriaUNADGenericoDuplicados',
count: 8 }
{ _id: 'cdt', count: 8 }
{ _id: 'Benchmbn/example1', count: 9 }
{ _id: 'openclimatefix/dwd-icon-global', count: 1 }
{ _id: 'Recag/Rp_CommonC_355', count: 3 }
{ _id: 'malucoelhaofc/NathanPortuguese', count: 8 }
{ _id: 'Recag/Rp_CommonC_520', count: 8 }
{ _id: 'cedr', count: 10 }
{ _id: 'hkust-nlp/agentboard', count: 22 }
{ _id: 'DucHaiten/all-in', count: 8 }
{ _id: 'arbml/alpagasus_cleaned_ar_reviewed_v4', count: 8 }
{ _id: 'anandhuvasudev/guanaco-llama2-1k', count: 1 }
{ _id: '203427as321/articles', count: 9 }
{ _id: 'jacobbieker/eumetsat-iodc', count: 6 }
{ _id: 'YANG-Cheng/ab', count: 24 }
{ _id: 'jacobbieker/himawari9-kerchunk', count: 1 }
25 have jobs (a few minutes passed between the two commands, so some datasets may have dropped out of the first list in the meantime). Let's wait until the number of jobs is back to normal: until then, it's too hard to distinguish normal cases from problematic ones.
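The datasets worth investigating are the affected ones that do not appear in the jobs aggregation. A minimal Python sketch of that set difference, using a small subset of the two lists above for illustration (as noted, the lists were captured a few minutes apart, so the result is only indicative):

```python
def datasets_without_jobs(affected, job_counts):
    """Return the affected datasets with no job in the queue (job_counts maps dataset -> number of jobs)."""
    with_jobs = {dataset for dataset, count in job_counts.items() if count > 0}
    return sorted(set(affected) - with_jobs)


# Subset of the 33 affected datasets and of the job counts reported above.
affected = ["red_caps", "cdt", "cedr", "x_stance", "hf-doc-build/doc-build"]
job_counts = {"cdt": 8, "cedr": 10, "hf-doc-build/doc-build": 6}

print(datasets_without_jobs(affected, job_counts))  # ['red_caps', 'x_stance']
```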
Today:
{ _id: 'CyberHarem/roma_kantaicollection', count: 1 }
{ _id: 'red_caps', count: 1 }
{ _id: 'CyberHarem/ak_47_girlsfrontline', count: 1 }
{ _id: 'Recag/Rp_CommonC_241', count: 1 }
{ _id: 'cutterd/gelgen_tar_29', count: 1 }
{ _id: 'x_stance', count: 1 }
{ _id: 'CyberHarem/golden_hind_azurlane', count: 1 }
{ _id: 'kenhktsui/open-toolformer-retrieval-multi-neg-result-new-kw', count: 1 }
{ _id: 'CyberHarem/miyu_edelfelt_fgo', count: 1 }
{ _id: 'Leogrin/real-toxicity-prompts_first_5K', count: 1 }
They were all computed more than one day ago and have not been backfilled (or deleted) since.
It's not clear why: they share no common characteristic that could point to a cause.
The last one ('Leogrin/real-toxicity-prompts_first_5K') no longer exists on the Hub, but its cache entry has not been deleted. Maybe the dataset was deleted after the last backfill.
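To separate this stuck state from datasets that are simply still being processed, one can combine both conditions: a single cache entry, last updated more than a day ago. A minimal Python sketch; the one-day threshold comes from the remark above, while the documents and the "now" timestamp are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def stale_single_entry_datasets(entries, now, max_age=timedelta(days=1)):
    """Datasets with exactly one cache entry whose updated_at is older than max_age."""
    by_dataset = defaultdict(list)
    for entry in entries:
        by_dataset[entry["dataset"]].append(entry)
    return sorted(
        dataset
        for dataset, dataset_entries in by_dataset.items()
        if len(dataset_entries) == 1 and now - dataset_entries[0]["updated_at"] > max_age
    )


now = datetime(2024, 1, 20, 12, 0, tzinfo=timezone.utc)  # hypothetical "now"
entries = [
    {"dataset": "old/single", "updated_at": now - timedelta(days=2)},      # stuck
    {"dataset": "recent/single", "updated_at": now - timedelta(hours=2)},  # probably still processing
    {"dataset": "old/complete", "updated_at": now - timedelta(days=3)},
    {"dataset": "old/complete", "updated_at": now - timedelta(days=3)},
]

print(stale_single_entry_datasets(entries, now))  # ['old/single']
```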
For reference, the last backfill gave:
93387 analyzed datasets (total: 93387 datasets): 3 datasets have been deleted (0.00%), 0 datasets raised an exception (0.00%)
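Note that "0.00%" here is a rounding effect, not "nothing happened": 3 deleted datasets out of 93387 is about 0.0032%, which rounds to 0.00% at two decimals. A quick check, assuming the summary formats percentages with two decimals (the cron job's exact format string is not shown here):

```python
def pct(part: int, total: int, decimals: int = 2) -> str:
    """Format part/total as a percentage with the given number of decimals."""
    return f"{part / total:.{decimals}%}"


total = 93387
print(pct(3, total))     # 0.00%
print(pct(3, total, 4))  # 0.0032%
```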
And it apparently processed these datasets without error, e.g. for cutterd/gelgen_tar_29:
INFO: 2024-01-18 23:12:29,605 - root - Analyzing cutterd/gelgen_tar_29
DEBUG: 2024-01-18 23:12:29,605 - urllib3.connectionpool - https://huggingface.co:443 "GET /api/datasets/cutterd/gelgen_tar_29 HTTP/1.1" 200 557
INFO: 2024-01-18 23:12:29,617 - root - Setting new revision to cutterd/gelgen_tar_29
Looking at the workers' logs: there is no log for cutterd/gelgen_tar_29, and there is no job for it either. So, at some point in libcommon.orchestrator.set_revision(), we silently exited.
Possibly, DatasetBackfillPlan does nothing (and might even delete existing jobs) when a dataset has only one entry, since it is instantiated with only_first_processing_steps=True:
plan = DatasetBackfillPlan(
dataset=dataset,
revision=revision,
priority=priority,
processing_graph=processing_graph,
only_first_processing_steps=True,
)
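A toy model of this hypothesis (hypothetical code, not libcommon's implementation): if the plan only considers the first processing steps, and the first step's cache entry already exists, it finds nothing to create, so no job is ever created for the following steps. The two-step graph below is a deliberate simplification; the real processing graph has more steps and different dependencies, and the real plan may apply other criteria than "cache entry is missing".

```python
from dataclasses import dataclass

# Hypothetical, simplified processing graph: step -> parent steps.
SIMPLIFIED_GRAPH = {
    "dataset-config-names": [],
    "dataset-split-names": ["dataset-config-names"],
}


@dataclass
class ToyBackfillPlan:
    cached_kinds: set
    only_first_processing_steps: bool

    def steps_to_consider(self):
        # With only_first_processing_steps, keep only the root steps (no parents).
        if self.only_first_processing_steps:
            return [step for step, parents in SIMPLIFIED_GRAPH.items() if not parents]
        return list(SIMPLIFIED_GRAPH)

    def jobs_to_create(self):
        # Toy rule: a step needs a job only if its cache entry is missing.
        return [step for step in self.steps_to_consider() if step not in self.cached_kinds]


# A dataset stuck with a single dataset-config-names entry gets no job at all:
print(ToyBackfillPlan({"dataset-config-names"}, only_first_processing_steps=True).jobs_to_create())  # []
```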
Today: 0 occurrences, as expected.
See causalnlp/corr2cause: the first step was successful, but no other step was computed.
Reported here: https://huggingface.co/datasets/causalnlp/corr2cause/discussions/5