In the original DataComp paper (the one for VLMs), some of the filtering heuristics were based on features extracted from the datasets used for evaluation. Is this permitted in the filtering track of DCLM? For example, are we allowed to use featurized MMLU prompts in our selection algorithm?