Closed chanind closed 1 month ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 63.97%. Comparing base (ff335f0) to head (a602f56). Report is 2 commits behind head on main.
Merging, as this should be uncontroversial and CI is currently failing due to space issues.
Description
CI has been failing since #320 was merged because it runs out of space. This appears to be caused by loading and processing large datasets (c4-tokenized-2b). This PR replaces that dataset with a tiny tokenized version of c4-10k: https://huggingface.co/datasets/chanind/c4-10k-mini-tokenized-16-ctx-gelu-1l-tests. It is a tokenized version of the first 1k rows of the c4-10k dataset, split into 64 pieces, with a total size of only 250 KB (vs. 2 GB for c4-tokenized-2b).
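
For anyone who wants to inspect the replacement dataset locally, here is a minimal sketch of loading it with the Hugging Face `datasets` library. The repo ID comes from the URL above; the helper name `load_test_dataset` and the `"train"` split are assumptions for illustration, not something this PR specifies, so check the dataset card if the split name differs.

```python
# Repo ID taken from the dataset URL in the PR description.
DATASET_ID = "chanind/c4-10k-mini-tokenized-16-ctx-gelu-1l-tests"


def load_test_dataset(split: str = "train"):
    """Download the tiny tokenized dataset from the Hugging Face Hub.

    `load_test_dataset` is a hypothetical helper, and the default split
    name "train" is an assumption; neither is defined by this PR.
    """
    # Imported lazily so this module can be imported even when the
    # `datasets` package is not installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)


if __name__ == "__main__":
    # Requires network access and `pip install datasets`.
    ds = load_test_dataset()
    print(ds)  # prints the column names and the number of rows
```

Because the whole dataset is only about 250 KB, a call like this should download quickly and leave very little in the CI cache compared with c4-tokenized-2b.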