jina-ai / jerboa

LLM finetuning
Apache License 2.0

feat: select top k largest samples for `n_samples` #83

Closed sebastian-weisshaar closed 1 year ago

sebastian-weisshaar commented 1 year ago

We often use `n_samples` to debug GPU memory usage. However, peak memory is determined by the largest samples in our data, and it takes one whole epoch before we have seen all of them. To shorten this process, `n_samples` now returns the n largest data points.
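As a rough illustration of the idea, here is a minimal sketch, not the actual code from this PR. The function name `select_largest_samples` and the `size_fn` parameter are hypothetical, and sample size is assumed to be measured with `len` (for example the character or token count of a sample).

```python
import heapq
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def select_largest_samples(
    samples: Sequence[T],
    n_samples: Optional[int] = None,
    size_fn: Callable[[T], int] = len,
) -> List[T]:
    """Return the `n_samples` largest samples (hypothetical helper).

    If `n_samples` is None, the full dataset is returned unchanged.
    `size_fn` is an assumed stand-in for however sample size is measured.
    """
    if n_samples is None:
        return list(samples)
    # heapq.nlargest returns the results ordered from largest to smallest.
    return heapq.nlargest(n_samples, samples, key=size_fn)


data = ["a b c", "a", "a b c d e f", "a b"]
largest = select_largest_samples(data, n_samples=2)
```

Running a few training steps on `largest` should then exercise something close to the worst-case memory footprint right away, rather than after a full pass over the data.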