nvtransfer / RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Apache License 2.0

Base vs Chat prompt question. #31

Closed. karansaxena closed this issue 4 months ago.

karansaxena commented 4 months ago

I wanted to confirm my understanding of the setup.

We have this file for the template https://github.com/hsiehjackson/RULER/blob/main/scripts/data/synthetic/constants.py

and this file to control base-vs-chat prompt https://github.com/hsiehjackson/RULER/blob/main/scripts/data/template.py

Is my understanding correct that the base and chat mode prompts differ only slightly?

Also, do we do everything zero-shot (i.e. no in-context examples)?

hsiehjackson commented 4 months ago

Yes, the first file controls the template for each task, while the second file controls the template for each model (chat or base). The base and chat prompts differ only slightly, depending on how the model is aligned with its corresponding chat template.
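To illustrate the two-level templating described above, here is a minimal sketch. All names, template strings, and the chat markup are illustrative assumptions, not RULER's actual code: a task template (as in constants.py) is filled first, then wrapped by a model template (as in template.py), where the base variant passes the prompt through unchanged and the chat variant adds the model's alignment markup.

```python
# Hypothetical two-level templating sketch; names and markup are
# illustrative, not taken from the RULER source.

# Task-level template (analogous to entries in constants.py).
TASK_TEMPLATE = (
    "Some text follows.\n{context}\n"
    "What is the special magic number? Answer:"
)

# Model-level templates (analogous to entries in template.py).
MODEL_TEMPLATES = {
    "base": "{task_template}",                 # raw prompt, no markup
    "chat": "[INST] {task_template} [/INST]",  # illustrative chat markup
}

def build_prompt(mode: str, context: str) -> str:
    """Fill the task template, then wrap it with the model template."""
    task = TASK_TEMPLATE.format(context=context)
    return MODEL_TEMPLATES[mode].format(task_template=task)
```

Under this sketch, the base and chat prompts share the same task body and differ only in the surrounding chat markup, which matches the behavior described above.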

Variable tracking and common words extraction each include one demonstration. You can find them here: https://github.com/hsiehjackson/RULER/blob/main/scripts/data/synthetic/variable_tracking.py#L194-L198 https://github.com/hsiehjackson/RULER/blob/main/scripts/data/synthetic/common_words_extraction.py#L93-L96
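The one-shot setup for those two tasks can be sketched as prepending a single solved example to the real query so the model sees the expected answer format. The function and string layout below are an illustrative assumption, not the exact construction in the linked files:

```python
# Hypothetical one-shot prompt construction; the real demonstrations are
# built in variable_tracking.py and common_words_extraction.py.

def make_one_shot_prompt(demo_input: str, demo_answer: str, query: str) -> str:
    """Prepend one solved demonstration before the actual query."""
    demonstration = f"{demo_input}\nAnswer: {demo_answer}\n\n"
    return demonstration + query
```

All other RULER tasks would then remain zero-shot; only these two prepend a demonstration.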

karansaxena commented 4 months ago

Got it. Along the same lines, I wanted to ask another question rather than open a separate issue.

hsiehjackson commented 4 months ago

We report results on 500 samples, and we do not keep separate validation and test sets. The "validation" split you found in the repo is used only for naming your generated dataset.