princeton-nlp / LESS

[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
MIT License

`open_instruct` used in `evaluation/eval/utils.py` is out-of-date #30

Closed cafeii closed 1 month ago

cafeii commented 1 month ago

The latest version of `open_instruct` no longer contains the function `encode_with_prompt_completion_format`, which is used in `evaluation/eval/utils.py`.

I think the code at https://github.com/allenai/open-instruct/blob/58f92c8739d4c4219f8b192e04e595b9d97cd90c/open_instruct/finetune.py might be the version used in LESS, but I'm not quite sure.

t07902301 commented 1 month ago

If you go to `less/data_selection/get_training_dataset.py`, line 98, you will find the reference for this function back to `open_instruct`:

    '''
    Original implementation of the function: https://github.com/allenai/open-instruct/blob/9ebcb582cfc243a6dab75b4302fa432784db26c2/open_instruct/finetune.py#L238

    Here we assume each example has 'prompt' and 'completion' fields.
    We concatenate prompt and completion and tokenize them together because otherwise the prompt would be padded/truncated
    on its own and it doesn't make sense to follow directly with the completion.
    '''
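
For reference, here is a minimal sketch of what that function does in the pinned `open-instruct` commit (paraphrased, not copied verbatim; the exact signature and masking details may differ): concatenate prompt and completion, tokenize the full string once, then mask the prompt tokens in the labels so the loss is computed only on the completion.

    import torch

    def encode_with_prompt_completion_format(example, tokenizer, max_seq_length):
        # Tokenize prompt + completion together so truncation applies to the
        # whole sequence, not to the prompt alone.
        example_text = example['prompt'] + example['completion'] + tokenizer.eos_token
        tokenized_example = tokenizer(
            example_text, return_tensors='pt',
            max_length=max_seq_length, truncation=True,
        )
        input_ids = tokenized_example.input_ids
        labels = input_ids.clone()

        # Mask the prompt portion so the loss is only computed on the completion.
        tokenized_prompt = tokenizer(
            example['prompt'], return_tensors='pt',
            max_length=max_seq_length, truncation=True,
        )
        labels[:, :tokenized_prompt.input_ids.shape[1]] = -100

        attention_mask = torch.ones_like(input_ids)
        return {
            'input_ids': input_ids.flatten(),
            'labels': labels.flatten(),
            'attention_mask': attention_mask.flatten(),
        }

If you just need the evaluation code to run, copying this function (or the version from the pinned commit above) into the repo, or pinning `open-instruct` to that commit, should avoid the import error.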
cafeii commented 1 month ago

Thanks, I will close the issue.