luisroque / large_laguage_models


llama dataset #6

Closed andysingal closed 1 year ago

andysingal commented 1 year ago

Greetings Luis, while trying to replicate a dataset similar to yours, I saw some differences during creation. Here is my Colab: https://colab.research.google.com/drive/1OqQIdNnInl6p6WTgIOK4qZ2v1BUt2Ywd?usp=sharing HF dataset: https://huggingface.co/datasets/Andyrasika/instruct-python-llama2-20k/viewer/default/train?row=1

I see that your dataset (luisroque/instruct-python-llama2-20k) does not show "text" within the rows, whereas the dataset I created does. Moreover, I am also getting an extra __index_level_0__ column. I made the following change:

return {
    f"<s>[INST] <</SYS>>\n{Config.SYSTEM_MESSAGE.strip()}\n<</SYS>>\n\n"
    f"{user_text} [/INST] {assistant_text} </s>"
}

Could you guide me on what seems to be wrong with the dataset I created? Thanks, Andy

luisroque commented 1 year ago

Hey Andy, did you make that change in the transform_dataset_format for some reason? Can you run the complete code without modifications? It should work and create exactly the dataset that I have in my HF. Let me know if it does not work.

andysingal commented 1 year ago

> Hey Andy, did you make that change in the transform_dataset_format for some reason? Can you run the complete code without modifications? It should work and create exactly the dataset that I have in my HF. Let me know if it does not work.

I had to remove "text" because it was appearing in the dataset (note: "text:").
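
For anyone reading this later, here is a minimal sketch of what removing the "text" key does at the Python level. It is not the repository's actual transform_dataset_format: the function names, the standalone SYSTEM_MESSAGE constant and its value are hypothetical stand-ins. The opening tag is written <<SYS>> here, following the Llama 2 chat convention, whereas the snippet above uses <</SYS>> in both places. With the key, the braces build a dict, so the string lands in a column called "text". Without it, the same braces build a set holding one string, so there is no column name attached to the value.

```python
# Hypothetical stand-in for Config.SYSTEM_MESSAGE from the thread.
SYSTEM_MESSAGE = "You are a helpful coding assistant."


def format_with_text_key(user_text: str, assistant_text: str) -> dict:
    """Return a dict: the prompt is stored under the column name "text"."""
    return {
        "text": (
            f"<s>[INST] <<SYS>>\n{SYSTEM_MESSAGE.strip()}\n<</SYS>>\n\n"
            f"{user_text} [/INST] {assistant_text} </s>"
        )
    }


def format_without_text_key(user_text: str, assistant_text: str) -> set:
    """Same braces without a key: Python builds a set, not a dict."""
    return {
        f"<s>[INST] <<SYS>>\n{SYSTEM_MESSAGE.strip()}\n<</SYS>>\n\n"
        f"{user_text} [/INST] {assistant_text} </s>"
    }


with_key = format_with_text_key("Reverse a list.", "Use my_list[::-1].")
without_key = format_without_text_key("Reverse a list.", "Use my_list[::-1].")

print(type(with_key).__name__)     # dict
print(type(without_key).__name__)  # set
print(list(with_key.keys()))       # ['text']
```

If that is what is happening here, the "text:" label seen in the viewer would simply be the column name rather than something written into each row, which would be consistent with the suggestion to run the code unmodified.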