tloen / alpaca-lora

Instruct-tune LLaMA on consumer hardware
Apache License 2.0

[Question] about pipeline for fine-tuning with conversational question answering dataset. #577

Open phamkhactu opened 1 year ago

phamkhactu commented 1 year ago

I want to fine-tune a model on CoQA (conversational question answering). I have looked at the dataset schema:

Context:

Question 1,
Question 2,
................

Answer 1,
Answer 2,
................

For example:

Context: 'The Vatican Apostolic Library (), more commonly called the Vatican Library or simply the Vat, is the library of the Holy See, located in Vatican City. Formally established in 1475, although it is much older, it is one of the oldest libraries in the world and contains one of the most significant collections of historical texts. It has 75,000 codices from throughout history, as well as 1.1 million printed books, which include some 8,500 incunabula. \n\nThe Vatican Library is a research library for history, law, philosophy, science and theology. The Vatican Library is open to anyone who can document their qualifications and research needs. Photocopies for private study of pages from books published between 1801 and 1990 can be requested in person or by mail. \n\nIn March 2014, the Vatican Library began an initial four-year project of digitising its collection of manuscripts, to be made available online. \n\nThe Vatican Secret Archives were separated from the library at the beginning of the 17th century; they contain another 150,000 items. \n\nScholars have traditionally divided the history of the library into five periods, Pre-Lateran, Lateran, Avignon, Pre-Vatican and Vatican. \n\nThe Pre-Lateran period, comprising the initial days of the library, dated from the earliest days of the Church. Only a handful of volumes survive from this period, though some are very significant.'

Questions: 'When was the Vat formally opened?what is the library for?for what subjects?and?what was started in 2014?how do scholars divide the library?how many?what is the official name of the Vat?where is it?how many printed books does it contain?when were the Secret Archives moved from the rest of the library?how many items are in this secret collection?Can anyone use this library?what must be requested to view?what must be requested in person or by mail?of what books?What is the Vat the library of?How many books survived the Pre Lateran period?what is the point of the project started in 2014?what will this allow?'

Answers: bla bla

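But some questions only make sense in combination with the earlier turns of the conversation. If I create one record per question in the format instruction: , input: , output: , I think the model will not converge well, because those questions are ambiguous on their own. For example: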

instruction: "and?"
input: context above
instruction: "how many?"
input: context above
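
To make the problem concrete, here is a minimal sketch of the naive conversion I mean. The function name `naive_records` is hypothetical, the context is shortened to one sentence from the passage above, and the answers are placeholders, so this is only an illustration and not code from this repository.

```python
def naive_records(context, questions, answers):
    """Turn every (question, answer) pair into one standalone record.

    Each record ignores the previous turns, which is exactly the problem.
    """
    return [
        {"instruction": q, "input": context, "output": a}
        for q, a in zip(questions, answers)
    ]


# One sentence from the context above, used here as a shortened stand-in.
context = (
    "The Vatican Library is a research library for history, law, "
    "philosophy, science and theology."
)
questions = ["what is the library for?", "for what subjects?", "and?"]
answers = ["Answer 1", "Answer 2", "Answer 3"]  # placeholders, not real labels

records = naive_records(context, questions, answers)
for record in records:
    print(repr(record["instruction"]))
```

The last record has only "and?" as its instruction, so the model has no way to know what the question refers to.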

My question is: how should I create the instructions for conversational question answering?
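
One layout I could imagine is to keep the current question as the instruction and put the earlier turns into the input together with the context. This is purely my assumption and not something taken from this repository. A minimal sketch, where `records_with_history`, the "Conversation so far" label, and the placeholder answers are all hypothetical:

```python
def records_with_history(context, questions, answers):
    """Build one record per turn, carrying all previous turns in the input."""
    records = []
    history = []  # alternating "Q: ..." / "A: ..." lines from earlier turns
    for question, answer in zip(questions, answers):
        parts = [f"Context: {context}"]
        if history:
            parts.append("Conversation so far:\n" + "\n".join(history))
        records.append(
            {
                "instruction": question,
                "input": "\n\n".join(parts),
                "output": answer,
            }
        )
        # Only add the current turn to the history after its record is built.
        history.append(f"Q: {question}")
        history.append(f"A: {answer}")
    return records


context = (
    "The Vatican Library is a research library for history, law, "
    "philosophy, science and theology."
)
questions = ["what is the library for?", "for what subjects?", "and?"]
answers = ["Answer 1", "Answer 2", "Answer 3"]  # placeholders, not real labels

history_records = records_with_history(context, questions, answers)
print(history_records[2]["input"])
```

With this layout the record for "and?" also contains the two previous questions and answers, at the cost of longer inputs for later turns. I am not sure whether this is the right approach.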

Thank you