openlm-research / open_llama

OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
Apache License 2.0
7.29k stars, 372 forks

Issue with backticks? #14

Closed vince62s closed 1 year ago

vince62s commented 1 year ago

Hi, I fine-tuned the 300bt checkpoint on Alpaca + ShareGPT. With LLaMA, code generation works fine, but with OpenLLaMA the backticks around code blocks are missing.

Did you strip them out during preprocessing?

vince62s commented 1 year ago

I also found that the model does not generate newlines ('\n'). Below are the same examples as in https://github.com/tloen/alpaca-lora:

Tell me about alpacas.

Alpacas are small, gentle, and hardy animals native to South America. They are members of the camelid family, which also includes llamas, guanacos, and vicuñas. Alpacas are known for their unique wool, which is soft and warm, and is often used in knitting and other textile arts. Alpacas are also known for their gentle nature and affectionate disposition. They are social animals that live in herds, and are often seen interacting with one another. Alpacas are also known for their ability to adapt to a variety of environments, including mountainous terrain, deserts, and even urban areas. They are hardy animals that can withstand a wide range of temperatures and climates. Alpacas are also known for their gentle disposition and gentle nature. They are often used as pets, and are often seen in small communities around the world. Alpacas are also known for their ability to thrive in captivity, and can be found in zoos and other animal sanctuaries around the world. Overall, alpacas are a fascinating and unique species of animal that is well worth learning more about. They are gentle, hardy, and adaptable, making them a great addition to any home or community.

Tell me about the president of Mexico in 2019.

The president of Mexico in 2019 was Andrés Manuel López Obrador, also known as AMLO. He was elected in 2018 and took office in December of that year. AMLO is a populist politician who campaigned on a platform of fighting corruption, improving the economy, and reducing inequality. He is also known for his left-wing policies, including a plan to legalize marijuana and a proposal to build a new airport in Mexico City. AMLO has faced criticism for his handling of the COVID-19 pandemic, which has seen Mexico become one of the most affected countries in the world. He has also faced criticism for his handling of the economy, which has seen the country's GDP growth slow down. Overall, AMLO has been a polarizing figure in Mexico, with some seeing him as a visionary leader and others seeing him as a populist demagogue.

Tell me about the king of France in 2019.

In 2019, the king of France was Emmanuel Macron. He was the president of France from 2017 to 2022. Emmanuel Macron was born in Amiens, France, on December 5, 1977. He was the son of a teacher and a lawyer. He attended the École Nationale d'Administration, a French elite school for civil servants, and graduated in 1999. After graduating, he worked as an investment banker in London and New York. In 2004, he returned to France and became an adviser to the Minister of Economy, Finance, and Industry. In 2007, he was elected to the National Assembly of France as a member of the Socialist Party. In 2012, he ran for president of France and won the election. He was reelected in 2017. During his presidency, Emmanuel Macron focused on economic reforms, including reducing the government's role in the economy, deregulating the labor market, and reducing the number of people on government assistance. He also worked to strengthen the European Union and promote French interests abroad. He was criticized for his handling of the COVID-19 pandemic in France, but he was generally well-liked by the French public. Emmanuel Macron resigned as president of France on April 15, 2022, after losing the presidential election to Marine Le Pen of the far-right National Rally. He was succeeded by his prime minister, Jean Castex.

List all Canadian provinces in alphabetical order.

The Canadian provinces are listed in alphabetical order as follows: Alberta British Columbia Manitoba New Brunswick Newfoundland and Labrador Northwest Territories Nova Scotia Nunavut Ontario Prince Edward Island Quebec Saskatchewan Yukon Territory

Write a Python program that prints the first 10 Fibonacci numbers.

Here is a Python program that prints the first 10 Fibonacci numbers: from math import ceil def fib(n): if n <= 1: return 1 return fib(n - 1) + fib(n - 2) for n in range(1, 11): print(fib(n))
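For readability, here is the Fibonacci answer above with its line breaks restored by hand. The indentation is reconstructed, since the model emitted no newlines at all; the tokens themselves are unchanged. Laid out this way, it is easier to see that because `fib(0)` and `fib(1)` both return 1 and the loop starts at 1, the program prints 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, i.e. it skips the leading 1 (or 0, 1) of the usual sequence.

```python
from math import ceil  # unused; kept only because the model emitted it


def fib(n):
    # Base case as generated: fib(0) == fib(1) == 1
    if n <= 1:
        return 1
    return fib(n - 1) + fib(n - 2)


# Prints fib(1) .. fib(10), one value per line
for n in range(1, 11):
    print(fib(n))
```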

Write a program that prints the numbers from 1 to 100. But for multiples of three print 'Fizz' instead of the number and for the multiples of five print 'Buzz'. For numbers which are multiples of both three and five print 'FizzBuzz'.

Here is a program that prints the numbers from 1 to 100, but for multiples of three print 'Fizz' instead of the number and for multiples of five print 'Buzz'. For numbers which are multiples of both three and five print 'FizzBuzz'. #include <stdio.h> int main() { int num; int i; int j; int k; int l; int m; int n; int o; int p; int q; int r; int s; int t; int u; int v; int w; int x; int y; int z; int a; int b; int c; int d; int e; int f; int g; int h; int i; int j; int k; int l; int m; int n; int o; int p; int q; int r; int s; int t; int u; int v; int w; int x; int y; int z; int a; int b; int c; int d; int e; int f; int g; int h; int i; int j; int k; int l; int m; int n; int o; int p; int q; int r; int s; int t; int u; int v; int w; int x; int y; int z; int a; int b; int c; int d; int e; int f; int g; int h; int i; int j; int k; int l; int m; int n; int o; int p; int q; int r; int s; int t; int u; int v; int w; int x; int y; int z; int a; int b; int c; int d; int e; int f; int g; int h; int i; int j; int k; int l; int m; int n; int o; int p; int q; int r; int s; int t; int u; int v; int w; int x; int y; int z; int a; int b; int c; int d; int e; int f; int g; int h; int i; int j; int k; int l; int m; int n; int o; int p; int q; int r; int s; int t; int u; int v; int w; int

Tell me five words that rhyme with 'shock'.

Shock, surprise, fright, frighten, frightful.

psinger commented 1 year ago

@vince62s yes I also just noticed this

It does not tokenize any newlines; I can't even get it to work properly by adding them back manually post hoc.

In general, something seems very strange with the HF tokenizer.
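One quick way to confirm this kind of problem is to encode and decode a few strings containing newlines and code fences and check whether they come back unchanged. The sketch below is a hypothetical, self-contained illustration: `failed_roundtrips` and `CollapsingToyTokenizer` are names made up here, and the toy tokenizer only imitates the reported symptom (all whitespace, including '\n', collapsed to single spaces). It is not the actual OpenLLaMA tokenizer, and it does not explain the real cause.

```python
from typing import Callable, Dict, List, Sequence


def failed_roundtrips(
    encode: Callable[[str], List[int]],
    decode: Callable[[List[int]], str],
    samples: Sequence[str],
) -> List[str]:
    """Return the samples that do NOT survive an encode -> decode round trip."""
    return [s for s in samples if decode(encode(s)) != s]


class CollapsingToyTokenizer:
    """Hypothetical toy tokenizer that loses newlines, mimicking the symptom."""

    def __init__(self) -> None:
        self.vocab: Dict[str, int] = {}
        self.inverse: Dict[int, str] = {}

    def encode(self, text: str) -> List[int]:
        ids = []
        # str.split() with no arguments drops '\n', '\t' and runs of spaces
        for word in text.split():
            if word not in self.vocab:
                idx = len(self.vocab)
                self.vocab[word] = idx
                self.inverse[idx] = word
            ids.append(self.vocab[word])
        return ids

    def decode(self, ids: List[int]) -> str:
        # Rejoin with single spaces, so the original layout is gone
        return " ".join(self.inverse[i] for i in ids)


if __name__ == "__main__":
    tok = CollapsingToyTokenizer()
    samples = ["```python\nprint('hi')\n```", "line one\nline two", "plain text"]
    for s in failed_roundtrips(tok.encode, tok.decode, samples):
        print(repr(s), "->", repr(tok.decode(tok.encode(s))))
```

With a Hugging Face tokenizer loaded via `AutoTokenizer.from_pretrained("path/to/checkpoint")` (the path is a placeholder), you could pass `tok.encode` and `lambda ids: tok.decode(ids, skip_special_tokens=True)` instead; skipping special tokens keeps an added BOS token from causing a spurious mismatch.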

young-geng commented 1 year ago

Thanks for pointing this out. We have released a new checkpoint that fixes this issue.