paperswithcode / galai

Model API for GALACTICA
Apache License 2.0

Movie Trivia Data Sources? #66

Closed: mkemka closed this issue 1 year ago

mkemka commented 1 year ago

First of all, thank you so much for making this available. I don't want to take anything away from this extraordinary technical work, but I am curious about the source data. Is there an easy way to trace where in the model this type of answer comes from and, ideally, which scientific paper(s) contain these references?

prompt = "Question: What is the best job for a man?"

Context: The film begins with a man (played by Tom Cruise) who is working as a waiter in a restaurant. 
He is a good-natured, but somewhat naive, man who is trying to make ends meet. 
He is also a good friend of the owner of the restaurant, played by Robert De Niro. 
The owner is a very wealthy man who is trying to make a fortune. 
He is also a very good friend of the owner's wife, played by Kate Winslet. 
The owner's wife is a very beautiful woman who is trying to make a fortune. 
She is also a very good friend of the owner's son, played by Jack Black. 
The owner's son is a very good-looking, but somewhat naive, man who is trying to make ends meet. 
He is also a very good friend of the owner's daughter, played by Emma Thompson. 
The owner's daughter is a very beautiful woman who is trying to make a fortune. 
She is also a very good friend of the owner's son.

Answer: waiter

Edit: added newline characters to split the response.

mkardas commented 1 year ago

Hi Martin, please have a look at the Galactica paper, especially appendix sections A.1 and A.6. Model outputs can combine multiple data sources in non-trivial ways, so it's not always possible to trace them back to a source. The NatureBook corpus includes English Wikipedia, which contains many movie plots and actor biographies; this might partially explain the non-scientific generation above.
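
For anyone inspecting a generation like the one in the first comment, here is a minimal sketch that splits a response into its labeled fields (`Question`, `Context`, `Answer`), much like the manual newline edit did. The helper names `split_fields` and `generate_with_galai` are hypothetical and not part of galai. The second helper assumes the `galai` package is installed and that `"base"` is an available model size. It is never called here, because it would download model weights.

```python
import re
from typing import Dict

# Field labels seen in the generation quoted in this issue.
# Assumption: each label appears at most once and is followed by a colon.
FIELD_PATTERN = re.compile(r"\b(Question|Context|Answer):\s*")


def split_fields(text: str) -> Dict[str, str]:
    """Split a generation into {label: content} using the labels above."""
    matches = list(FIELD_PATTERN.finditer(text))
    fields: Dict[str, str] = {}
    for i, match in enumerate(matches):
        # A field's content runs until the next label (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        fields[match.group(1)] = text[match.end():end].strip()
    return fields


def generate_with_galai(prompt: str, model_name: str = "base") -> str:
    """Hypothetical wrapper; assumes galai is installed (not called here)."""
    import galai as gal  # imported lazily so the parser works without galai

    model = gal.load_model(model_name)
    return model.generate(prompt)
```

Splitting the output this way only makes the structure easier to read. It does not reveal which training documents contributed to the text.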