nvtransfer / RULER

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
Apache License 2.0

Unable to reproduce result for Llama3.1(8B) #70

Closed muhangao closed 2 weeks ago

muhangao commented 2 weeks ago

Thank you so much for this amazing work!

I'm trying to reproduce the result of Llama3.1(8B). However, the number I get is much lower than the reported number, even in the easiest _singleneedle task.

The only difference I can think of is that I made trivial changes to the code to test positional bias, but that should not be relevant to this discrepancy.

An example input is: Some special magic numbers are hidden within the following text. Make sure to memorize it. I will quiz you about the numbers afterwards.\\nOne of the special magic numbers for tenuous-hospice is: 2867825. I suspect a lot of people realized this, but reacted simply by not studying philosophy, rather than becoming philosophy professors.How did things get this way? Can something people have spent thousands of years studying really be a waste of time? Those are interesting questions. In fact, some of the most interesting questions you can ask about philosophy. The most valuable way to approach the current philosophical tradition may be neither to get lost in pointless speculations like Berkeley, nor to shut them down like Wittgenstein, but to study it as an example of reason gone wrong.HistoryWestern philosophy really begins with Socrates, Plato, and Aristotle. What we know of their predecessors comes from fragments and references in later works; their doctrines could be described as speculative cosmology that occasionally strays into analysis. Presumably they were driven by whatever makes people in every other society invent cosmologies. [3]With Socrates, Plato, and particularly Aristotle, this tradition turned a corner. There started to be a lot more analysis. I suspect Plato and Aristotle were encouraged in this by progress in math. Mathematicians had by then shown that you could figure things out in a much more conclusive way than by making up fine sounding stories about them. [4]People talk so much about abstractions now that we don't realize what a leap it must have been when they first started to. It was presumably many thousands of years between when people first started describing things as hot or cold and when someone asked \"what is heat?\" No doubt it was a very gradual process. We don't know if Plato or Aristotle were the first to ask any of the questions they did. 
But their works are the oldest we have that do this on a large scale, and there is a freshness (not to say naivete) about them that suggests some of the questions they asked were new to them, at least.Aristotle in particular reminds me of the phenomenon that happens when people discover something new, and are so excited by it that they race through a huge percentage of the newly discovered territory in one lifetime. If so, that's evidence of how new this kind of thinking was. [5]This is all to explain how Plato and Aristotle can be very impressive and yet naive and mistaken. It was impressive even to ask the questions they did. That doesn't mean they always came up with good answers. It's not considered insulting to say that ancient Greek mathematicians were naive in some respects, or at least lacked some concepts that would have made their lives easier. So I hope people will not be too offended if I propose that ancient philosophers were similarly naive. In particular, they don't seem to have fully grasped what I earlier called the central fact of philosophy: that words break if you push them too far. \"Much to the surprise of the builders of the first digital computers,\" Rod Brooks wrote, \"programs written for them usually did not work.\" [6] Something similar happened when people first started trying to talk about abstractions. Much to their surprise, they didn't arrive at answers they agreed upon. In fact, they rarely seemed to arrive at answers at all.They were in effect arguing about artifacts induced by sampling at too low a resolution.The proof of how useless some of their answers turned out to be is how little effect they have. No one after reading Aristotle's Metaphysics does anything differently as a result. [7]Surely I'm not claiming that ideas have to have practical applications to be interesting? No, they may not have to. Hardy's boast that number theory had no use whatsoever wouldn't disqualify it. But he turned out to be mistaken. 
In fact, it's suspiciously hard to find a field of math that truly has no practical use. And Aristotle's explanation of the ultimate goal of philosophy in Book A of the Metaphysics implies that philosophy should be useful too.Theoretical KnowledgeAristotle's goal was to find the most general of general principles. The examples he gives are convincing: an ordinary worker builds things a certain way out of habit; a master craftsman can do more because he grasps the underlying principles. The trend is clear: the more general the knowledge, the more admirable it is. But then he makes a mistake\u2014possibly the most important mistake in the history of philosophy. He has noticed that theoretical knowledge is often acquired for its own sake, out of curiosity, rather than for any practical need. So he proposes there are two kinds of theoretical knowledge: some that's useful in practical matters and some that isn't.\\nWhat are all the special magic numbers for tenuous-hospice mentioned in the provided text? The special magic numbers for tenuous-hospice mentioned in the provided text are

The template I'm using is: messages = f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{text}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

The way I extract the answer is:

        messages = f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{text}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"

        outputs = model(messages, max_new_tokens=50, do_sample=False)

        model_answer = outputs[0]["generated_text"][len(messages):]

I would really appreciate any suggestions on the cause of this issue.

hsiehjackson commented 2 weeks ago

Based on your example input, I think the answer prefix The special magic numbers for tenuous-hospice mentioned in the provided text are should be placed after <|start_header_id|>assistant<|end_header_id|>\n\n to prevent the model from refusing to answer. Without the answer prefix in that position, the model may produce unrelated output.
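A minimal sketch of that suggestion, assuming the input ends with the answer prefix as in the example above. The chat markers are copied from the template quoted in this thread; `build_prompt` and its parameters are hypothetical names for illustration, not part of RULER or transformers.

```python
# Chat markers copied from the template quoted earlier in this thread.
USER_HEADER = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
ASSISTANT_HEADER = "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"


def build_prompt(text: str, answer_prefix: str) -> str:
    """Move the trailing answer prefix from the user turn into the assistant turn.

    `text` is the full task input (context + question + answer prefix).
    `answer_prefix` is the exact string the input ends with (hypothetical
    parameter; the caller is assumed to know it from the task template).
    """
    if not text.endswith(answer_prefix):
        raise ValueError("text does not end with the given answer prefix")
    # Everything before the prefix stays in the user turn.
    question = text[: len(text) - len(answer_prefix)].rstrip()
    # The prefix goes after the assistant header, so the model continues it.
    return f"{USER_HEADER}{question}{ASSISTANT_HEADER}{answer_prefix}"


example_text = (
    "What are all the special magic numbers for tenuous-hospice mentioned "
    "in the provided text? The special magic numbers for tenuous-hospice "
    "mentioned in the provided text are"
)
example_prefix = (
    "The special magic numbers for tenuous-hospice mentioned in the provided text are"
)
prompt = build_prompt(example_text, example_prefix)
```

With a prompt built this way, the generated continuation begins right after the answer prefix, so the existing `[len(messages):]` slicing would still return only the newly generated text.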