snap-stanford / stark

STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases (https://stark.stanford.edu/)
MIT License
270 stars, 33 forks

What do the ids mean after each query? #2

Closed HuangLED closed 6 days ago

HuangLED commented 2 months ago

Initially I thought these were "qualified product ids", but then I realized the query id is also in the list.

e.g. 334460,What are some Tercel women's cycling gloves made in China that you would recommend?,"[334457, 334458, 334460, 334461]"

I searched the repository folders but couldn't find an explanation of the data format.

Wuyxin commented 1 month ago

Hi,

334460 is the query id. [334457, 334458, 334460, 334461] should be the answer ids.

On STaRK-Amazon, each query id is the id of the product we used to generate the query, so it is also included in the answer id list. This also means that the query ids won't run from 0 to num_queries; instead, each one is an integer between 0 and num_products. You won't have this issue on the other two datasets because we rearranged the query ids there.
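As a rough illustration of the format described above, here is a minimal sketch that parses the example row from this issue using only the Python standard library. The column order (query id, query text, answer ids) follows the explanation above; the names `SAMPLE_ROW` and `parse_row` are hypothetical and are not part of the STaRK codebase.

```python
import ast
import csv
import io

# The example row quoted in this issue, in the assumed column order:
# query id, query text, answer ids (a quoted, list-formatted string).
SAMPLE_ROW = (
    "334460,"
    "What are some Tercel women's cycling gloves made in China that you would recommend?,"
    '"[334457, 334458, 334460, 334461]"'
)


def parse_row(row):
    """Split one CSV row into (query_id, query, answer_ids). Hypothetical helper."""
    query_id = int(row[0])
    query = row[1]
    # The third field is a string such as "[1, 2, 3]"; literal_eval turns it into a list.
    answer_ids = ast.literal_eval(row[2])
    return query_id, query, answer_ids


reader = csv.reader(io.StringIO(SAMPLE_ROW))
query_id, query, answer_ids = parse_row(next(reader))

# On STaRK-Amazon the query id is the id of the product used to generate the
# query, so it is expected to appear among the answer ids.
query_id_is_answer = query_id in answer_ids
print(query_id, answer_ids, query_id_is_answer)
```

For this row, the first field is the query id and the last field is the full answer id list, which also contains the query id.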

We are working on a detailed doc which will be available soon.