snap-stanford / stark

STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases (https://stark.stanford.edu/)
MIT License
270 stars, 33 forks

Error during processing of Amazon data from scratch #3

Closed: bechbd closed this issue 1 month ago

bechbd commented 1 month ago

Reprocessing the raw "amazon" data from scratch raises an error, which looks to be caused by a missing reference to self.review_columns (the code uses the bare name review_columns instead):

Traceback (most recent call last):
  File "/home/ec2-user/stark/main.py", line 7, in <module>
    kb = get_semistructured_data(dataset_name, download_processed=False)
  File "/home/ec2-user/stark/src/benchmarks/get_semistruct.py", line 9, in get_semistructured_data
    kb = AmazonSemiStruct(root=data_root,
  File "/home/ec2-user/stark/src/benchmarks/semistruct/amazon.py", line 113, in __init__
    processed_data = self._process_raw(categories)
  File "/home/ec2-user/stark/src/benchmarks/semistruct/amazon.py", line 344, in _process_raw
    node_info = self.construct_raw_node_info(df_meta_reduced, df_review_reduced, df_qa_reduced)
  File "/home/ec2-user/stark/src/benchmarks/semistruct/amazon.py", line 525, in construct_raw_node_info
    df_row_to_dict(df_i, colunm_names=review_columns \
NameError: name 'review_columns' is not defined
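
For anyone hitting the same thing, here is a minimal sketch of why a bare attribute name fails inside a method and how qualifying it with self resolves it. This is not the STaRK code: the classes BrokenLoader and FixedLoader, the method construct, and the column names are all made up for illustration, and the actual fix in amazon.py may look different.

```python
class BrokenLoader:
    """Hypothetical stand-in for a loader class; not the STaRK implementation."""

    def __init__(self):
        # Illustrative column names only (assumption, not the real schema).
        self.review_columns = ["rating", "text"]

    def construct(self, row):
        # Bug: 'review_columns' is looked up as a local/global name, not as an
        # instance attribute, so Python raises NameError at call time.
        return {key: row[key] for key in review_columns}


class FixedLoader(BrokenLoader):
    def construct(self, row):
        # Fix: access the attribute through the instance.
        return {key: row[key] for key in self.review_columns}


if __name__ == "__main__":
    row = {"rating": 5, "text": "ok", "extra": 1}
    try:
        BrokenLoader().construct(row)
    except NameError as err:
        print("broken:", err)
    print("fixed:", FixedLoader().construct(row))
```

Instance attributes are never in scope by bare name inside a method, so the lookup only succeeds when it goes through self.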