davidlee1102 opened this issue 1 year ago
Strange error. Which version of datasets
are you using? I tried it with a recent version and it works.
Maybe you do not have enough space in the home directory? Try forcing the Hugging Face cache somewhere else (set this before importing datasets) with:
import os
os.environ["HF_HOME"] = "/path/to/your/cache"
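If low disk space is the suspicion, it can be checked directly with the standard library before relocating the cache. A minimal sketch (the path argument here is only an example, not part of the original report):

```python
import shutil

def free_gib(path):
    """Return the free space on the filesystem containing `path`, in GiB."""
    usage = shutil.disk_usage(path)
    return usage.free / (1024 ** 3)

# e.g. free_gib(".") for the current directory,
# or free_gib("/root") for the default home-directory cache location
```

If the reported free space is close to zero, pointing `HF_HOME` at a roomier filesystem as suggested above is a reasonable first step.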
I am using the same dataset you used, and I have checked it. I think the error comes from the environment on Google Colab and Kaggle, so would you mind trying it on Google Colab or Kaggle? Please check the training code sample again:
from datasets import load_dataset
data = load_dataset("json", data_files="/kaggle/working/medAlpaca/medical_meadow_small.json")
---ERROR---
File /opt/conda/lib/python3.10/site-packages/datasets/packaged_modules/json/json.py:150, in Json._generate_tables(self, files)
    145     except json.JSONDecodeError:
    146         raise e
    147     raise ValueError(
    148         f"Not able to read records in the JSON file at {file}. "
    149         f"You should probably indicate the field of the JSON file containing your records. "
--> 150         f"This JSON file contain the following fields: {str(list(dataset.keys()))}. "
    151         f"Select the correct one and provide it as `field='XXX'` to the dataset loading method. "
    152     ) from None
    153     # Uncomment for debugging (will print the Arrow table size and elements)
    154     # logger.warning(f"pa_table: {pa_table} num rows: {pa_table.num_rows}")
    155     # logger.warning('\n'.join(str(pa_table.slice(i, 1).to_pydict()) for i in range(pa_table.num_rows)))
    156     yield (file_idx, batch_idx), self._cast_classlabels(pa_table)

AttributeError: 'list' object has no attribute 'keys'
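The trailing AttributeError means datasets reached this error path with a top-level JSON list rather than a dict, so calling .keys() on it failed. A quick stdlib-only check of the file's top-level type shows whether the field= argument that the error message mentions is needed. A minimal sketch (the helper name is hypothetical, not part of the datasets API):

```python
import json

def inspect_json(path):
    """Report the top-level structure of a JSON file.

    A top-level list of record objects usually loads with
    load_dataset("json", data_files=path) directly; a top-level
    dict needs field="<key>" naming the list of records.
    Returns (kind, keys) where keys is only set for dicts.
    """
    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, list):
        return ("list", None)
    if isinstance(data, dict):
        return ("dict", list(data.keys()))
    return (type(data).__name__, None)
```

If the file turns out to be a dict such as {"data": [...]}, passing field="data" to load_dataset (per the error message) should resolve it; if it is already a plain list, the failure more likely comes from the environment or a partially written file.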