xiangking / ark-nlp

A private nlp coding package, which quickly implements the SOTA solutions.

Fix: ValueError: The data format does not exist #28

Closed. xiangking closed this issue 2 years ago.

xiangking commented 2 years ago

Environment info

Python 3.8.10
ark-nlp 0.0.6


Information

Reading a local file fails:

from ark_nlp.dataset import SentenceClassificationDataset

train_dataset = SentenceClassificationDataset('../data/task_datasets/cMedTC/train_data.csv')

Error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-52c3ab3bcf08> in <module>
----> 1 train_dataset = SentenceClassificationDataset('../data/task_datasets/cMedTC/train_data.csv')
      2 # dev_dataset = SentenceClassificationDataset('../data/source_datasets/cMedTC/dev_data.csv')

~/anaconda3/lib/python3.7/site-packages/ark_nlp/dataset/base/_dataset.py in __init__(self, data, categories, is_retain_df, is_retain_dataset, is_train, is_test)

~/anaconda3/lib/python3.7/site-packages/ark_nlp/dataset/base/_dataset.py in _load_dataset(self, data_path)

~/anaconda3/lib/python3.7/site-packages/ark_nlp/dataset/base/_dataset.py in _read_data(self, data_path, data_format, skiprows)

ValueError: The data format does not exist
xiangking commented 2 years ago

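The problem is in the _read_data method in ark_nlp.dataset.base._dataset. The condition is inverted, so when no data_format is passed it stays None and is never inferred from the file extension: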

if data_format is not None:
    data_format = data_path.split('.')[-1]

It should be changed to:

if data_format is None:
    data_format = data_path.split('.')[-1]
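For anyone who wants to reproduce the behaviour outside the package, here is a minimal standalone sketch of the two versions of the check. The function names and the SUPPORTED_FORMATS set are hypothetical and only stand in for whatever _read_data actually accepts; the condition and the error message are the ones quoted above.

```python
from typing import Optional

# Hypothetical set of accepted formats, for illustration only;
# the real list lives inside ark_nlp's _read_data.
SUPPORTED_FORMATS = {"csv", "json", "tsv"}


def infer_format_buggy(data_path: str, data_format: Optional[str] = None) -> str:
    # Inverted condition: the extension is only read when a format
    # was already supplied, so the default None is never replaced.
    if data_format is not None:
        data_format = data_path.split('.')[-1]
    if data_format not in SUPPORTED_FORMATS:
        raise ValueError("The data format does not exist")
    return data_format


def infer_format_fixed(data_path: str, data_format: Optional[str] = None) -> str:
    # Corrected condition: fall back to the file extension only
    # when the caller did not pass a format explicitly.
    if data_format is None:
        data_format = data_path.split('.')[-1]
    if data_format not in SUPPORTED_FORMATS:
        raise ValueError("The data format does not exist")
    return data_format


path = '../data/task_datasets/cMedTC/train_data.csv'

try:
    infer_format_buggy(path)
except ValueError as err:
    print("buggy:", err)

print("fixed:", infer_format_fixed(path))
```

With the inverted check, an explicitly passed data_format is also silently overwritten by the file extension, so the bug affects both call styles.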