Describe the bug
I wanted to use my own loading script to handle the processing, so I followed the tutorial documentation and wrote the MyDatasetConfig and MyDatasetBuilder classes (the builder implements the _info, _split_generators and _generate_examples methods). With simple test data the script ran and produced the processed output. When I moved on to more complex processing, though, I found that I could not debug the script: breakpoints are never hit, even with the simple samples. No errors are reported, and print statements inside _info, _split_generators and _generate_examples do produce output, but execution never stops at the breakpoints.
Steps to reproduce the bug
my_dataset.py
    import json
    import datasets

    class MyDatasetConfig(datasets.BuilderConfig):
        def __init__(self, **kwargs):
            super(MyDatasetConfig, self).__init__(**kwargs)

    class MyDataset(datasets.GeneratorBasedBuilder):
        VERSION = datasets.Version("1.0.0")
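The rest of the builder is not shown above, so the following is only a sketch of one way to keep the processing logic debuggable on its own. The idea is to put the per-record work in a plain generator that _generate_examples could delegate to, and to step through that generator directly, without going through load_dataset. The function name iter_examples, the JSON Lines input and the "text" field are all assumptions made for illustration; none of them come from the script above.

```python
import json


def iter_examples(lines):
    """Yield (key, example) pairs from an iterable of JSON Lines strings.

    Hypothetical helper: assumes one JSON object per line with a "text" field.
    """
    for idx, line in enumerate(lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines; keys stay unique because idx still advances
        record = json.loads(line)
        # A breakpoint placed here is reached when this function is called directly.
        yield idx, {"text": record["text"]}


if __name__ == "__main__":
    sample = ['{"text": "first"}', "", '{"text": "second"}']
    for key, example in iter_examples(sample):
        print(key, example)
```

Inside the builder, _generate_examples could then open the data file and write `yield from iter_examples(f)`, so the same code path runs both under load_dataset and in a direct debugging session.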
main.py
import os
os.environ["TRANSFORMERS_NO_MULTIPROCESSING"] = "1"
from datasets import load_dataset
dataset = load_dataset("my_dataset.py", split="train", cache_dir=None)
print(dataset[:5])
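One thing that may be worth checking is whether the code that runs is the file containing the breakpoints or a copy of it stored elsewhere. Breakpoints are bound to a file path, so they would not trigger in a copy. This is a possible cause and has not been confirmed for this setup. The sketch below uses only the standard library to show the effect in isolation: a module imported from a copied file reports a different `__file__` than the file being edited. The helper names load_module_from and same_file and the "cache" directory are hypothetical.

```python
import importlib.util
import os
import shutil
import tempfile


def load_module_from(path, name):
    """Import a Python file from an explicit path and return the module object."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def same_file(module, edited_path):
    """Return True if the module was loaded from the file being edited."""
    return os.path.realpath(module.__file__) == os.path.realpath(edited_path)


def demo():
    """Load a script from its original path and from a copy, then compare paths."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "my_dataset.py")
        with open(original, "w", encoding="utf-8") as f:
            f.write("VALUE = 1\n")

        # Stand-in for any mechanism that copies a script before importing it.
        cache_dir = os.path.join(tmp, "cache")
        os.makedirs(cache_dir)
        copied = shutil.copy(original, cache_dir)

        from_original = load_module_from(original, "orig_mod")
        from_copy = load_module_from(copied, "copy_mod")
        return same_file(from_original, original), same_file(from_copy, original)


if __name__ == "__main__":
    print(demo())
```

In the real script, a temporary `print(__file__)` at the top of my_dataset.py would show which path is executed during the load_dataset call. If that path differs from the file open in the IDE, a `breakpoint()` call inside _generate_examples may be a more reliable way to stop execution than an editor breakpoint.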
Expected behavior
Execution should pause at the breakpoints set in my_dataset.py when main.py is run in debug mode.
Environment info
PyCharm