File "Show-o/parquet/refinedweb_dataset.py", line 20, in <module>
from parquet.parquet_dataset import CruiseParquetDataset
ModuleNotFoundError: No module named 'parquet.parquet_dataset'
Hi, note that we use internal packages to process the RefinedWeb dataset, so you must either manually comment out the code related to language modeling in training/train.py or write a new dataloader.
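Both workarounds can be sketched in Python. The snippet below is a minimal, hypothetical sketch and not code from the Show-o repository. It guards the internal-only import so the module still loads without it, and it provides a simple stand-in batch iterator that a new language-modeling dataloader could be built around. The names `HAS_INTERNAL_PARQUET`, `iter_text_batches`, and `lm_batches` are illustrative, and the `"content"` text field is an assumption about how RefinedWeb records are keyed.

```python
# Hypothetical sketch: make the internal-only import optional.
try:
    # Internal package used upstream to read RefinedWeb; not publicly available.
    from parquet.parquet_dataset import CruiseParquetDataset
    HAS_INTERNAL_PARQUET = True
except ImportError:  # also catches ModuleNotFoundError (a subclass)
    CruiseParquetDataset = None
    HAS_INTERNAL_PARQUET = False


def iter_text_batches(records, batch_size, text_key="content"):
    """Yield lists of text strings from an iterable of dict records.

    Stand-in for a new language-modeling dataloader. `records` can come from
    any reader you supply (parquet, JSONL, ...). `text_key` defaults to
    "content", an assumption about the RefinedWeb field name.
    """
    batch = []
    for record in records:
        text = record.get(text_key)
        if not text:
            continue  # skip rows whose text is missing or empty
        batch.append(text)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # emit the final partial batch


def lm_batches(use_lm, records, batch_size):
    """Return an LM batch iterator, or None when language modeling is disabled."""
    if not use_lm:
        return None  # equivalent to commenting out the LM data path
    return iter_text_batches(records, batch_size)
```

In training/train.py, the language-modeling steps would then run only when `lm_batches(...)` returns an iterator rather than `None`; exactly where that check belongs depends on the script's own structure, which this sketch does not assume.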