showlab / Show-o

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
https://arxiv.org/abs/2408.12528
Apache License 2.0
806 stars, 36 forks

No module named 'parquet.parquet_dataset' #21

Open mrswang1 opened 2 weeks ago

mrswang1 commented 2 weeks ago

  File "Show-o/parquet/refinedweb_dataset.py", line 20, in <module>
    from parquet.parquet_dataset import CruiseParquetDataset
ModuleNotFoundError: No module named 'parquet.parquet_dataset'

Sierkinhane commented 2 weeks ago

Hi, note that we use internal packages to process the RefinedWeb dataset, so parquet.parquet_dataset is not included in this repository. You must either manually comment out the language-modeling code in training/train.py or write a new dataloader.
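
For the second option, a replacement loader could look roughly like the sketch below. It is not the internal CruiseParquetDataset and does not reproduce its interface. It assumes the text corpus has been exported to JSON Lines files with one record per line, and that the text sits under a "content" key. The class name JsonlTextBatches, the "text" batch key, and the drop_last flag are all hypothetical. The batch format that training/train.py expects is not stated in this thread, so tokenization and collation would still need to be adapted to it.

```python
import json
from typing import Dict, Iterator, List


class JsonlTextBatches:
    """Hypothetical stand-in loader: yields batches of raw text strings.

    Assumes JSON Lines input (one JSON object per line). The default
    text_key="content" is an assumption about the exported data.
    """

    def __init__(self, paths: List[str], batch_size: int,
                 text_key: str = "content", drop_last: bool = False) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be a positive integer")
        self.paths = paths
        self.batch_size = batch_size
        self.text_key = text_key
        self.drop_last = drop_last

    def _texts(self) -> Iterator[str]:
        # Stream records one at a time so large shards never sit fully in memory.
        for path in self.paths:
            with open(path, "r", encoding="utf-8") as fh:
                for line in fh:
                    line = line.strip()
                    if not line:
                        continue  # skip blank lines
                    text = json.loads(line).get(self.text_key)
                    if text:
                        yield text  # records without usable text are skipped

    def __iter__(self) -> Iterator[Dict[str, List[str]]]:
        batch: List[str] = []
        for text in self._texts():
            batch.append(text)
            if len(batch) == self.batch_size:
                yield {"text": batch}
                batch = []
        # Emit the final partial batch unless the caller asked to drop it.
        if batch and not self.drop_last:
            yield {"text": batch}
```

Iterating with `for batch in JsonlTextBatches(["shard.jsonl"], batch_size=8)` gives dictionaries holding up to eight strings each. The same logic could be wrapped in a PyTorch IterableDataset if the training loop requires one.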

mrswang1 commented 2 weeks ago

Thanks!