mira-space / MiraData

Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"
https://mira-space.github.io/
GNU General Public License v3.0
349 stars, 9 forks

BUG set and solution #1

Closed: rese1f closed this issue 5 months ago

rese1f commented 5 months ago

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff. Fix: open the metadata CSV with ISO-8859-1 encoding:

with open(args.meta_csv, "r", encoding='ISO-8859-1') as f:
    df = pd.read_csv(f)
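A quick way to see why this workaround silences the error, as a rough sketch: ISO-8859-1 assigns a character to every possible byte value, so decoding with it cannot fail, although the text may come out wrong if the file was really written in another encoding. The sample bytes and the helper name `can_decode` are made up for illustration and are not part of the repo.

```python
def can_decode(raw: bytes, encoding: str) -> bool:
    """Return True if `raw` decodes cleanly with `encoding`."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Made-up sample row: the byte 0xff can never appear in valid UTF-8.
sample = b"\xffclip_id,caption\n"

print(can_decode(sample, "utf-8"))       # False: reproduces the error
print(can_decode(sample, "ISO-8859-1"))  # True: every byte maps to a character

# ISO-8859-1 accepts all 256 byte values, so it never raises here.
print(can_decode(bytes(range(256)), "ISO-8859-1"))  # True
```

Because ISO-8859-1 never raises, it is worth spot-checking a few decoded captions after loading to confirm they look right.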

Error: Unable to extract uploader id. Fix: see https://stackoverflow.com/questions/75495800/error-unable-to-extract-uploader-id-youtube-discord-py

Gymat commented 5 months ago

Thanks for your suggestion! We will fix it with the following code, which tries several encodings in turn:

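import pandas as pd

# ISO-8859-1 accepts every byte value, so it goes last as the catch-all;
# listed before cp1252 it would always succeed and cp1252 would never be tried.
encodings = ['utf-8', 'cp1252', 'ISO-8859-1']
# Try each encoding in turn until one decodes the file
for encoding in encodings:
    try:
        data = pd.read_csv(csv_file, encoding=encoding)
        break
    except UnicodeDecodeError:
        print(f"Error: {encoding} decoding failed, trying the next encoding format")
else:
    # Safety net so `data` is never left undefined if the list is changed
    raise ValueError(f"Could not decode {csv_file} with any of {encodings}")

RuoyuFeng commented 5 months ago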

Thanks for sharing. I tried the solution from the Stack Overflow link for the "Unable to extract uploader id" error.

In my case, the fix was not to use pip install youtube_dl, but to install yt-dlp instead: python3 -m pip install --force-reinstall https://github.com/yt-dlp/yt-dlp/archive/master.tar.gz. If youtube_dl is already installed, uninstall it.

Then change line 4 of download_data.py from import youtube_dl to import yt_dlp as youtube_dl.
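If the script has to run on machines where either package might be installed, the import swap above can be made tolerant of both. This is only a sketch: the helper name `import_first_available` is hypothetical and not part of download_data.py, and it assumes the rest of the script only uses calls that both packages provide.

```python
import importlib
from types import ModuleType
from typing import Sequence


def import_first_available(names: Sequence[str]) -> ModuleType:
    """Import and return the first module in `names` that is installed."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError(f"None of these modules could be imported: {list(names)}")


# Prefer yt_dlp and fall back to youtube_dl if only the old package is present.
try:
    youtube_dl = import_first_available(["yt_dlp", "youtube_dl"])
except ImportError:
    youtube_dl = None  # neither downloader is installed
```

Binding the result to the name youtube_dl mirrors the one-line change suggested above, so the rest of the script would not need to be edited.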