guessit-io / guessit

GuessIt is a python library that extracts as much information as possible from a video filename.
https://guessit-io.github.io/guessit
GNU Lesser General Public License v3.0

Garbled text occurs when the input contains Chinese text #779

Open kejicjk opened 5 months ago

kejicjk commented 5 months ago

Garbled text occurs when the input contains Chinese text. For example, with the input: python3 videoTextGuessit.py 庆余年第二季/01.mp4

the response is {"title": "\u5e86\u4f59\u5e74\u7b2c\u4e8c\u5b63", "season": null, "episode": 1}

The script is:

import sys
import json
from guessit import guessit

def parse_video_info(file_path):
    guess = guessit(file_path)
    result = {
        'title': guess.get('title'),
        'season': guess.get('season'),
        'episode': guess.get('episode')
    }
    print(json.dumps(result))

if __name__ == "__main__":
    file_path = sys.argv[1]
    parse_video_info(file_path)

chevignon93 commented 4 months ago

@kejicjk I don't know if you've solved your problem, but I'd just like to point out that this has nothing to do with this library. This is how the json module behaves when it encounters non-ASCII characters: by default, json.dumps escapes them as \uXXXX sequences.

To "solve" it, just pass ensure_ascii=False to json.dumps:

import sys
import json
from guessit import guessit

def parse_video_info(file_path):
    guess = guessit(file_path)
    result = {
        "title": guess.get("title"),
        "season": guess.get("season"),
        "episode": guess.get("episode"),
    }
    print(json.dumps(result, ensure_ascii=False))
# Output: {"title": "庆余年第二季", "season": null, "episode": 1}

if __name__ == "__main__":
    file_path = sys.argv[1]
    parse_video_info(file_path)
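
For anyone who wants to check this without installing guessit, here is a minimal sketch that compares the two json.dumps modes directly. The `result` dict is a hand-written stand-in for the values reported above, not something produced by guessit in this snippet, and the variable names are only illustrative.

```python
import json

# Hand-written stand-in for the dict built in parse_video_info (assumption:
# same values as reported in this issue; guessit is not called here).
result = {"title": "庆余年第二季", "season": None, "episode": 1}

# Default behaviour: ensure_ascii=True, so non-ASCII characters become \uXXXX.
escaped = json.dumps(result)

# With ensure_ascii=False the characters are written out as-is.
readable = json.dumps(result, ensure_ascii=False)

# The escaped form contains only ASCII characters; the readable one does not.
print(escaped.isascii(), readable.isascii())

# Both strings are valid JSON and decode back to the same data, so the
# escaped output is not corrupted, just harder to read.
assert json.loads(escaped) == json.loads(readable) == result
```

In other words, the \uXXXX output is still correct JSON, and any JSON parser will recover the original title from it. ensure_ascii=False only matters when a person (or a tool that does not decode JSON) reads the raw string.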