hockyy / miteiru

Miteiru is an open source Electron video player for learning Chinese, Cantonese, and Japanese. It can play YouTube videos and videos in any HTML5-supported format (.mkv, .mp4, .mov, and many more), and it supports many subtitle formats (.srt, .ass, .vtt, and many more).
anime cantonese chinese electron hanzi hiragana japanese jieba jmdict jyutping kanji katakana kuromoji mecab player subtitle video video-player

Miteiru (見ている) / KànZhe (看着) / tai²gan² (睇緊)


License: CC BY-NC-SA 4.0

Download ૮ ˶ᵔ ᵕ ᵔ˶ ა✩°。 ⋆⸜

Miteiru is an open source Electron video player for learning Chinese, Japanese, and Cantonese. Its main-language dictionary and tokenizer (morphological analyzer) are modular and heavily based on the external software MeCab; JMDict is optional and provides the language info box. This software is heavily inspired by Anisubber.

What can 見ている do?

How to start immersing

For Casual Users: Installation Guide

Mac

Windows

Ubuntu

How to integrate with Whisper

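As of February 3rd, 2024, MacWhisper is a really good UI for Whisper on Mac. If you want to run Whisper on another OS, or for free, you can use whisper.cpp instead:

Put this in your ~/.bashrc, ~/.zshrc, or whichever rc file your shell uses (set WHISPERPATH to wherever you cloned and built whisper.cpp):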

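export WHISPERPATH=~/project/whisper.cpp

# Transcribe an audio file with whisper.cpp and write subtitles to "<input>.w.srt".
# -l ja transcribes Japanese; change it (e.g. to zh) for Chinese.
whisper() {
  local input="$1"
  shift

  # All remaining arguments are passed through to whisper.cpp as an array
  local -a extra_args=("$@")
  "$WHISPERPATH/main" -f "$input" -of "$input.w" --model "$WHISPERPATH/models/ggml-medium.bin" -l ja "${extra_args[@]}" -osrt
}

# Convert a video (or any audio source) to the 16 kHz mono 16-bit WAV whisper.cpp expects.
# "${input%.*}" strips the extension, so video.mp4 becomes video.wav.
prepwhisper() {
  local input="$1"
  local output="${input%.*}.wav"
  ffmpeg -i "$input" -ar 16000 -ac 1 -c:a pcm_s16le "$output"
}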

Then run them on your video:

prepwhisper video.mp4
whisper video.wav
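In the usage above, prepwhisper video.mp4 is expected to produce video.wav, and because whisper passes -of "$input.w" together with -osrt, whisper.cpp should write the subtitles to video.wav.w.srt. The following is a minimal sketch of a convenience wrapper that runs both steps and renames the result to video.srt next to the video. whisper_video and whisper_srt_path are hypothetical helpers, not part of Miteiru or whisper.cpp, and they assume prepwhisper and whisper from your rc file are already loaded.

```bash
# Hypothetical helpers (not part of Miteiru or whisper.cpp).
# Assumes prepwhisper and whisper from your rc file are already loaded.

# Print the subtitle path that `whisper FILE` writes: -of "FILE.w" plus -osrt
# makes whisper.cpp append ".srt" to that output prefix.
whisper_srt_path() {
  printf '%s.w.srt\n' "$1"
}

# Run both steps on one video, then rename the subtitles to "<name>.srt"
# next to the video. Extra arguments are forwarded to `whisper`.
whisper_video() {
  local video="$1"
  shift
  local wav="${video%.*}.wav"   # the file name prepwhisper is expected to write
  prepwhisper "$video" || return 1
  whisper "$wav" "$@" || return 1
  mv "$(whisper_srt_path "$wav")" "${video%.*}.srt"
}
```

For example, whisper_video video.mp4 should leave video.srt beside video.mp4, which you can then open in Miteiru alongside the video.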

For Developers: (Own Build) Installation Guide

Run the following in the cloned repository (don't forget to download the Git LFS files as well):

npm install
npm run script:initrepo
npm run dev # run in development mode
npm run build:nsis # build the Windows installer
npm run build:portable # build the Windows portable version
npm run build:linux20 # build for Linux 20.04
npm run build:linux22 # build for Linux 22.04
npm run build # build for macOS
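If you script your builds, a small helper can pick one of the npm build scripts above from the host OS. This is a hypothetical sketch (miteiru_build_target is not part of the repository). It assumes uname -s reports Darwin on macOS and a MINGW, MSYS, or CYGWIN prefix in Git Bash-style shells on Windows, and that the Linux targets correspond to Ubuntu releases whose /etc/os-release sets VERSION_ID to 20.04 or 22.04. On Windows it picks the installer build, not the portable one.

```bash
# Hypothetical helper (not part of the Miteiru repository).
# Usage: miteiru_build_target [os] [ubuntu_version]
# Both arguments are optional; by default they come from uname and /etc/os-release.
miteiru_build_target() {
  local os="${1:-$(uname -s)}"
  local version="${2:-}"
  # Read VERSION_ID (e.g. "22.04") in a subshell so it does not leak into your shell.
  if [ "$os" = "Linux" ] && [ -z "$version" ] && [ -r /etc/os-release ]; then
    version="$(. /etc/os-release && printf '%s' "$VERSION_ID")"
  fi
  case "$os" in
    Darwin) echo "build" ;;
    MINGW*|MSYS*|CYGWIN*) echo "build:nsis" ;;
    Linux)
      case "$version" in
        20.04) echo "build:linux20" ;;
        22.04) echo "build:linux22" ;;
        *) echo "no Linux build script for release '${version:-unknown}'" >&2; return 1 ;;
      esac
      ;;
    *) echo "unsupported OS: $os" >&2; return 1 ;;
  esac
}
```

A usage example: target="$(miteiru_build_target)" && npm run "$target". Chaining with && avoids calling npm run with an empty script name when the helper cannot match your OS.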

Mecab and Custom Dictionary Setup (Optional)

On macOS, MeCab can be installed with Homebrew by running:

brew install mecab

or on Ubuntu:

sudo apt install mecab

On Windows, you can download the binary directly from SourceForge.

Then, on macOS or Ubuntu, run

which mecab

to show the path of your default MeCab binary. Use it as the path when Miteiru asks for it. Next, get the JMDict dictionary from https://github.com/scriptin/jmdict-simplified/releases and give Miteiru its path when asked as well. Miteiru will build a LevelDB cache locally. Then, you can enjoy the app!
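Before pointing Miteiru at MeCab, it can help to print the binary path and confirm that MeCab actually tokenizes text with your installed dictionary. The following is a minimal sketch; mecab_check is a hypothetical helper, not part of Miteiru. It assumes mecab is on your PATH and relies on MeCab's default behaviour of reading text from standard input and printing one token per line followed by EOS (a custom node-format or eos-format in mecabrc changes that output).

```bash
# Hypothetical helper (not part of Miteiru): print the MeCab path to give
# Miteiru, then tokenize a short phrase as a smoke test.
mecab_check() {
  local bin
  bin="$(command -v mecab)" || { echo "mecab not found on PATH" >&2; return 1; }
  echo "MeCab binary: $bin"
  # With the default output format, each token is printed on its own line,
  # and the sentence ends with a line containing EOS.
  printf '見ている\n' | "$bin"
}
```

If the smoke test fails with a dictionary error, MeCab itself is installed but cannot find a dictionary, which is what the mecabrc customization covers.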

MeCab Dictionary Customization

By default, you are using whatever your default MeCab dictionary offers, but you can customize this by modifying the mecabrc file, which is located at /opt/homebrew/etc/mecabrc on macOS, C:\Program Files (x86)\MeCab\etc\mecabrc on Windows, and /etc/mecabrc on Ubuntu. For other OSes, you'll have to figure it out yourself for now. Shunou, Miteiru's microlibrary, supports UniDic, JumanDic, IPAdic, and their variations. Specifically, among the output formats defined in each dictionary's dicrc file, Shunou supports chamame, chasen, and JumanDic's classic god-knows-what output format. You can get UniDic files here.

Configuration file on macOS:

;
; Configuration file of MeCab
;
; $Id: mecabrc.in,v 1.3 2006/05/29 15:36:08 taku-ku Exp $;
;
; dicdir =  /opt/homebrew/lib/mecab/dic/ipadic
; dicdir =  /opt/homebrew/lib/mecab/dic/jumandic
dicdir =  /opt/homebrew/lib/mecab/dic/unidic
; userdic = /home/foo/bar/user.dic

; output-format-type = wakati
; input-buffer-size = 8192

; node-format = %m\n
; bos-format = %S\n
; eos-format = EOS\n

Windows:

;
; Configuration file of MeCab
;
; $Id: mecabrc.in,v 1.3 2006/05/29 15:36:08 taku-ku Exp $;
;
dicdir =  $(rcpath)\..\dic\unidic

; userdic = /home/foo/bar/user.dic

; output-format-type = wakati
; input-buffer-size = 8192

; node-format = %m\n
; bos-format = %S\n
; eos-format = EOS\n
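Switching dictionaries comes down to leaving exactly one dicdir line active, as in the sample files above. The following is a hypothetical sketch (set_mecab_dicdir is not part of Miteiru or MeCab) that comments out every active dicdir entry with "; ", appends the new one, and keeps a .bak copy of the original file. Both arguments are yours to supply, for example one of the mecabrc locations listed earlier and a dictionary directory such as /opt/homebrew/lib/mecab/dic/unidic.

```bash
# Hypothetical helper (not part of Miteiru or MeCab).
# Usage: set_mecab_dicdir MECABRC_PATH DICDIR
set_mecab_dicdir() {
  local rc="$1" dicdir="$2"
  [ -f "$rc" ] || { echo "no mecabrc at $rc" >&2; return 1; }
  cp "$rc" "$rc.bak" || return 1
  # Comment out every uncommented "dicdir = ..." line; lines already
  # starting with ";" do not match and are left alone.
  sed 's/^\([[:space:]]*dicdir[[:space:]]*=\)/; \1/' "$rc.bak" > "$rc" || return 1
  # Append the new dictionary as the only active entry.
  printf 'dicdir = %s\n' "$dicdir" >> "$rc"
}
```

On Ubuntu, /etc/mecabrc is owned by root, so run the helper from a root shell there. Afterwards, mecab -D prints information about the dictionary MeCab actually loaded, which is a quick way to confirm the switch before restarting Miteiru.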

Future Enhancements

https://user-images.githubusercontent.com/19528709/236619520-076c863a-6c14-4f6e-8f9b-5d1e660fd646.mp4