tnozaki / cuelib

Automatically exported from code.google.com/p/cuelib

The parser doesn't support charsets other than ASCII in lines #22

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Take a cue sheet with content in Russian or another non-ASCII language
2. Try to parse it using the library
3.

What is the expected output?
Parsed result in national characters

What do you see instead?
Line unparseable warnings

What version of the product are you using?
1.2.1
On what operating system?
Arch Linux

Please provide any additional information below.
I have a fix for the problem: subclass the line reader and perform charset
decoding on the lines. The modified code is available in the music-barrel project.

Original issue reported on code.google.com by metricst...@gmail.com on 9 Mar 2013 at 6:55

tnozaki commented 8 years ago

Don't assume that a UTF-8 BOM is always there; auto-detecting the encoding is always a bad idea. It is not a perfect solution, because there are lots of character encoding schemes in the world.

This is a design mistake in the CueParser class. A better solution would be:

  1. CueParser should be instantiated and hold a character encoding scheme property,
  2. or parse() should accept a second argument that specifies the character encoding scheme.

As a workaround for this problem, use the following code:

CueSheet cueSheet = CueParser.parse(
  new LineNumberReader(
    new InputStreamReader(
      new FileInputStream(new File(".cue")), "UTF-8")));
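If the workaround is needed in several places, it can be wrapped in a small helper that takes the encoding as an explicit argument, in the spirit of the second suggestion above. This is only a sketch: the CueReaders class and its open() method are hypothetical names and are not part of cuelib.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of cuelib.
final class CueReaders {
    private CueReaders() {}

    // Opens cueFile as a LineNumberReader that decodes bytes with the
    // charset chosen by the caller. Nothing is auto-detected.
    static LineNumberReader open(Path cueFile, Charset charset) throws IOException {
        return new LineNumberReader(
            new InputStreamReader(Files.newInputStream(cueFile), charset));
    }
}
```

The returned reader can then be passed to CueParser.parse() exactly as in the workaround, for example CueParser.parse(CueReaders.open(Paths.get("album.cue"), Charset.forName("windows-1251"))). Here "album.cue" is a placeholder file name, and windows-1251 is just one encoding commonly used for Russian text; the caller has to know which encoding the file actually uses.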