thomasgermain / m3u-parser


m3u file charset #1

Open frojasg1 opened 2 years ago

frojasg1 commented 2 years ago

Hello Thomas,

Nice job! Your project has been useful to me, as I want to write an m3u reader/writer that covers a subset of your library (only for music files). Now I would like to contribute an idea: as far as I have been able to check, m3u files do not use a single character encoding (I have examples in ANSI and in UTF-8 without BOM, and I am sure other encodings are allowed too). I have some experience with this kind of task, and I usually use the universalchardet library. I think it is dual-licensed, with one of the licenses being very permissive, so using it should not be a big handicap for anybody who wants to use it without sharing their code.

It is only an idea, but maybe not such a bad one :-D

Thank you for your code!! It looks very good.

Fran

thomasgermain commented 2 years ago

Hello Fran,

Thanks :), I'm glad it interests you.

By encoding, do you mean the encoding of the m3u file itself or the encoding of a stream?

Thomas

frojasg1 commented 2 years ago

Hi Thomas,

I have enjoyed looking at your code: I think it is very professional. I did not dare to write an m3u decoder before checking your code, because I did not know what the hell the number after EXTINF: was, as in: EXTINF:266,

But after looking at your code, I have seen that it is the duration of the medium ... nice! :-D
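For readers who have the same doubt, here is a minimal sketch of reading that number. It is not taken from m3u-parser: the class and method names are hypothetical, and it assumes the usual extended-M3U form `#EXTINF:<seconds>,<title>` with a whole number of seconds.

```java
final class ExtInfLine {

    private static final String PREFIX = "#EXTINF:";

    // Returns the duration in seconds found between "#EXTINF:" and the first comma.
    // Assumption: the duration is an integer (players also write -1 for "unknown").
    static int durationSeconds(String line) {
        if (!line.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not an EXTINF line: " + line);
        }
        int comma = line.indexOf(',', PREFIX.length());
        String number = comma < 0
                ? line.substring(PREFIX.length())
                : line.substring(PREFIX.length(), comma);
        return Integer.parseInt(number.trim());
    }

    public static void main(String[] args) {
        System.out.println(durationSeconds("#EXTINF:266,Some Artist - Some Title"));
    }
}
```

Playlists that carry fractional durations or extra attributes after the number would need a more tolerant parser than this one.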

To answer your question:

I meant the character encoding (Charset) of an m3u file. I think you use a Stream instead of a file name, which is more general but makes things a little more complicated. What I usually do in such cases is read the complete stream into a byte array; that array can then produce a new ByteArrayInputStream as many times as you need. You can first create one ByteArrayInputStream to guess the Charset (using the universalchardet library; I think I used version 1.0.3, but there may be a more recent one ...). Then, once you know the Charset name, you can create a second ByteArrayInputStream to parse the m3u itself, this time with the right Charset ...
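The buffering idea above can be sketched roughly as follows. To keep the example runnable with the JDK alone, the detection step does not call universalchardet: `guessCharset` is a hypothetical stand-in that tries a strict UTF-8 decode and otherwise falls back to ISO-8859-1 (used here in place of "ANSI"). A real detector would replace that one method. The class name is also an assumption, and `readAllBytes` needs Java 9 or later.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class BufferedPlaylistReader {

    // Stand-in for a charset detector (NOT universalchardet):
    // bytes that decode as strict UTF-8 are treated as UTF-8, anything else as ISO-8859-1.
    static Charset guessCharset(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return StandardCharsets.ISO_8859_1;
        }
    }

    // Reads the stream once into memory, guesses the charset from the bytes,
    // then decodes the same bytes through a fresh ByteArrayInputStream.
    static List<String> readLines(InputStream in) throws IOException {
        byte[] data = in.readAllBytes();
        Charset charset = guessCharset(data);
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] legacy = "#EXTM3U\nM\u00fasica/Ni\u00f1a.mp3\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(guessCharset(legacy));
        System.out.println(readLines(new ByteArrayInputStream(legacy)));
    }
}
```

Holding the whole playlist in memory is reasonable for m3u files, which are small text files. The fallback only distinguishes UTF-8 from one single-byte encoding, which is why a statistical detector is the better choice when other encodings may appear.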

Maybe the Charset is not very important when you are dealing with English-language media, as I think almost all Charsets agree on the codes for the English character subset, but I am Spanish, so accents, the letter "ñ", and diaereses would probably show up as strange characters if the Charset is not taken into account ...
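A small illustration of that point, as a sketch with hypothetical names: the same text is encoded as UTF-8 and then decoded as ISO-8859-1, which is what happens when a reader assumes the wrong Charset. Plain ASCII text survives, while "ñ" turns into two unrelated characters.

```java
import java.nio.charset.StandardCharsets;

final class CharsetMismatchDemo {

    // Encodes the text as UTF-8, then decodes those bytes as ISO-8859-1,
    // simulating a reader that picked the wrong charset.
    static String misread(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String ascii = "Music/Song.mp3";
        String spanish = "M\u00fasica/Ni\u00f1a.mp3"; // "Música/Niña.mp3"

        // ASCII characters use the same byte values in both encodings.
        System.out.println(ascii.equals(misread(ascii)));

        // Accented letters become two bytes in UTF-8, so the path no longer matches.
        System.out.println(spanish.equals(misread(spanish)));
        System.out.println(misread(spanish).length() - spanish.length());
    }
}
```

The misread path is a different string from the real one, so a lookup on disk with it would fail even though the file exists.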

And that is all

Sometimes I miss contact with developers, so it is nice to have this conversation :-)

Right now I am programming a music player, so m3u will be a good way to import and export playlists. I am enjoying the process of building this application; for instance, I have programmed a graphical equalizer that looks quite good, and that gives me some satisfaction ...

What about you? I see you are involved in some other projects, so I guess you enjoy programming ... Are you a Java expert? Or do you mainly use another technology at work? Are you currently involved in something interesting?

I have just realized that this is not a private conversation!! How embarrassing! :-D

Fran.

frojasg1 commented 2 years ago

I think you can find some code samples showing how to use the universalchardet library on the internet.

But I can also offer you a wrapper I wrote (it is not as good as your code, as I wrote it some years ago, but you can edit it any way you want to get a good wrapper).

If you are interested in that option, you can write to me at: frojasg1@hotmail.com

Fran.

frojasg1 commented 2 years ago

The thing is that I usually use non-ASCII characters in folder paths. I think that if you do not take the Charset into account, such files cannot be opened.