taligentx / LiveTVH.bundle

Live TV streaming with Plex and Tvheadend
GNU General Public License v3.0

Revert control character filtering. #24

Closed. aarond10 closed this 6 years ago

aarond10 commented 6 years ago

This breaks UTF-8 parsing. Japanese text shows up garbled if decoded this way, even if epgEncoding is specified as utf-8.

taligentx commented 6 years ago

Hi @aarond10, thanks for the PR! Does Japanese EPG data decode if you remove the \x00-\x1f filtering?

Setting utf-8 or latin-1 is necessary in the EPG request to handle EPG data from over the air ATSC streams, so I'd like to see exactly what is happening with Japanese text.

To replicate the issue, what is your EPG source? Or feel free to attach the EPG xml file - xmltv doesn't seem to have a grabber for Japanese channels, and the eonet/skyperfect grabbers for WebGrab+Plus do not seem functional.
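For reference, the \x00-\x1f filtering discussed above can be sketched in isolation. This is a minimal standalone Python 3 sketch, not the plugin's actual code: `strip_control_bytes` is a hypothetical helper, and it assumes the filter runs on the raw bytes before decoding. Every byte of a multibyte UTF-8 sequence is 0x80 or higher, so removing bytes in the 0x00-0x1f range cannot split a Japanese character on its own.

```python
import re

# C0 control characters, 0x00 through 0x1f (this range includes tab, LF and CR).
CONTROL_BYTES = re.compile(rb"[\x00-\x1f]")


def strip_control_bytes(raw: bytes) -> bytes:
    """Remove bytes 0x00-0x1f from a raw payload (hypothetical helper)."""
    return CONTROL_BYTES.sub(b"", raw)


# A Japanese title with two stray control characters mixed in.
title = "ニュース\x00 7\x1f"
cleaned = strip_control_bytes(title.encode("utf-8"))
decoded = cleaned.decode("utf-8")
print(decoded)
```

If text still comes out garbled after a filter like this, the more likely cause is the encoding used for decoding rather than the filter itself.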

aarond10 commented 6 years ago

No, doesn't appear to work without this line either. This is part of a bit of an experiment of mine.

For context, I'm trying to pull EPG data off an ISDB-T feed (muxed in m2ts). Eventually I'd like to build a FUSE proxy that can take an ISDB-T device with all the ARIB standards and encryption and make it look like a vanilla DVB-T device. There is an old tvheadend fork and a gconv module that added support for the custom character encoding used by Japanese TVs (ARIB STD-B24). I had some bad UTF-8 originally but seem to have that licked. Sample file is here.

I spent a few hours playing with various things to see what was going on. I originally had invalid UTF-8 in my JSON response, which may have caused some issues, but even without it the fields seem to be unicode strings with the latin-1 bytes mapped verbatim into unicode (i.e. the raw UTF-8 is being decoded one byte per character). Admittedly I haven't dug much further than that.
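The symptom described above (UTF-8 bytes decoded as latin-1, one byte per character) can be reproduced and reversed in a few lines. This is a standalone Python 3 sketch; `repair_latin1_mojibake` is a hypothetical helper for illustration and is not part of LiveTVH or Tvheadend.

```python
def repair_latin1_mojibake(text: str) -> str:
    """Undo a UTF-8-read-as-latin-1 mix-up; return the input unchanged if it does not apply."""
    try:
        # Map each character back to its single byte, then decode those bytes as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside latin-1, or the bytes are not valid UTF-8.
        return text


original = "日本語"
# Simulate the bug: the UTF-8 bytes are decoded as latin-1, one byte per character.
garbled = original.encode("utf-8").decode("latin-1")
repaired = repair_latin1_mojibake(garbled)
print(len(original), len(garbled), repaired == original)
```

The round trip only works when the garbled string was produced by exactly this mix-up; text that was already decoded correctly is returned as-is.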

taligentx commented 6 years ago

Thanks for the EPG data, I'll be taking a look to see how it can be accommodated while still supporting ATSC EPG data.

taligentx commented 6 years ago

I've been testing with a few different EPG sources and haven't seen an issue unless the source contains invalid UTF-8 characters. The headers for the JSON data returned from the Tvheadend API always specify that the data is UTF-8 so ideally there shouldn't be a case where setting the encoding for JSON.ObjectFromURL() to utf-8 causes a problem.

The fallback to ISO-8859-1 is a necessary workaround because of a Tvheadend bug where the configured character set in the Tvheadend networks, muxes, and services options is being ignored for data from the ATSC EPG grabber:

https://tvheadend.org/issues/5162

If this is resolved, I'll be removing the fallback to ISO-8859-1 and only decode as utf-8, but this wouldn't resolve the problem that you're running into. Is there a way to ensure you're feeding only valid UTF-8 to Tvheadend?
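The decode strategy described above (try utf-8 first, fall back to ISO-8859-1) might look roughly like the following. This is a sketch in standalone Python 3 using only the standard library; `decode_epg_json` is a hypothetical name, and the plugin itself passes an encoding to JSON.ObjectFromURL() rather than decoding bytes by hand.

```python
import json


def decode_epg_json(raw: bytes):
    """Parse a JSON payload as UTF-8, falling back to ISO-8859-1 if the bytes are not valid UTF-8.

    Returns the parsed object and the name of the encoding that was used.
    """
    try:
        text = raw.decode("utf-8")
        used = "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this decode cannot fail.
        text = raw.decode("iso-8859-1")
        used = "iso-8859-1"
    return json.loads(text), used


# Valid UTF-8 payload with Japanese text.
utf8_payload = json.dumps({"title": "日本語"}, ensure_ascii=False).encode("utf-8")
# Payload whose bytes are ISO-8859-1, as in the ATSC case described above.
latin1_payload = '{"title": "Caf\xe9"}'.encode("iso-8859-1")

print(decode_epg_json(utf8_payload))
print(decode_epg_json(latin1_payload))
```

Because the fallback never raises, a payload containing invalid UTF-8 would silently come back as latin-1 text, which matches the garbling reported earlier in this thread.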

aarond10 commented 6 years ago

I see. I think I've fixed the invalid data in my input (properly decoding ARIB STD-B24 everywhere) but I'm traveling for the next week or so and can't confirm easily. Feel free to close this. I'll try to make sure I've got valid data and double check my side. Thanks for the investigation!

(Sent by email on 25 July 2018 in reply to Nikhil Choudhary's comment above.)

taligentx commented 6 years ago

Sounds good, feel free to open up a new issue if there's any problem viewing the data as UTF-8.