Closed paoloschi closed 2 years ago
Hi there - thank you for the recommendation! It looks like it would not be too difficult to add an m3u/m3u8 parser to jc. I'll take a look at this for the next release.
Hi - I have a working parser that you can install as a plugin to test. You can copy this file to your plugin folder and jc should recognize it as a new parser.
Let me know if you run into any issues!
Thanks
Thank you for so quickly accommodating my request!
I have indeed run into an issue while testing the parser.
My jc is installed via the official package available for my OS, which is Void Linux:
$ jc -v
jc version: 1.20.2
python interpreter version: 3.10.5
python path: /usr/bin/python3
https://github.com/kellyjonbrazil/jc
© 2019-2022 Kelly Brazil
The path where I copied the file m3u.py is /lib/python3.10/site-packages/jc/parsers/m3u.py
Should the new parser already show up in the output of jc -h?
Anyway, when I try to process an m3u file I get this error:
$ cat playlist.m3u | jc --m3u
jc: Error - Missing or incorrect arguments. Use "jc -h" for help.
I tried deleting the /lib/python3.10/site-packages/jc/__pycache__ directory, which I then rebuilt by re-running the jc package configuration via the package manager, but it didn't help.
What did I do wrong?
OK, I found my mistake. My web browser, PaleMoon, does not handle the anchor in the link https://github.com/kellyjonbrazil/jc#custom-parsers, so I never saw the exact section you pointed me to:
Custom local parser plugins may be placed in a jc/jcparsers folder in your local "App data directory":
Linux/unix: $HOME/.local/share/jc/jcparsers
With the parser finally in the right place, I can confirm that it works perfectly here too! Sorry for the noise!
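For anyone hitting the same confusion, here is a minimal sketch that builds the Linux/unix plugin path quoted above and checks whether a parser file is present there. The helper names `plugin_path` and `is_plugin_installed` are hypothetical and not part of jc; only the directory layout comes from the README excerpt.

```python
from pathlib import Path


def plugin_path(parser_name, home=None):
    """Return the expected plugin file path on Linux/unix.

    Hypothetical helper: only the $HOME/.local/share/jc/jcparsers
    layout is taken from the jc README excerpt quoted in this thread.
    """
    base = Path(home) if home is not None else Path.home()
    return base / ".local" / "share" / "jc" / "jcparsers" / f"{parser_name}.py"


def is_plugin_installed(parser_name, home=None):
    """True if a plugin file with that name exists in the plugin folder."""
    return plugin_path(parser_name, home).is_file()


if __name__ == "__main__":
    path = plugin_path("m3u")
    status = "found" if is_plugin_installed("m3u") else "missing"
    print(f"{path}: {status}")
```

A file copied into site-packages/jc/parsers, as in the first attempt above, would be reported as missing by this check.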
Hmm... I have now tested a 'dirty' playlist such as this one: https://hasbahca.net/hasbahca_m3u/hasbahca_iptv.m3u (don't worry, nothing illegal: it is a list of only free-to-air (FTA) TV channels from around the world)
$ cat hasbahca_iptv.m3u | jc --m3u
jc: Error - m3u parser could not parse the input data.
If this is the correct parser, try setting the locale to C (LANG=C).
For details use the -d or -dd option. Use "jc -h --m3u" for help.
Debug output:
$ cat hasbahca_iptv.m3u | LANG=C jc --m3u -dd
IndexError
Python 3.10.5: /usr/bin/python3
Sat Jul 16 11:54:03 2022
A problem occurred in a Python script. Here is the sequence of
function calls leading up to the error, in the order they occurred.
/bin/jc in <module>()
23 if entry_point.group == group and entry_point.name == name
24 )
25 return next(matches).load()
26
27
28 globals().setdefault('load_entry_point', importlib_load_entry_point)
29
30
31 if __name__ == '__main__':
32 sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
33 sys.exit(load_entry_point('jc==1.20.2', 'console_scripts', 'jc')())
sys = <module 'sys' (built-in)>
sys.exit = <built-in function exit>
load_entry_point = <function importlib_load_entry_point>
/usr/lib/python3.10/site-packages/jc/cli.py in main()
614 if isinstance(data, bytes):
615 data = data.decode('utf-8')
616 except UnicodeDecodeError:
617 pass
618
619 result = parser.parse(data,
620 raw=raw,
621 quiet=quiet)
622
623 safe_print_out(result,
624 pretty=pretty,
result undefined
parser = <module 'jcparsers.m3u' from '/home/user/.local/share/jc/jcparsers/m3u.py'>
parser.parse = <function parse>
data = '#EXTM3U\r\n#EXTINF:-1 group-title="*TÜRKi_TÜRKMENI...=MIDROLL&ads._fw_app_store_url=%7BAPP_DOMAIN%7D\r\n'
raw = False
quiet = False
/home/user/.local/share/jc/jcparsers/m3u.py in parse(data='#EXTM3U\r\n#EXTINF:-1 group-title="*TÜRKi_TÜRKMENI...=MIDROLL&ads._fw_app_store_url=%7BAPP_DOMAIN%7D\r\n', raw=False, quiet=False)
122
123 # standard extended info fields
124 if line.lstrip().startswith('#EXTINF:'):
125 output_line = {
126 'runtime': line.split(':')[1].split(',')[0].strip(),
127 'display': line.split(':')[1].split(',')[1].strip()
128 }
129 continue
130
131 # ignore all other extension info (obsolete)
132 if line.lstrip().startswith('#'):
line = '#EXTINF:-1 group-title="*TÜRKi_TÜRKMENISTAN" tvg.../TVLogo/world/turkmax_gurme_tr.png",Turkmen Спорт'
line.split = <built-in method split of str object>
].split undefined
IndexError: list index out of range
__cause__ = None
__class__ = <class 'IndexError'>
__context__ = None
__delattr__ = <method-wrapper '__delattr__' of IndexError object>
__dict__ = {}
__dir__ = <built-in method __dir__ of IndexError object>
__doc__ = 'Sequence index out of range.'
__eq__ = <method-wrapper '__eq__' of IndexError object>
__format__ = <built-in method __format__ of IndexError object>
__ge__ = <method-wrapper '__ge__' of IndexError object>
__getattribute__ = <method-wrapper '__getattribute__' of IndexError object>
__gt__ = <method-wrapper '__gt__' of IndexError object>
__hash__ = <method-wrapper '__hash__' of IndexError object>
__init__ = <method-wrapper '__init__' of IndexError object>
__init_subclass__ = <built-in method __init_subclass__ of type object>
__le__ = <method-wrapper '__le__' of IndexError object>
__lt__ = <method-wrapper '__lt__' of IndexError object>
__ne__ = <method-wrapper '__ne__' of IndexError object>
__new__ = <built-in method __new__ of type object>
__reduce__ = <built-in method __reduce__ of IndexError object>
__reduce_ex__ = <built-in method __reduce_ex__ of IndexError object>
__repr__ = <method-wrapper '__repr__' of IndexError object>
__setattr__ = <method-wrapper '__setattr__' of IndexError object>
__setstate__ = <built-in method __setstate__ of IndexError object>
__sizeof__ = <built-in method __sizeof__ of IndexError object>
__str__ = <method-wrapper '__str__' of IndexError object>
__subclasshook__ = <built-in method __subclasshook__ of type object>
__suppress_context__ = False
__traceback__ = <traceback object>
args = ('list index out of range',)
with_traceback = <built-in method with_traceback of IndexError object>
The above is a description of an error in a Python program. Here is
the original traceback:
Traceback (most recent call last):
File "/bin/jc", line 33, in <module>
sys.exit(load_entry_point('jc==1.20.2', 'console_scripts', 'jc')())
File "/usr/lib/python3.10/site-packages/jc/cli.py", line 619, in main
result = parser.parse(data,
File "/home/user/.local/share/jc/jcparsers/m3u.py", line 127, in parse
'display': line.split(':')[1].split(',')[1].strip()
IndexError: list index out of range
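A likely reading of this traceback: the expression on line 127 splits on every colon, so a URL inside the line (for example "http://...") cuts the string before the first comma is reached, and the second `split(',')` then returns a single element. The sketch below reproduces that with a complete #EXTINF line taken from later in this thread, and contrasts it with a more tolerant split. `tolerant_fields` is a hypothetical illustration, not the fix that went into jc, and it assumes the display name is whatever follows the last comma, which breaks for titles that themselves contain commas.

```python
SAMPLE = (
    '#EXTINF:-1 group-title="World_News+Busines" '
    'tvg-logo="http://hasbahca.net/TVLogo/world/zone_reality_europe.png",'
    "Real America's Voice"
)


def naive_display(line):
    # Same expression as line 127 in the traceback above.
    return line.split(':')[1].split(',')[1].strip()


def tolerant_fields(line):
    """Hypothetical alternative: split on the first colon and last comma only."""
    _, _, rest = line.partition(':')
    head, sep, display = rest.rpartition(',')
    if not sep:
        # No comma at all: treat the whole remainder as the runtime.
        return {'runtime': rest.strip(), 'display': None}
    parts = head.split(None, 1)
    runtime = parts[0] if parts else ''
    return {'runtime': runtime, 'display': display.strip()}


if __name__ == "__main__":
    try:
        naive_display(SAMPLE)
    except IndexError as exc:
        print("naive split failed:", exc)
    print(tolerant_fields(SAMPLE))
```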
I read this note in the parser help:
$ cat hasbahca_iptv.m3u | LANG=C jc -h --m3u
jc - JSON Convert M3U and M3U8 file parser
Only standard extended info fields are supported.
Does this mean that this playlist does not contain standard extended info fields and will never be processable with jc?
Thanks for testing! I should be able to fix the parser so it works with dirty files. I’ll look into the issue.
I made some updates to the code to allow these types of fields. There are still some unparsable lines, but these are also handled gracefully now. Let me know if that works for you. Thanks again for testing!
p.s.: I'm working on allowing the parser to get some of those corner-cases with single quotes, too. Might be able to get that working over the weekend if I have some time.
In fact, I too had identified problems where an apostrophe is present, as in this case....
$ cat hasbahca_iptv.m3u | head -n 2500 | jc --m3u -
jc: Warning - Not able to parse non-standard extensions in the following line:
#EXTINF:-1 group-title="World_News+Busines" tvg-logo="http://hasbahca.net/TVLogo/world/zone_reality_europe.png",Real America's Voice
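One way to pull the key="value" attributes out of a line like this without being affected by an apostrophe in the display name is a regular expression that only looks at double-quoted values. This is an illustrative sketch under that assumption, not the parser's actual code; `ATTR_RE` and `extinf_attributes` are hypothetical names.

```python
import re

# key="value" pairs, where the key may contain letters, digits, "_" or "-".
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')


def extinf_attributes(line):
    """Return the double-quoted attributes of an #EXTINF line as a dict."""
    return dict(ATTR_RE.findall(line))


if __name__ == "__main__":
    sample = (
        '#EXTINF:-1 group-title="World_News+Busines" '
        'tvg-logo="http://hasbahca.net/TVLogo/world/zone_reality_europe.png",'
        "Real America's Voice"
    )
    for key, value in extinf_attributes(sample).items():
        print(f"{key}: {value}")
```

Because single quotes play no role in the pattern, the apostrophe in "America's" is simply ignored.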
Ok, I think I fixed it now. Not getting any more errors when I test. Let me know if you find any others that cause problems. Thanks!
I have spent the last hour testing the parser with a large number of m3u/m3u8 files from different sources and have not run into any issues at all :-)
Kudos to you, you did an admirable job.
As far as I could discern, the parser definitely looks ready to be released to the public.
Thank you again for so readily accommodating my request
Nice! Third time was the charm. I'll go ahead and release this parser in the next jc release, probably in a couple of weeks or so.
Ouch! Same playlist: today's update has introduced strings with an odd number of double quotes :-( Is it worth handling this as well?
$ head -n1000 hasbahca.m3u8 | jc --m3u >/dev/null
jc: Warning - Not able to parse non-standard extensions in the following line:
#EXTINF:-1 group-title="RUS_EXCCCP 1",109CH "BRIDGE TV
jc: Warning - Not able to parse non-standard extensions in the following line:
#EXTINF:-1 group-title="RUS_EXCCCP 1",114CH "FASHION box
jc: Warning - Not able to parse non-standard extensions in the following line:
#EXTINF:-1 group-title="RUS_EXCCCP 1",132CH "СУББОТА
jc: Warning - Not able to parse non-standard extensions in the following line:
#EXTINF:-1 group-title="RUS_EXCCCP 1",133CH "ТЕХНО 24
jc: Warning - Not able to parse non-standard extensions in the following line:
#EXTINF:-1 group-title="RUS_EXCCCP 1",134CH "НОСТАЛЬГИЯ
jc: Warning - Not able to parse non-standard extensions in the following line:
#EXTINF:-1 group-title="RUS_EXCCCP 1",137CH " "ОТР "
jc: Warning - Not able to parse non-standard extensions in the following line:
#EXTINF:-1 group-title="RUS_EXCCCP 1",139CH "ТЕАТР " "
jc: Warning - Not able to parse non-standard extensions in the following line:
#EXTINF:-1 group-title="RUS_EXCCCP 1",143CH " "ДОН24 "
jc: Warning - Not able to parse non-standard extensions in the following line:
#EXTINF:-1 group-title="RUS_EXCCCP 1",162CH "RU TV
jc: Warning - Not able to parse non-standard extensions in the following line:
#EXTINF:-1 group-title="RUS_EXCCCP 1",166CH "MTV РОССИЯ
jc: Warning - Not able to parse non-standard extensions in the following line:
#EXTINF:-1 group-title="RUS_EXCCCP 1",170CH " "2X2"
jc: Warning - Not able to parse non-standard extensions in the following line:
#EXTINF:-1 group-title="RUS_EXCCCP 1",171CH "BRIDGE TV РУССКИЙ ХИТ
jc: Warning - Not able to parse non-standard extensions in the following line:
#EXTINF:-1 group-title="RUS_EXCCCP 1",178CH "CINEMA" "
jc: Warning - Not able to parse non-standard extensions in the following line:
#EXTINF:-1 group-title="RUS_EXCCCP 1",182CH "MUSICBOX TV
jc: Warning - Not able to parse non-standard extensions in the following line:
#EXTINF:-1 group-title="RUS_EXCCCP 1",187CH "МУЗЫКА ПЕРВОГО
Unfortunately, there isn't really a great way of fixing those types of issues automatically, because often only a human can figure out where the missing quotation marks need to go. In this case they should go at the end of the line, so it's not too difficult to figure out, but there's no way to know as a general rule. It's probably best to fix up those lines manually.
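Finding the lines that need manual attention can still be automated. The sketch below flags every line with an odd number of double quotes, which is the pattern in the warnings above; it does not attempt a repair. `unbalanced_quote_lines` is a hypothetical helper, not part of jc.

```python
def unbalanced_quote_lines(lines):
    """Return (line_number, line) pairs whose double-quote count is odd."""
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if line.count('"') % 2 == 1
    ]


if __name__ == "__main__":
    playlist = [
        '#EXTM3U',
        '#EXTINF:-1 group-title="RUS_EXCCCP 1",109CH "BRIDGE TV',
        '#EXTINF:-1 group-title="RUS_EXCCCP 1",170CH " "2X2"',
        '#EXTINF:-1 group-title="World_News+Busines",Real America\'s Voice',
    ]
    for number, line in unbalanced_quote_lines(playlist):
        print(f"line {number}: {line}")
```

Here the two RUS_EXCCCP lines are reported and the other two pass, since apostrophes are not counted.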
This parser is now released in version 1.20.4.
I have to admit that, as a test file, I happened to pick the most frustrating one :-)
I can't speak for Python, but when I struggle with escaping quotes in bash scripting, I act according to how important it is to preserve the original string exactly.
In this specific case, preserving the quotation marks in the .display value matters not at all to me, and rather than intervening manually I would just run tr -d '"' to get them out of the way altogether.
Otherwise I do
str="${str//$'\u27'/$'\u2bc'}"; str="${str//$'\u22'/$'\u201d'}"
That is, I translate every ' to ʼ and every " to ”, and any possible quoting issues disappear without sacrificing automation and without hurting (indeed, improving) string readability.
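For comparison, the same two approaches can be written in Python with `str.replace` and `str.translate`. This is only a sketch of the idea described above, applied to an arbitrary string before parsing; the function names are hypothetical and nothing here is part of jc.

```python
# Map ' (U+0027) to ʼ (U+02BC) and " (U+0022) to ” (U+201D),
# mirroring the bash substitutions above.
QUOTE_MAP = str.maketrans({"'": "\u02bc", '"': "\u201d"})


def strip_double_quotes(text):
    """Equivalent of: tr -d '"'"""
    return text.replace('"', '')


def soften_quotes(text):
    """Replace ASCII quotes with look-alike characters that need no escaping."""
    return text.translate(QUOTE_MAP)


if __name__ == "__main__":
    print(strip_double_quotes('109CH "BRIDGE TV'))
    print(soften_quotes("Real America's Voice"))
```

Note that applying either function to a whole #EXTINF line would also alter the double quotes around attribute values such as group-title, so it is probably safest to apply it only to the display part.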
It would be interesting to know in this regard the opinion of other users of this parser, now that it has been released...
I am not able to develop it myself, but it would be really useful to me. Could it be useful to others as well?