kellyjonbrazil / jc

CLI tool and python library that converts the output of popular command-line tools, file-types, and common strings to JSON, YAML, or Dictionaries. This allows piping of output to tools like jq and simplifying automation scripts.
MIT License

Add `--locale` option to specify fallback encoding to decode `data_in` with #553

Open eMPee584 opened 7 months ago

eMPee584 commented 7 months ago

When running `stat` on a directory containing some latin1-encoded filenames and parsing the output with jc, I got this error:

  Traceback (most recent call last):
    File "/usr/bin/jc", line 33, in <module>
      sys.exit(load_entry_point('jc==1.25.1', 'console_scripts', 'jc')())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/lib/python3/dist-packages/jc/cli.py", line 965, in main
      JcCli().run()
    File "/usr/lib/python3/dist-packages/jc/cli.py", line 947, in run
      self._run()
    File "/usr/lib/python3/dist-packages/jc/cli.py", line 911, in _run
      self.standard_parse_and_print()
    File "/usr/lib/python3/dist-packages/jc/cli.py", line 803, in standard_parse_and_print
      self.create_normal_output()
    File "/usr/lib/python3/dist-packages/jc/cli.py", line 755, in create_normal_output
      self.data_out = self.parser_module.parse(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/lib/python3/dist-packages/jc/parsers/stat.py", line 234, in parse
      jc.utils.input_type_check(data)
    File "/usr/lib/python3/dist-packages/jc/utils.py", line 460, in input_type_check
      raise TypeError("Input data must be a 'str' object but is %s.", [str(type(data))])
  TypeError: ("Input data must be a 'str' object but is %s.", ["<class 'bytes'>"])

After figuring out that piping `stat`'s output through `iconv -f ISO-8859-1` would fix that but break Unicode characters instead, I crudely patched my local jc with this hack:

diff --git jc/cli.py jc/cli.py
index 41c8358d..62c63cf3 100644
--- jc/cli.py
+++ jc/cli.py
@@ -792,7 +792,7 @@ class JcCli():
             if isinstance(self.data_in, bytes):
                 self.data_in = self.data_in.decode('utf-8')
         except UnicodeDecodeError:
-            pass
+            self.data_in = self.data_in.decode('iso-8859-1')

         self.slicer()

I looked into adding a generic `--locale` option, but as I'm not familiar with the jc codebase and no other option takes a parameter yet, I'm filing this report instead.
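To illustrate the idea, here is a minimal sketch of the fallback behaviour from the patch above, with the encoding made a parameter the way a `--locale` option might supply it. The helper name `decode_with_fallback` and its `fallback` parameter are hypothetical and not part of jc.

```python
def decode_with_fallback(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode as UTF-8; if that fails, decode the whole buffer with `fallback`.

    Hypothetical helper, not jc's actual code. Note that ISO-8859-1 maps
    every byte value to a character, so that fallback can never raise.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback)


# A UTF-8 name decodes on the first attempt; a latin1 name hits the fallback.
print(decode_with_fallback("café".encode("utf-8")))
print(decode_with_fallback("café".encode("iso-8859-1")))
```

One caveat of this approach: a single invalid byte causes the entire input to be decoded with the fallback, so any UTF-8 names in the same input would come out garbled.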

P.S.: Ouch, I just realized that now I'll have to invoke a separate jc instance for every single file, which incurs heavy interpreter overhead. I also tried to speed that up with pypy3, but that was actually even worse because very little computation takes place. Ah meh, at least it works™. I "just" wanted to recursively back up a partition's ctime values. 😅💦

kellyjonbrazil commented 7 months ago

Thanks for reporting this. I’ll have to think about how this might be addressed in a general fashion.

For cases like these, it might make sense to use jc as a Python library (`import jc`) and write a quick Python script to do what you want.

The stat parser also comes in a streaming version (`stat_s`, a Python generator) that can help with memory utilization.
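A rough sketch of such a script, assuming jc is installed and `stat` is on the PATH: it runs `stat` once for many files, decodes the raw bytes line by line so that UTF-8 names survive while latin1 names fall back, and hands the text to `jc.parse`. The helper names `decode_lines` and `stat_many` and the `fallback` parameter are made up for this example.

```python
import subprocess
from typing import Iterable


def decode_lines(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode each line as UTF-8, falling back per line on failure.

    Hypothetical helper: decoding per line keeps valid UTF-8 names intact
    even when another line in the same output is latin1-encoded.
    """
    decoded = []
    for line in raw.splitlines(keepends=True):
        try:
            decoded.append(line.decode("utf-8"))
        except UnicodeDecodeError:
            decoded.append(line.decode(fallback))
    return "".join(decoded)


def stat_many(paths: Iterable[str], fallback: str = "iso-8859-1"):
    """Run `stat` once over all paths and parse the output with jc."""
    import jc  # third-party; imported here so the decoder works without it

    # "--" stops option parsing so names starting with "-" are safe.
    result = subprocess.run(
        ["stat", "--", *paths], capture_output=True, check=True
    )
    return jc.parse("stat", decode_lines(result.stdout, fallback))
```

Calling `stat_many` with a batch of paths starts the interpreter and `stat` once per batch rather than once per file, which is where the overhead mentioned above comes from.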