sustrik / uxy

UXY: Adding structure to the UNIX tools
MIT License

Users and groups with spaces aren't handled properly #7

Closed: mdhowle closed this issue 5 years ago

mdhowle commented 5 years ago

I'm in an environment where group names containing spaces are common. The regular expression that parses the output of ls cannot handle them: since ls does not delimit user and group names, there is no way to tell where the owner ends and the group begins, so the correct user and group cannot be recovered.

For example, here is the output when user and group names contain spaces (users: howlem and "test user ignore"; group: "domain users"). The extra words spill into the following columns:

TYPE PERMISSIONS LINKS OWNER      GROUP      SIZE         TIME                                  NAME 
d    rwxr-xr-x   4     howlem     domain     users        "4096 2019-05-23 14:17:07.493736735"  "-0400 debian" 
-    rw-r--r--   1     howlem     domain     users        "12837 2019-05-23 11:49:51.750633915" "-0400 README.md" 
-    rw-r--r--   1     test       user       ignore       "domain users     0"                  "2019-05-23 15:28:01.361719083 -0400 test" 
-    rw-r--r--   1     howlem     domain     users        "0 2019-05-23 15:49:04.809507672"     "-0400 test2" 
-    rw-r--r--   1     howlem     domain     users        "853 2019-05-24 10:43:18.835875117"   "-0400 test.py" 
-    rwxr-xr-x   1     howlem     domain     users        "21838 2019-05-28 10:24:17.890430740" "-0400 uxy" 
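To illustrate why whitespace-based parsing cannot work here, the sketch below splits two hypothetical `ls -l` lines modelled on the output above. `naive_split` is a made-up helper, not uxy's actual regex, and the group "staff" is an assumed example; the point is only that a space in the group name shifts every later column.

```python
def naive_split(line):
    """Split the leading ls -l columns on whitespace, as a regex of
    whitespace-separated groups effectively does (hypothetical helper)."""
    mode, links, owner, group, size, rest = line.split(None, 5)
    return {"owner": owner, "group": group, "size": size}


# Hypothetical lines: the second has the group "domain users".
line_plain = "-rw-r--r-- 1 howlem staff 853 2019-05-24 10:43:18.835875117 -0400 test.py"
line_space = "-rw-r--r-- 1 howlem domain users 853 2019-05-24 10:43:18.835875117 -0400 test.py"

print(naive_split(line_plain))  # {'owner': 'howlem', 'group': 'staff', 'size': '853'}
print(naive_split(line_space))  # {'owner': 'howlem', 'group': 'domain', 'size': 'users'}
```

Nothing in the second line says whether "domain users" is one group or a group plus a size, which is why a fix has to change what ls prints rather than how it is parsed.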

I wrote a quick implementation of ls. It ignores any ls options other than the directory path, and its date/time format differs from the one ls produces.


import datetime
import grp
import os
import pwd
import stat

# Format, writeout and encode_field are helpers defined elsewhere in the uxy script.
def uxy_cmd_ls(args, uargs):
    path = uargs[0] if uargs else "."
    fmt = Format("TYPE PERMISSIONS LINKS OWNER      GROUP      SIZE         TIME                                  NAME")
    writeout(fmt.render())

    for entry in os.listdir(path):
        fields = []
        # lstat, like ls -l, describes a symlink itself rather than its target.
        fstats = os.lstat(os.path.join(path, entry))
        mode = stat.filemode(fstats.st_mode)
        fields.append(mode[0])    # file type
        fields.append(mode[1:])   # permission bits
        fields.append(str(fstats.st_nlink))

        # Resolve IDs to names; fall back to the number if there is no entry.
        try:
            user = pwd.getpwuid(fstats.st_uid).pw_name
        except KeyError:
            user = str(fstats.st_uid)
        fields.append(encode_field(user))  # quotes names containing spaces

        try:
            group = grp.getgrgid(fstats.st_gid).gr_name
        except KeyError:
            group = str(fstats.st_gid)
        fields.append(encode_field(group))

        fields.append(str(fstats.st_size))
        fields.append(encode_field(datetime.datetime.fromtimestamp(fstats.st_mtime).isoformat()))
        fields.append(encode_field(entry))
        writeout(fmt.render(fields))
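For experimenting outside uxy, here is a self-contained sketch of the same idea. Since uxy's `Format`, `writeout` and `encode_field` are not shown here, `quote_field` is a hypothetical stand-in that simply wraps values containing whitespace or quotes in double quotes; it is an assumption, not uxy's actual encoding.

```python
import datetime
import grp
import os
import pwd
import stat


def quote_field(value):
    """Hypothetical stand-in for uxy's encode_field: double-quote values that
    are empty or contain whitespace or quotes, escaping backslashes and quotes."""
    if value and not any(c.isspace() for c in value) and '"' not in value:
        return value
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def owner_name(uid):
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)  # no passwd entry: fall back to the numeric ID


def group_name(gid):
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)  # no group entry: fall back to the numeric ID


def ls_rows(path="."):
    """Yield one list of fields per directory entry, in ls -l column order:
    type, permissions, links, owner, group, size, time, name."""
    for entry in sorted(os.listdir(path)):
        st = os.lstat(os.path.join(path, entry))  # do not follow symlinks
        mode = stat.filemode(st.st_mode)
        mtime = datetime.datetime.fromtimestamp(st.st_mtime).isoformat()
        yield [
            mode[0], mode[1:], str(st.st_nlink),
            quote_field(owner_name(st.st_uid)),
            quote_field(group_name(st.st_gid)),
            str(st.st_size), quote_field(mtime), quote_field(entry),
        ]


if __name__ == "__main__":
    for row in ls_rows():
        print(" ".join(row))
```

Because every field that may contain spaces is quoted, a name like "domain users" stays in a single column.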
sustrik commented 5 years ago

Hm, uxy is meant to reformat the output of existing tools rather than provide tools of its own.

I guess calling ls with the -b option would make spaces deterministically parsable.

mdhowle commented 5 years ago

From what I can tell, -b only escapes file names; user and group names are printed unchanged.

I agree. Rather than re-implementing ls, a better workaround might be to call it with -n (--numeric-uid-gid) and do some minimal post-processing. The numeric IDs can be resolved with Python's built-in pwd and grp modules; if the user passed -n explicitly, no resolution should be done. -n is supported by other UNIXes as well.
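The resolution step described above could be factored into a small helper. This is a minimal sketch; `resolve_owner_group` is a hypothetical name, and it assumes the owner and group arrive as the strings printed by `ls -n`.

```python
import grp
import pwd


def resolve_owner_group(owner, group):
    """Map the numeric owner/group columns printed by `ls -n` to names.

    Each value is returned unchanged if it is not numeric or has no entry
    in the passwd/group database."""
    try:
        owner = pwd.getpwuid(int(owner)).pw_name
    except (KeyError, ValueError):
        pass
    try:
        group = grp.getgrgid(int(group)).gr_name
    except (KeyError, ValueError):
        pass
    return owner, group
```

Since `ls -n` prints bare integers, these two columns can be split on whitespace safely; spaces only reappear after resolution, at which point the output encoder can quote them.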

import grp
import pwd
import re
import subprocess

# Format, writeout, trim_newline and encode_field are helpers from the uxy script.
def uxy_cmd_ls(args, uargs):
    # Only resolve IDs to names if the user did not ask for numeric output.
    resolve_ids = not ("-n" in uargs or "--numeric-uid-gid" in uargs)
    # -n makes owner and group bare integers, so they can never contain spaces.
    proc = subprocess.Popen(
        ['ls', '-l', '--time-style=full-iso', '-n'] + uargs,
        stdout=subprocess.PIPE)
    regexp = re.compile(r'(.)([^\s]*)\s+([^\s]*)\s+([^\s]*)\s+([^\s]*)\s+([^\s]*)\s+([^\s]*\s+[^\s]*\s+[^\s]*)\s+(.*)')
    fmt = Format("TYPE PERMISSIONS LINKS OWNER      GROUP      SIZE         TIME                                  NAME")
    writeout(fmt.render())
    for ln in proc.stdout:
        ln = trim_newline(ln.decode("utf-8"))
        if ln.startswith('total'):
            continue
        m = regexp.match(ln)
        if not m:
            continue
        fields = []
        for i in range(1, regexp.groups + 1):
            value = m.group(i)
            # Groups 4 and 5 are the numeric owner and group IDs.
            if resolve_ids and i in (4, 5):
                try:
                    if i == 4:
                        value = pwd.getpwuid(int(value)).pw_name
                    else:
                        value = grp.getgrgid(int(value)).gr_name
                except (KeyError, ValueError):
                    pass  # unknown ID: keep the number
            fields.append(encode_field(value))
        writeout(fmt.render(fields))
    proc.wait()
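The parsing step can be checked in isolation, without running ls. This sketch reuses the regex from the snippet above on a hypothetical `ls -l -n --time-style=full-iso` line; `parse_ls_line` is an illustrative helper name, and UID 1000 / GID 1001 are made-up values standing in for "howlem" / "domain users".

```python
import re

# Regex from the snippet above. Groups: type, permissions, links, owner,
# group, size, time (three whitespace-separated tokens), name.
LS_LINE = re.compile(
    r'(.)([^\s]*)\s+([^\s]*)\s+([^\s]*)\s+([^\s]*)\s+([^\s]*)\s+'
    r'([^\s]*\s+[^\s]*\s+[^\s]*)\s+(.*)')


def parse_ls_line(line):
    """Return the eight fields of one `ls -ln` line, or None for lines
    such as 'total 24' that do not describe a file."""
    if line.startswith("total"):
        return None
    m = LS_LINE.match(line)
    return list(m.groups()) if m else None


# Hypothetical line with numeric owner and group.
sample = "-rw-r--r-- 1 1000 1001 853 2019-05-24 10:43:18.835875117 -0400 test.py"
print(parse_ls_line(sample))
# ['-', 'rw-r--r--', '1', '1000', '1001', '853', '2019-05-24 10:43:18.835875117 -0400', 'test.py']
```

With owner and group reduced to integers, the only free-form column left is the name, and the trailing `(.*)` absorbs it whole, spaces included.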
mdhowle commented 5 years ago

If that implementation is ok, I can make a pull request.

mdhowle commented 5 years ago

Thanks!