probonopd / irdb

One of the largest crowd-sourced, manufacturer-independent databases of infrared remote control codes on the web, and aspiring to become the most comprehensive and most accurate one
Other
798 stars 148 forks source link

All Sony codes here #47

Open dobbelina opened 3 years ago

dobbelina commented 3 years ago

I thought i should share an amazing document i found on github that basically got all Sony codes there is. Well, the document is 4 years old, so up until that time anyway.

Here: All Sony Codes V1.0.xls 7,65 MB

Either use the search macro or click the "Cover Page" tab where all devices are listed according to device number.

All cred to RogueProeliator, as it comes from his/her public repo.

probonopd commented 3 years ago

Thank you very much @dobbelina.

Had a quick look at the document. Should be possible to convert automatically...

If there is enough interest, one could probably write a Python script that would read the .xls and write .csv files for inclusion in this repository.

probonopd commented 3 years ago

This does the trick:

# -*- coding: utf-8 -*-
"""SONY.ipynb

Original file is located at
    https://colab.research.google.com/drive/1QugHmC4nJIwVCZVbdAQo5eBtliGMoBIh

Convert all SONY codes from Excel to csv suitable for irdb
"""

import pandas as pd

url = 'https://github.com/RogueProeliator/IndigoPlugins-Sony-Network-Remote/raw/bac07e464b559a53ffe11477047894533a24aa37/Documentation/All%20Sony%20Codes%20V1.0.xls'

xl = pd.ExcelFile(url)
sheet_names = xl.sheet_names[7:]

import shutil
try:
  shutil.rmtree('/tmp/')
except:
  pass

for sheet_name in sheet_names:
  df = xl.parse(sheet_name,usecols=[2,4])
  df.columns = ['function', 'functionname']
  protocol = "Sony" + str(df.iloc[0,0])
  category = df.iloc[0,1].strip().replace(" ", "_").replace("_/_", "_").replace("(", "").replace(")", "").replace("/", "_")
  df['protocol'] = protocol
  device = sheet_name.split(".")[0].lstrip("0")
  df['device'] = device
  subdevice = "-1"
  if len(sheet_name.split(".")) > 1:
    subdevice = sheet_name.split(".")[1].lstrip("0")
  df['subdevice'] = subdevice

  df = df[4:]
  df.dropna(inplace=True, subset=['functionname'])

  df = df[['functionname',  'protocol', 'device',   'subdevice',    'function']]

  df["functionname"] = df.applymap(lambda s:s.upper() if type(s) == str else s)

  os.makedirs("/tmp/codes/" + category,  exist_ok=True)
  with open("/tmp/codes/" + category + "/" + device + "," + subdevice + ".csv", "w") as text_file:
      text_file.write(df.to_csv(index=False))

shutil.make_archive("/tmp/codes", 'zip', "/tmp/codes")

from google.colab import files
files.download("/tmp/codes.zip")

Output is attached.

SONY.zip

probonopd commented 3 years ago

@dobbelina, @bengtmartensson please review the generated CSVs, and compare with what we had so far. These codesets seem to be more complete than what we had but I think some software may trip over the non-generic functionnames.

dobbelina commented 3 years ago

Wow, that was quick!

I thought of the approach of writing a VBA macro in excel, but your solution as a finished one is offcourse better :smile: Interesting 26,42 is not a TV, it's a projector, https://github.com/probonopd/irdb/blob/master/codes/Sony/TV/26%2C42.csv But off course newer devices after this document was made might "reuse" the same codeset. An example of that can already be seen here 26,1 where 2 protocols are used: https://github.com/probonopd/irdb/blob/master/codes/Sony/TV/26%2C-1.csv

There are some buttons missing in the document, example 151,1 https://github.com/probonopd/irdb/blob/master/codes/Sony/TV/151%2C-1.csv

But overall, if merged with IRDB would make it more complete, and with less errors.

I had a look at the generated csv's, looks super!

bengtmartensson commented 3 years ago

@bengtmartensson please review

will take a few days for personal reasons.

probonopd commented 3 years ago

No hurries @bengtmartensson. Thank you very much.

bengtmartensson commented 3 years ago

Wow, that is an impressive work. (Some other similar resources I list here).

An example of that can already be seen here 26,1 where 2 protocols are used

That is a fundamental problem with the design of IRDB; it cannot model a device using more than one protocol. Projector VPL HW50 uses two for example.

I have found two issues:

probonopd commented 3 years ago

Thanks for your careful review @bengtmartensson.

probonopd commented 3 years ago

@bengtmartensson re. Pioneer

Manufacturer supplied information, mostly not machine parseable

Actually

# Unzip
import zipfile
from urllib.parse import unquote

filename = unquote(os.path.basename(url))
with zipfile.ZipFile(filename, 'r') as zip_ref:
    zip_ref.extractall(".")

mhtname = filename.replace(".zip", ".mht")

# Extract mht
# using https://tewarid.github.io/2019/06/04/extract-all-tabular-data-from-multipart-mime-documents.html
# and interpret its contents
!pip install html-table-extractor
import pprint
pp = pprint.PrettyPrinter(indent=2)
import email
from bs4 import BeautifulSoup
from html_table_extractor.extractor import Extractor
with open(mhtname) as fp:
    message = email.message_from_file(fp)
    for part in message.walk():
        if (part.get_content_type() == "text/html"):
            soup = BeautifulSoup(part.get_payload(decode=True))
            for table in soup.body.find_all("table", recursive=False):
                extractor = Extractor(table)
                extractor.parse()
                results = extractor.return_list()
                # print(results)
                for result in results:
                  if(result[0].strip()!=""):
                    category = result[0].strip().replace("\n", "")
                    # pp.pprint(result)
                  # print(len(result))
                  if(len(result)>2):
                    if(result[7].strip() != ""):
                      categoryandfunction = result[1].replace("(", " ").replace(")", " ")
                      categoryandfunction = ' '.join((category + " " + categoryandfunction.strip().replace("\n", "")).split())
                      categoryandfunction = categoryandfunction.upper() if type(categoryandfunction) == str else categoryandfunction
                      print(categoryandfunction + ";" + ' '.join(result[7].strip().replace("\n", "").split()))

results in

CATEGORY FUNCTION;PRONTO HEX
INPUT DVD;0000 0068 0000 0022 0168 00b4 0016 0043 0016 0016 0016 0043 0016 0016 0016 0016 0016 0043 0016 0016 0016 0043 0016 0016 0016 0043 0016 0016 0016 0043 0016 0043 0016 0016 0016 0043 0016 0016 0016 0043 0016 0016 0016 0043 0016 0016 0016 0016 0016 0016 0016 0016 0016 0043 0016 0016 0016 0043 0016 0016 0016 0043 0016 0043 0016 0043 0016 0043 0016 0016 0016 06d9
(...)

which we can further process e.g., with https://github.com/probonopd/decodeir to get

protocol=Pioneer device=165 subdevice=-1 obc=133 hex0=94 hex1=-1 hex2=-1 hex3=-1 misc= error=

which we could then bring into a format suitable for irdb...

probonopd commented 3 years ago

Regarding https://github.com/probonopd/irdb/issues/47#issuecomment-742679748

  (...)
  protocol = "Sony" + str(df.iloc[0,0])
  category = df.iloc[0,1].strip().replace(" / ", " ").replace("(", "").replace(")", "").replace("/", " ").replace("–", "-")
  # print(category)
  (...)

gives

codes.zip

Does this fix the isues correctly @bengtmartensson?

bengtmartensson commented 3 years ago

Does this fix the isues correctly @bengtmartensson?

It fixes the first issue, but not the second. It is the command names, not the categories, that are ndashy.

bengtmartensson commented 3 years ago

@bengtmartensson re. Pioneer

Manufacturer supplied information, mostly not machine parseable

Well, (almost) everything is parseable if you write a separate parser for every file... ;-).

which we can further process e.g., with https://github.com/probonopd/decodeir to get

Please, do not use decodeir any longer. It has truckloads of problems and is since 10 years a dead end. Use IrpTransmogrifier instead, either from command line or API.

bengtmartensson commented 2 years ago

I have created a Girr version of said file, found here.

In IrScrutinizer it looks like this:

Screenshot from 2021-08-01 20-17-25