Open dobbelina opened 3 years ago
Thank you very much @dobbelina.
Had a quick look at the document. Should be possible to convert automatically...
D2
C2
says "12 bits", Sony15 if field C2
says "15 bits"H2
up to the dot, if anyH2
after the dot, if any; otherwise -1
C
E
(needs to be normalized)If there is enough interest, one could probably write a Python script that would read the .xls
and write .csv
files for inclusion in this repository.
This does the trick:
# -*- coding: utf-8 -*-
"""SONY.ipynb
Original file is located at
https://colab.research.google.com/drive/1QugHmC4nJIwVCZVbdAQo5eBtliGMoBIh
Convert all SONY codes from Excel to csv suitable for irdb
"""
import pandas as pd
url = 'https://github.com/RogueProeliator/IndigoPlugins-Sony-Network-Remote/raw/bac07e464b559a53ffe11477047894533a24aa37/Documentation/All%20Sony%20Codes%20V1.0.xls'
xl = pd.ExcelFile(url)
sheet_names = xl.sheet_names[7:]
import shutil
try:
shutil.rmtree('/tmp/')
except:
pass
for sheet_name in sheet_names:
df = xl.parse(sheet_name,usecols=[2,4])
df.columns = ['function', 'functionname']
protocol = "Sony" + str(df.iloc[0,0])
category = df.iloc[0,1].strip().replace(" ", "_").replace("_/_", "_").replace("(", "").replace(")", "").replace("/", "_")
df['protocol'] = protocol
device = sheet_name.split(".")[0].lstrip("0")
df['device'] = device
subdevice = "-1"
if len(sheet_name.split(".")) > 1:
subdevice = sheet_name.split(".")[1].lstrip("0")
df['subdevice'] = subdevice
df = df[4:]
df.dropna(inplace=True, subset=['functionname'])
df = df[['functionname', 'protocol', 'device', 'subdevice', 'function']]
df["functionname"] = df.applymap(lambda s:s.upper() if type(s) == str else s)
os.makedirs("/tmp/codes/" + category, exist_ok=True)
with open("/tmp/codes/" + category + "/" + device + "," + subdevice + ".csv", "w") as text_file:
text_file.write(df.to_csv(index=False))
shutil.make_archive("/tmp/codes", 'zip', "/tmp/codes")
from google.colab import files
files.download("/tmp/codes.zip")
Output is attached.
@dobbelina, @bengtmartensson please review the generated CSVs, and compare with what we had so far. These codesets seem to be more complete than what we had but I think some software may trip over the non-generic functionnames.
Wow, that was quick!
I thought of the approach of writing a VBA macro in excel, but your solution as a finished one is offcourse better :smile: Interesting 26,42 is not a TV, it's a projector, https://github.com/probonopd/irdb/blob/master/codes/Sony/TV/26%2C42.csv But off course newer devices after this document was made might "reuse" the same codeset. An example of that can already be seen here 26,1 where 2 protocols are used: https://github.com/probonopd/irdb/blob/master/codes/Sony/TV/26%2C-1.csv
There are some buttons missing in the document, example 151,1 https://github.com/probonopd/irdb/blob/master/codes/Sony/TV/151%2C-1.csv
But overall, if merged with IRDB would make it more complete, and with less errors.
I had a look at the generated csv's, looks super!
@bengtmartensson please review
will take a few days for personal reasons.
No hurries @bengtmartensson. Thank you very much.
Wow, that is an impressive work. (Some other similar resources I list here).
An example of that can already be seen here 26,1 where 2 protocols are used
That is a fundamental problem with the design of IRDB; it cannot model a device using more than one protocol. Projector VPL HW50 uses two for example.
I have found two issues:
While the present content in general uses device types with a space ("Laser Disk
"), the Python script replaces the spaces by underscores ("Laser Disk
"). For uniformity, I suggest keeping the spaces form.
The Excel list uses en-dash (–
Unicode 0x2013) to denote "minus" (like "Speed –"). En-dash is used to denote a range of number (like "pages 123 – 133"), so it is clearly typographically a mistake. Minus in Unicode is "−
numerically 0x2212. Suggestion: in the python script, replace by Unicode minus (if Unicode is considered allowed), or hyphen "-
") is Unicode is not desired.
Thanks for your careful review @bengtmartensson.
@bengtmartensson re. Pioneer
Manufacturer supplied information, mostly not machine parseable
Actually
# Unzip
import zipfile
from urllib.parse import unquote
filename = unquote(os.path.basename(url))
with zipfile.ZipFile(filename, 'r') as zip_ref:
zip_ref.extractall(".")
mhtname = filename.replace(".zip", ".mht")
# Extract mht
# using https://tewarid.github.io/2019/06/04/extract-all-tabular-data-from-multipart-mime-documents.html
# and interpret its contents
!pip install html-table-extractor
import pprint
pp = pprint.PrettyPrinter(indent=2)
import email
from bs4 import BeautifulSoup
from html_table_extractor.extractor import Extractor
with open(mhtname) as fp:
message = email.message_from_file(fp)
for part in message.walk():
if (part.get_content_type() == "text/html"):
soup = BeautifulSoup(part.get_payload(decode=True))
for table in soup.body.find_all("table", recursive=False):
extractor = Extractor(table)
extractor.parse()
results = extractor.return_list()
# print(results)
for result in results:
if(result[0].strip()!=""):
category = result[0].strip().replace("\n", "")
# pp.pprint(result)
# print(len(result))
if(len(result)>2):
if(result[7].strip() != ""):
categoryandfunction = result[1].replace("(", " ").replace(")", " ")
categoryandfunction = ' '.join((category + " " + categoryandfunction.strip().replace("\n", "")).split())
categoryandfunction = categoryandfunction.upper() if type(categoryandfunction) == str else categoryandfunction
print(categoryandfunction + ";" + ' '.join(result[7].strip().replace("\n", "").split()))
results in
CATEGORY FUNCTION;PRONTO HEX
INPUT DVD;0000 0068 0000 0022 0168 00b4 0016 0043 0016 0016 0016 0043 0016 0016 0016 0016 0016 0043 0016 0016 0016 0043 0016 0016 0016 0043 0016 0016 0016 0043 0016 0043 0016 0016 0016 0043 0016 0016 0016 0043 0016 0016 0016 0043 0016 0016 0016 0016 0016 0016 0016 0016 0016 0043 0016 0016 0016 0043 0016 0016 0016 0043 0016 0043 0016 0043 0016 0043 0016 0016 0016 06d9
(...)
which we can further process e.g., with https://github.com/probonopd/decodeir to get
protocol=Pioneer device=165 subdevice=-1 obc=133 hex0=94 hex1=-1 hex2=-1 hex3=-1 misc= error=
which we could then bring into a format suitable for irdb...
Regarding https://github.com/probonopd/irdb/issues/47#issuecomment-742679748
(...)
protocol = "Sony" + str(df.iloc[0,0])
category = df.iloc[0,1].strip().replace(" / ", " ").replace("(", "").replace(")", "").replace("/", " ").replace("–", "-")
# print(category)
(...)
gives
Does this fix the isues correctly @bengtmartensson?
Does this fix the isues correctly @bengtmartensson?
It fixes the first issue, but not the second. It is the command names, not the categories, that are ndashy.
@bengtmartensson re. Pioneer
Manufacturer supplied information, mostly not machine parseable
Well, (almost) everything is parseable if you write a separate parser for every file... ;-).
which we can further process e.g., with https://github.com/probonopd/decodeir to get
Please, do not use decodeir any longer. It has truckloads of problems and is since 10 years a dead end. Use IrpTransmogrifier instead, either from command line or API.
I thought i should share an amazing document i found on github that basically got all Sony codes there is. Well, the document is 4 years old, so up until that time anyway.
Here: All Sony Codes V1.0.xls 7,65 MB
Either use the search macro or click the "Cover Page" tab where all devices are listed according to device number.
All cred to RogueProeliator, as it comes from his/her public repo.