Closed tech-savvy-guy closed 2 years ago
Hi and welcome @TECH-SAVVY-GUY !
I can try to help, but could you provide some minimal Python code reproducing your problem please? Without knowing what methods are called with what values, I cannot do much...
Thanks for offering help @Lucas-C !
Actually, I am creating a Telegram bot that uses the Firebase API as its database framework.
Now, I want to store all the user interactions with my bot as logs in my database.
I have already shared the general schema of my database.
Now, the problem is that Telegram users have various kinds of names, with different fonts (to look fancy).
For example, this is my username: ➤ 𝓢𝓸𝓱𝓪𝓶 𝓓𝓪𝓽𝓽𝓪
The above code uploads the logs to my database.
Now the function below retrieves the logs from the database and generates a PDF file using the FPDF2 module.
from fpdf import FPDF

# Assumes `database` is the Firebase handle created elsewhere in the bot.
def send_logs():
    headers, data = ["NAME", "CHAT ID", "USERNAME", "COMMAND", "TIME", "DATE"], []
    logs = database.child("Users").child("User Logs").get()
    for log in logs.each():
        _chat_id_ = str(log.val()["chat_id"])
        _username_ = log.val()["username"]
        _name_ = log.val()["name"]
        _time_ = log.val()["time"]
        _date_ = log.val()["date"]
        _cmd_ = log.val()["command"]
        data.append([_name_, _chat_id_, _username_, _cmd_, _time_, _date_])
    pdf = FPDF()
    pdf.add_font("Roboto", "", "rs_normal.ttf", uni=True)
    pdf.add_font("Roboto", "B", "rs_bold.ttf", uni=True)
    pdf.add_page()
    pdf.set_font("Roboto", "B", size=10)
    line_height = pdf.font_size * 2.5
    # Column widths as percentages of the effective page width
    col_width_list = [30, 10, 25, 16, 10, 10]
    # Header row
    for index, attr in enumerate(headers):
        col_width = (col_width_list[index] * pdf.epw) // 100
        pdf.multi_cell(col_width, line_height, attr, align="C", border=1, ln=3, max_line_height=pdf.font_size)
    pdf.ln(line_height)
    pdf.set_font("Roboto", size=8)
    # One table row per log entry
    for row in data:
        for index, datum in enumerate(row):
            col_width = (col_width_list[index] * pdf.epw) // 100
            pdf.multi_cell(col_width, line_height, datum, align="C", border=1, ln=3, max_line_height=pdf.font_size)
        pdf.ln(line_height)
    pdf.output('logs.pdf')
Now, this produces a PDF as shown in my initial comment.
The only problem is that the Name field is not displayed properly. I think this has something to do with the font I am using to generate the PDF file. I am using the Roboto Slab font.
I hope this information is sufficient. Do let me know if you need anything else...
Now, the problem is that various Telegram users have various kinds of names, with different fonts (to look fancy). For example: This is my username ➤ 𝓢𝓸𝓱𝓪𝓶 𝓓𝓪𝓽𝓽𝓪
Your name tag is composed of rather exotic Unicode characters:
| Character | Code point | Name |
|---|---|---|
| ➤ | U+27A4 | Black Rightwards Arrowhead |
| (space) | U+0020 | ASCII space |
| 𝓢 | U+1D4E2 | Mathematical Bold Script Capital S |
| 𝓸 | U+1D4F8 | Mathematical Bold Script Small O |
| 𝓱 | U+1D4F1 | Mathematical Bold Script Small H |
| 𝓪 | U+1D4EA | Mathematical Bold Script Small A |
| 𝓶 | U+1D4F6 | Mathematical Bold Script Small M |
| (space) | U+0020 | ASCII space |
| 𝓓 | U+1D4D3 | Mathematical Bold Script Capital D |
| 𝓪 | U+1D4EA | Mathematical Bold Script Small A |
| 𝓽 | U+1D4FD | Mathematical Bold Script Small T |
| 𝓽 | U+1D4FD | Mathematical Bold Script Small T |
| 𝓪 | U+1D4EA | Mathematical Bold Script Small A |
The first one is from the Unicode subset "Dingbats", the others (besides the spaces) from "Mathematical Alphanumeric Symbols". Most normal fonts won't contain glyphs for those characters, so they are displayed as little rectangles. If you want them to display correctly, you'll have to analyze each one, find the respective fonts that can handle them, and add them to the PDF using those fonts.
Modern web browsers have this functionality built in, so you can see them displayed correctly on this page. The Telegram client apparently does the same. But you can't really expect that from a self-described "simple" PDF library, so you'll have to roll your own solution for that.
Oh, and if you manage to figure out a general and complete solution, maybe you could contribute that as an extension to this project? :wink:
Btw.: A possible alternative approach would be to figure out the original characters that were substituted by those symbols. As long as only Mathematical Alphanumeric Symbols are involved, this would be a relatively short lookup table. But given the huge number of writing systems supported by Unicode, finding a suitable font might be easier in the general case.
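A character table like the one above can be produced with Python's standard `unicodedata` module. This is only a minimal sketch; the helper name `describe` is made up for illustration.

```python
import unicodedata

def describe(text):
    """Return (character, code point, Unicode name) for each character in text."""
    rows = []
    for ch in text:
        # Some code points have no name; fall back to a placeholder then.
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((ch, f"U+{ord(ch):04X}", name))
    return rows

for ch, code_point, name in describe("➤ 𝓢𝓸"):
    print(ch, code_point, name)
```

Running it on a display name quickly shows which Unicode blocks are involved, and therefore which kind of font would be needed.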
@allcontributors please add @gmischler for question
@Lucas-C
I've put up a pull request to add @gmischler! :tada:
This is really a perfect answer @gmischler, thank you!
I have some good news though: this feature is built into Python:
>>> import unicodedata
>>> name = '𝓢𝓸𝓱𝓪𝓶 𝓓𝓪𝓽𝓽𝓪'
>>> unicodedata.normalize('NFKD', name)
'Soham Datta'
Ah, so my conclusion at the end was only partially correct. I was aware of Python's unicodedata, but didn't remember how powerful the normalize function was. As long as your users write their names in something that is based on the Latin script, this should indeed do the trick.
As soon as someone decides to write their name in e.g. Hindi though, you'll still need to find a font containing those characters...
And now I remember something else I hadn't originally thought about: There are some open source fonts that cover a wide range of Unicode characters. A popular one is, for example, GNU Unifont. If you use both normalization and a font like that, you might have covered most of your bases, and only rarely encounter one of those replacement rectangles.
Unless you want to preserve your "fancy styles". Then you'd have to play the font game all the way...
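To make the limits of normalization concrete, here is a small sketch using only the standard library. The sample strings and the helper name `plain_name` are illustrative assumptions, not part of the bot's code.

```python
import unicodedata

def plain_name(name):
    """Replace compatibility characters (e.g. styled letters) with plain ones."""
    return unicodedata.normalize("NFKD", name)

styled = "➤ 𝓢𝓸𝓱𝓪𝓶 𝓓𝓪𝓽𝓽𝓪"
hindi = "सोहम"

cleaned = plain_name(styled)
print(cleaned)                       # styled letters become plain Latin letters
print(cleaned.isascii())             # False: the arrow has no plain equivalent
print(plain_name(hindi).isascii())   # False: Devanagari stays Devanagari
```

So normalization takes care of the "fancy" Latin letters, while the arrow and any non-Latin script still depend on a font that actually contains those glyphs.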
Thanks for providing a solution! May I ask, what is the purpose of the "NFKD" argument in the normalize function?
I looked up the online documentation here, but didn't quite understand!
Thanks for the reply @gmischler !
I have one question: I am using the Roboto Slab
font to create my PDF
files.
Now, isn't Roboto Slab
a google font?
So, technically it should cover all the unicode characters
?
Actually, I was thinking of something else for this issue.
Let's assume we are creating a table
in an Excel Sheet
.
Now, there is a default font
that applies to all the cells in the document, right?
But say, I want to edit a cell in particular. That particular cell can have a different font
, right?
Now, what if we use this idea and make all the cells under the Name
column font independent
?
It's as if, we are using CTRL+C
and CTRL+V
for entering the data there. So there is no particular font that we have specified, and hence there will be no οΏ½ in the PDF
generated...
Can this method be implemented in any way?
May I ask, what is the purpose of the "NFKD" argument in the normalize function?
That controls how combined (usually accented) characters are handled. The Python manual gives a few examples.
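As a rough illustration of what the four forms do (a sketch with sample characters chosen here, not taken from the thread): the "D" forms decompose accented characters, the "C" forms recompose them, and the "K" forms additionally replace compatibility characters such as ligatures or styled letters with their plain equivalents.

```python
import unicodedata

accented = "\u00e9"   # "é" as a single precomposed character
ligature = "\ufb01"   # the "fi" ligature, a compatibility character

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    acc = unicodedata.normalize(form, accented)
    lig = unicodedata.normalize(form, ligature)
    # Print code points so that combining marks become visible.
    print(form,
          [f"U+{ord(c):04X}" for c in acc],
          [f"U+{ord(c):04X}" for c in lig])
```

The styled letters in the name tag are compatibility characters, which is why a "K" form is needed to turn them into ordinary letters.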
I have one question: I am using the Roboto Slab font to create my PDF files. Now, isn't Roboto Slab a Google font? So, technically it should cover all the Unicode characters?
What does being a "Google font" (technically: commissioned by Google) have to do with the selection of glyphs covered? Very few fonts include dingbats and mathematical symbols, anyway. And if a character is shown as a rectangle, then it very obviously isn't included.
Now, what if we use this idea and make all the cells under the Name column font independent?
There can be no "font independent cells", neither in Excel nor in a PDF. Any text necessarily has to have a font assigned (the "current font" in fpdf). If you CTRL+V in Excel, then the font assignment of the originating cell is just copied together with the text. How you automate something like that in Python is up to you, though I suspect that neither Telegram nor your database will tell you which font to use. You'll have to figure that out with the help of the unicodedata library module and maybe some additional data.
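One possible starting point, sketched with the standard library only: split a string into runs whose characters share the first word of their Unicode name, which is a crude hint at the script or block. The function names `script_hint` and `split_runs` are hypothetical, and the mapping from each hint to an actual font file is left open, since it depends on which fonts you have available.

```python
import unicodedata
from itertools import groupby

def script_hint(ch):
    """Crude script/block hint: the first word of the character's Unicode name."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def split_runs(text):
    """Group consecutive characters that share the same hint."""
    return [(hint, "".join(chars)) for hint, chars in groupby(text, key=script_hint)]

for hint, run in split_runs("➤ 𝓢𝓸𝓱𝓪𝓶 Datta"):
    print(hint, repr(run))
```

Each run could then be written with whatever font you have registered for its hint, with a wide-coverage font as the default for anything unrecognized.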
Alright, thanks for clearing up my questions!
Hi. I am having an issue with FPDF2.
I have a database as shown below.
Now, I want to create a pdf using these data. However, the username field in the database has a different encoding. When the PDF is created everything is working except for the username encoding problem.
Can anyone help?