As already stated in this comment:
Please share a fully self-contained, minimal reproducible example so that we can replicate your problem.
If you do not provide us with some minimal Python code, we won't be able to help you much.
I was able to execute the following code without reproducing the issue you mentioned:
from fpdf import fpdf, html

class PDF(fpdf.FPDF, html.HTMLMixin):
    pass

pdf = PDF()
pdf.add_page()
pdf.add_font("Kanit", fname="fonts/Kanit-Regular.ttf")
pdf.add_font("Kanit", style="I", fname="fonts/Kanit-Italic.ttf")
pdf.set_font("Kanit", size=24)
pdf.write_html('<p class="text_obisnuit">Intr-un articol precedent, <a href="https://neculaifantanaru.com/dupa-toate-regulile-artei.html"><em>Dupa toate regulile artei</em></a>, v-am povestit despre tanarul print Hamlet</p>')
pdf.output("issue_498.pdf")
This is the complete PYTHON code.
1. Note also that colons (:) are lost in the PDF, and so is the uppercase letter at the beginning of the line.
For example:
Leadership: Takes into account the opinions of others in order to understand...
in PDF looks like this:
Leadership takes into account the opinions of others in order to understand...
2. The link problem, as I showed above.
3. The tag inside the paragraph, as I showed in the previous bug report.
HTML:
<p class="text_obisnuit2"><em>My Name is Prince</em></p>
In the PDF, the inner tag is still there, like this:
<em>My Name is Prince</em>
Here is an example of one of my HTML pages. Copy it into an HTML file and test it. You can duplicate this HTML code into as many pages as you want, because I also wrote a PDF merge step in the Python code (that works great):
https://hastebin.com/puxecelivi.http
MY PYTHON CODE:
from fpdf import fpdf, html
import os
import re
from PyPDF2 import PdfFileMerger

def read_text_from_file(file_path):
    """
    Returns the contents of a file.
    file_path: path of the file you want to read from
    """
    with open(file_path, encoding='utf8', errors='ignore') as f:
        text = f.read()
    return text

def write_to_file(text, file_path):
    """
    Writes a text to a file.
    text: the text you want to write
    file_path: path of the file you want to write to
    """
    with open(file_path, 'wb') as f:
        f.write(text.encode('utf8', 'ignore'))

dict_simboluri = dict()
dict_simboluri['ă'] = 'a'
dict_simboluri['â'] = 'a'

def save_to_pdf(directory_path):
    for root, dirs, files in os.walk(directory_path):
        for file_name in files:
            if file_name.endswith(".html"):
                file_path = os.path.join(root, file_name)
                file_content = read_text_from_file(file_path)
                # create the PDF file
                class PDF(fpdf.FPDF, html.HTMLMixin):
                    pass
                pdf = PDF()
                pdf.add_page()
                pdf.set_font('helvetica', size=12)
                # extract the article title
                den_articol = re.search('<td><h1 class="den_articol" itemprop="name">(.*?)</h1></td>', file_content)
                if den_articol is None:
                    print("Could not find --- article title --- in file --- {} ---.".format(file_path))
                else:
                    den_articol = den_articol.group(1)
                    for simbol in dict_simboluri.keys():
                        den_articol = den_articol.replace(simbol, dict_simboluri[simbol])
                    pdf.set_text_color(204, 0, 0)  # red
                    pdf.set_font('helvetica', size=14, style="B")
                    pdf.multi_cell(w=190, txt=den_articol)
                    pdf.ln()
                    pdf.set_font('helvetica', size=12)
                # extract the date
                date = re.search('<td class="text_dreapta">(.*?), in <a', file_content)
                if date is None:
                    print("Could not find --- date --- in file --- {} ---.".format(file_path))
                else:
                    date = date.group(1)
                    pdf.set_text_color(0, 102, 204)  # blue
                    pdf.set_font('helvetica', size=8, style="B")
                    pdf.cell(txt=date)
                    pdf.ln()
                    pdf.ln()
                    pdf.ln()
                    pdf.ln()
                pdf.set_text_color(0, 0, 0)  # black (default)
                pdf.set_font('helvetica', size=12)
                # extract the article text
                articol = re.search(r'<!-- ARTICOL START -->([\s\S]*?)<!-- ARTICOL FINAL -->', file_content)
                if articol is None:
                    print("Could not find --- ARTICOL START/FINAL --- in file --- {} ---.".format(file_path))
                else:
                    articol = articol.group(1)
                    articol = articol.replace("&quot;", "\"")
                    articol = articol.replace("’", "'")
                    # paragraphs
                    par_regex = re.compile('<p class="text_obisnuit.*?">.*?</p>')
                    pars = re.findall(par_regex, articol)
                    pars_text = list()
                    if len(pars) == 0:
                        print("Could not find -- text_obisnuit paragraphs -- in file --- {} ---.".format(file_path))
                    else:
                        for i in range(0, len(pars)):
                            if '<p class="text_obisnuit">' in pars[i]:
                                # identify the text_obisnuit class and take its text
                                content = re.findall('<p class="text_obisnuit">(.*?)</p>', pars[i])
                                if len(content) == 0:
                                    print("Could not find text in paragraph {}, file {}.".format(pars[i], file_path))
                                else:
                                    # put the text into a multi_cell
                                    for simbol in dict_simboluri.keys():
                                        content[0] = content[0].replace(simbol, dict_simboluri[simbol])
                                    pars_text.append(content[0])
                                    pdf.multi_cell(w=190, txt = content[0])
                                    # add an empty line between paragraphs
                                    pdf.ln()
                            elif '<p class="text_obisnuit2">' in pars[i]:
                                # identify the text_obisnuit2 class and take its text
                                content = re.findall('<p class="text_obisnuit2">(.*?)</p>', pars[i])
                                if len(content) == 0:
                                    print("Could not find text in paragraph {}, file {}.".format(pars[i], file_path))
                                else:
                                    # set the font to bold
                                    pdf.set_font('helvetica', size=12, style="B")
                                    # put the text into a multi_cell
                                    for simbol in dict_simboluri.keys():
                                        content[0] = content[0].replace(simbol, dict_simboluri[simbol])
                                    pars_text.append(content[0])
                                    pdf.multi_cell(w=190, txt = content[0])
                                    # add an empty line between paragraphs
                                    pdf.ln()
                                    # reset the font
                                    pdf.set_font('helvetica', size=12)
                            else:
                                continue
                # add the source link
                pdf.ln()
                pdf.ln()
                pdf.set_font('helvetica', size=12, style="B")
                pdf.cell(txt="Source:")
                pdf.set_font('helvetica', size=12)
                pdf.set_text_color(0, 102, 204)  # blue
                pdf.cell(w=40, txt="https://neculaifantanaru.com/{}".format(file_name), link="https://neculaifantanaru.com/{}".format(file_name))
                den_fisier = os.path.splitext(file_path)[0] + '.pdf'
                pdf.output(den_fisier)
                # break

# function that merges several PDF files
def merge_pdf_files(directory_path):
    merger = PdfFileMerger()
    for root, dirs, files in os.walk(directory_path):
        for file_name in files:
            if file_name.endswith(".pdf"):
                print("PDF: ", file_name)
                file_path = os.path.join(root, file_name)
                merger.append(file_path)
        merger.write(os.path.join(root, "articles.pdf"))
        merger.close()
        break

save_to_pdf("c:\\Folder5\\")
merge_pdf_files("c:\\Folder5\\")
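A side note on merge_pdf_files: os.walk, like os.listdir, yields file names in arbitrary order, so the pages of articles.pdf are not guaranteed to follow any particular sequence, and on a second run the articles.pdf written by the previous run is itself picked up and merged again. A minimal sketch, assuming alphabetical order is what is wanted (select_pdfs and collect_pdfs_to_merge are hypothetical helper names, not part of PyPDF2), that sorts the names and skips the output file before they are handed to merger.append:

```python
import os


def select_pdfs(file_names, output_name="articles.pdf"):
    """Keep the .pdf names other than the merged output, sorted alphabetically."""
    return sorted(
        name for name in file_names
        if name.lower().endswith(".pdf") and name != output_name
    )


def collect_pdfs_to_merge(directory_path, output_name="articles.pdf"):
    """Return full paths of the PDFs directly inside directory_path
    (no subfolders, like the original loop with its break)."""
    names = [
        name for name in os.listdir(directory_path)
        if os.path.isfile(os.path.join(directory_path, name))
    ]
    return [os.path.join(directory_path, name) for name in select_pdfs(names, output_name)]


print(select_pdfs(["b.pdf", "a.pdf", "articles.pdf", "notes.html"]))  # ['a.pdf', 'b.pdf']
```

Inside merge_pdf_files, the two nested loops would then shrink to a single loop calling merger.append(file_path) for each path returned by collect_pdfs_to_merge(directory_path).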
Hi @me-suzy! If I understood correctly, the issue arises because you don't seem to be using pdf.write_html().
For example, with this code:
def issue():
    file_path = "./issue.html"
    file_content = read_text_from_file(file_path)
    # create the PDF file
    class PDF(fpdf.FPDF, html.HTMLMixin):
        pass
    pdf = PDF()
    pdf.add_page()
    # extract the article title
    den_articol = re.search('<td><h1 class="den_articol" itemprop="name">(.*?)</h1></td>', file_content)
    if den_articol is None:
        print("Could not find --- article title --- in file --- {} ---.".format(file_path))
    else:
        den_articol = den_articol.group(1)
        for simbol in dict_simboluri.keys():
            den_articol = den_articol.replace(simbol, dict_simboluri[simbol])
        pdf.set_text_color(204, 0, 0)  # red
        pdf.add_font("Kanit", fname="fonts/Kanit-Regular.ttf")
        pdf.set_font('Kanit', size=14)
        pdf.multi_cell(w=190, txt=den_articol)
        pdf.ln()
    pdf.output("issue.pdf")
the header is shown like this. Instead, if I change
pdf.multi_cell(w=190, txt=den_articol)
to
pdf.write_html(text=f'<h1 class="den_articol" itemprop="name">{den_articol}</h1>')
the header seems to be shown correctly.
Note also that with the helvetica font, fpdf2 complained that helvetica doesn't support the ş character, so I had to switch to Kanit.
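The core fonts (helvetica, times, courier) are not Unicode fonts: as far as I know, fpdf2 encodes text for them with a single-byte encoding, latin-1 by default, which contains â but neither ş nor ă. That matches the FPDFUnicodeEncodingException reported later in this thread. A small stdlib-only check (chars_outside_encoding is a hypothetical helper) shows which characters of a string would need a Unicode TTF font such as Kanit:

```python
def chars_outside_encoding(text, encoding="latin-1"):
    """Return the distinct characters of `text` that `encoding` cannot represent,
    in order of first appearance."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


print(chars_outside_encoding("Abracadabra, cine eşti"))  # ['ş']
```

An empty result suggests the text should be safe for a core font under that encoding; anything else points to registering a TTF font with add_font().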
Thank you for jumping in with this great answer @RedShy!
@all-contributors please add @RedShy for question
@Lucas-C
I've put up a pull request to add @RedShy! :tada:
It is not about the HTML title tag. It is about the tags inside the paragraphs. See this; I pointed out the problem:
I made the change, but it is exactly the same. Look at the diacritical mark ş in my title, from "Abracadabra, cine eşti". This is why I created dict_simboluri at the beginning of the Python code: to automatically transform ş into ă.
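dict_simboluri is a hand-maintained replacement table. If its keys are HTML character references such as &#351; (the identity-looking entries in a later version of the script below suggest so, but that can't be confirmed from the pasted text), the standard library can decode them generically, and unicodedata can strip diacritics when an unaccented fallback is really what's wanted. A sketch, where normalize_text is a hypothetical name:

```python
from html import unescape  # stdlib module; does not clash with fpdf's `html` name
import unicodedata


def normalize_text(raw, strip_diacritics=False):
    """Decode HTML character references and optionally drop diacritics.

    A generic sketch of what a hand-written table like dict_simboluri does;
    it is not the code used in this thread.
    """
    # "&#351;" -> "ş", "&quot;" -> '"', "&nbsp;" -> U+00A0, ...
    text = unescape(raw)
    # unescape() turns &nbsp; into U+00A0 (no-break space); use a normal space.
    text = text.replace("\u00a0", " ")
    if strip_diacritics:
        # NFKD splits e.g. "ş" into "s" + U+0327 COMBINING CEDILLA;
        # dropping the combining marks leaves the base letters.
        decomposed = unicodedata.normalize("NFKD", text)
        text = "".join(c for c in decomposed if not unicodedata.combining(c))
    return text


print(normalize_text("Abracadabra, cine e&#351;ti"))                         # Abracadabra, cine eşti
print(normalize_text("Abracadabra, cine e&#351;ti", strip_diacritics=True))  # Abracadabra, cine esti
```

Two caveats: stripping diacritics loses information, so a Unicode font keeps the text intact; and text that will go through pdf.write_html should probably not be unescaped beforehand, because a decoded &lt; or &amp; would then be read as markup. Python's HTMLParser, which the traceback further down shows write_html builds on, already decodes character references by default (convert_charrefs=True).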
To render the paragraphs correctly as well, you should change the respective lines too. For example, in this section:
# identify the text_obisnuit class and take its text
content = re.findall('<p class="text_obisnuit">(.*?)</p>', pars[i])
if len(content) == 0:
    print("Could not find text in paragraph {}, file {}.".format(pars[i], file_path))
else:
    # put the text into a multi_cell
    for simbol in dict_simboluri.keys():
        content[0] = content[0].replace(simbol, dict_simboluri[simbol])
    pars_text.append(content[0])
    pdf.multi_cell(w=190, txt = content[0])
I changed
pdf.multi_cell(w=190, txt = content[0])
to
pdf.write_html(text=f'<p class="text_obisnuit">{content[0]}</p>')
and in this other section:
# identify the text_obisnuit2 class and take its text
content = re.findall('<p class="text_obisnuit2">(.*?)</p>', pars[i])
if len(content) == 0:
    print("Could not find text in paragraph {}, file {}.".format(pars[i], file_path))
else:
    # set the font to bold
    pdf.set_font('Kanit', size=12, style="B")
    # put the text into a multi_cell
    for simbol in dict_simboluri.keys():
        content[0] = content[0].replace(simbol, dict_simboluri[simbol])
    pars_text.append(content[0])
    pdf.multi_cell(w=190, txt = content[0])
I changed
pdf.multi_cell(w=190, txt = content[0])
to
pdf.write_html(text=f'<p class="text_obisnuit2">{content[0]}</p>')
Here is a segment of the resulting PDF that I obtain.
In general, if you want HTML tags to be rendered in the PDF, you have to use the pdf.write_html() function.
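The reason is that multi_cell() prints its txt argument verbatim, so a tag such as <em> ends up in the PDF as literal characters, while write_html() parses the markup. If plain multi_cell output is ever preferred instead, the tags have to be removed first. A minimal stdlib sketch (html_to_text is a hypothetical helper) that keeps only the text content, and therefore also loses the italics, the bold and the link target:

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Keeps only the text content of an HTML fragment; every tag is dropped."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default: &quot; etc. are decoded
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    parser = _TextCollector()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


print(html_to_text('<p class="text_obisnuit2"><em>My Name is Prince</em></p>'))  # My Name is Prince
```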
I used the latest version of fpdf2, installed by running pip install git+https://github.com/PyFPDF/fpdf2.git@master
If you have any more doubts, feel free to keep asking!
I made those 2 changes, and I get this error:
Also, I get a second error after changing the second line you suggested:
Providing a screenshot of your IDE with a line of code in red is not very helpful... A full error stacktrace would be a lot more useful.
Also, you did not provide any minimal code associated with the last errors you faced: how do you expect us to help you without sharing the underlying code triggering the problem?
Other fpdf2 contributors may have suggestions to help you, and I thank them for their patience and willingness to help!
As for myself, I'm sorry, but I won't try to figure out what the problem is without seeing any code, nor take the time to read through all of the 150+ lines of code you provided. The idea of writing a minimal code sample that reproduces the problem is that you take the time to narrow the issue down to something "atomic", easy to analyze and reason about, before asking other people for help. You can find more information about how to proceed here: https://stackoverflow.com/help/minimal-reproducible-example
I'll be glad to help you if you take the time to provide a minimal reproducible example and the associated full stack trace.
C:\Users\Castel\AppData\Roaming\Python\Python310\site-packages\fpdf\fpdf.py:1904: UserWarning: Substituting font arial by core font helvetica
warnings.warn(
PDF: abordarea-frontala-a-lucrurilor-neelucidate.pdf
PDF: abracadabra-cine-esti.pdf
PDF: accente-pronuntate-in-leadership.pdf
>>>
*** Remote Interpreter Reinitialized ***
C:\Users\Castel\AppData\Roaming\Python\Python310\site-packages\fpdf\fpdf.py:1904: UserWarning: Substituting font arial by core font helvetica
warnings.warn(
Traceback (most recent call last):
File "C:\Folder5\Convert all html to PDF in a single book - BEBE.py", line 281, in <module>
save_to_pdf("c:\\Folder5\\")
File "C:\Folder5\Convert all html to PDF in a single book - BEBE.py", line 226, in save_to_pdf
pdf.write_html(text=f'<p class="text_obisnuit">{content[0]}</p>')
File "C:\Users\Castel\AppData\Roaming\Python\Python310\site-packages\fpdf\html.py", line 736, in write_html
h2p.feed(text)
File "C:\Program Files\Python39\lib\html\parser.py", line 110, in feed
self.goahead(0)
File "C:\Program Files\Python39\lib\html\parser.py", line 170, in goahead
k = self.parse_starttag(i)
File "C:\Program Files\Python39\lib\html\parser.py", line 344, in parse_starttag
self.handle_starttag(tag, attrs)
File "C:\Users\Castel\AppData\Roaming\Python\Python310\site-packages\fpdf\html.py", line 447, in handle_starttag
self.href = attrs["href"]
KeyError: 'href'
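Reading the traceback, the KeyError is raised at self.href = attrs["href"] inside fpdf2's handle_starttag, i.e. the HTML parser met a start tag, most likely an <a>, that carries no href attribute (a named anchor such as <a name="...">, for instance). Without the offending HTML this cannot be confirmed; if that is the cause, one workaround is to unwrap such anchors before calling write_html. A hedged regex sketch (regexes are not a full HTML parser; unwrap_anchors_without_href is a hypothetical name):

```python
import re

# An <a ...> start tag WITHOUT an href attribute, then its content up to </a>.
_ANCHOR_WITHOUT_HREF = re.compile(
    r"<a\b(?![^>]*\bhref\s*=)[^>]*>(.*?)</a>",
    re.IGNORECASE | re.DOTALL,
)


def unwrap_anchors_without_href(fragment):
    """Replace each <a> element that has no href by its inner content."""
    return _ANCHOR_WITHOUT_HREF.sub(r"\1", fragment)


print(unwrap_anchors_without_href('<a name="top">Titlu</a>, <a href="https://neculaifantanaru.com/">link</a>'))
# Titlu, <a href="https://neculaifantanaru.com/">link</a>
```

Links that do have an href are left untouched, so they still reach write_html as links.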
So I changed all the fonts (ARIAL, TIMES, KANIT), and I get the same error:
fpdf.errors.FPDFUnicodeEncodingException: Character "ă" at index 45 in text is outside the range of characters supported by the font used: "helvetica". Please consider using a Unicode font.
AFTER UPDATING MY CODE WITH THE NEW FONT and modifying those 2 lines, I get this error (I didn't have this error before the change):
*** Remote Interpreter Reinitialized ***
C:\Users\Castel\AppData\Roaming\Python\Python310\site-packages\fpdf\fpdf.py:1799: UserWarning: Core font or font already added 'kanit': doing nothing
warnings.warn(f"Core font or font already added '{fontkey}': doing nothing")
Traceback (most recent call last):
File "C:\Folder5\Convert all html to PDF in a single book - BEBE.py", line 175, in <module>
save_to_pdf("c:\\Folder5\\")
File "C:\Folder5\Convert all html to PDF in a single book - BEBE.py", line 65, in save_to_pdf
pdf.set_font('Kanit', size=14, style="B")
File "C:\Users\Castel\AppData\Roaming\Python\Python310\site-packages\fpdf\fpdf.py", line 1931, in set_font
raise FPDFException(
fpdf.errors.FPDFException: Undefined font: kanitB - Use built-in fonts or FPDF.add_font() beforehand
>>>
THIS IS MY LAST VERSION OF PYTHON CODE:
from fpdf import fpdf, html
import os
import re
from PyPDF2 import PdfFileMerger

def read_text_from_file(file_path):
    """
    Returns the contents of a file.
    file_path: path of the file you want to read from
    """
    with open(file_path, encoding='utf8', errors='ignore') as f:
        text = f.read()
    return text

def write_to_file(text, file_path):
    """
    Writes a text to a file.
    text: the text you want to write
    file_path: path of the file you want to write to
    """
    with open(file_path, 'wb') as f:
        f.write(text.encode('utf8', 'ignore'))

dict_simboluri = dict()
dict_simboluri['ă'] = 'a'
dict_simboluri['â'] = 'a'
dict_simboluri['ã'] = 'a'
dict_simboluri['â'] = 'a'
dict_simboluri['ă'] = 'a'
dict_simboluri['â'] = 'a'

def save_to_pdf(directory_path):
    for root, dirs, files in os.walk(directory_path):
        for file_name in files:
            if file_name.endswith(".html"):
                file_path = os.path.join(root, file_name)
                file_content = read_text_from_file(file_path)
                # create the PDF file
                class PDF(fpdf.FPDF, html.HTMLMixin):
                    pass
                pdf = PDF()
                pdf.add_font("Kanit", fname="fonts/Kanit-Regular.ttf")
                pdf.add_font("Kanit", fname="fonts/Kanit-Bold.ttf")
                pdf.add_font("Kanit", style="I", fname="fonts/Kanit-Italic.ttf")
                pdf.set_font("Kanit", size=24)
                # extract the article title
                den_articol = re.search('<td><h1 class="den_articol" itemprop="name">(.*?)</h1></td>', file_content)
                if den_articol is None:
                    print("Could not find --- article title --- in file --- {} ---.".format(file_path))
                else:
                    den_articol = den_articol.group(1)
                    for simbol in dict_simboluri.keys():
                        den_articol = den_articol.replace(simbol, dict_simboluri[simbol])
                    pdf.set_text_color(204, 0, 0)  # red
                    pdf.set_font('Kanit', size=14, style="B")
                    pdf.multi_cell(w=190, txt=den_articol)
                    pdf.ln()
                    pdf.set_font('Kanit', size=12)
                # extract the date
                date = re.search('<td class="text_dreapta">(.*?), in <a', file_content)
                if date is None:
                    print("Could not find --- date --- in file --- {} ---.".format(file_path))
                else:
                    date = date.group(1)
                    pdf.set_text_color(0, 102, 204)  # blue
                    pdf.set_font('Kanit', size=8, style="B")
                    pdf.cell(txt=date)
                    pdf.ln()
                    pdf.ln()
                    pdf.ln()
                    pdf.ln()
                pdf.set_text_color(0, 0, 0)  # black (default)
                pdf.set_font('Kanit', size=12)
                # extract the article text
                articol = re.search(r'<!-- ARTICOL START -->([\s\S]*?)<!-- ARTICOL FINAL -->', file_content)
                if articol is None:
                    print("Could not find --- ARTICOL START/FINAL --- in file --- {} ---.".format(file_path))
                else:
                    articol = articol.group(1)
                    articol = articol.replace("&quot;", "\"")
                    articol = articol.replace("’", "'")
                    # paragraphs
                    par_regex = re.compile('<p class="text_obisnuit.*?">.*?</p>')
                    pars = re.findall(par_regex, articol)
                    pars_text = list()
                    if len(pars) == 0:
                        print("Could not find -- text_obisnuit paragraphs -- in file --- {} ---.".format(file_path))
                    else:
                        for i in range(0, len(pars)):
                            if '<p class="text_obisnuit">' in pars[i]:
                                # identify the text_obisnuit class and take its text
                                content = re.findall('<p class="text_obisnuit">(.*?)</p>', pars[i])
                                if len(content) == 0:
                                    print("Could not find text in paragraph {}, file {}.".format(pars[i], file_path))
                                else:
                                    # put the text into a multi_cell
                                    for simbol in dict_simboluri.keys():
                                        content[0] = content[0].replace(simbol, dict_simboluri[simbol])
                                    pars_text.append(content[0])
                                    # pdf.multi_cell(w=190, txt = content[0])
                                    pdf.write_html(text=f'<p class="text_obisnuit">{content[0]}</p>')
                                    # add an empty line between paragraphs
                                    pdf.ln()
                            elif '<p class="text_obisnuit2">' in pars[i]:
                                # identify the text_obisnuit2 class and take its text
                                content = re.findall('<p class="text_obisnuit2">(.*?)</p>', pars[i])
                                if len(content) == 0:
                                    print("Could not find text in paragraph {}, file {}.".format(pars[i], file_path))
                                else:
                                    # set the font to bold
                                    pdf.set_font('Kanit', size=12, style="B")
                                    # put the text into a multi_cell
                                    for simbol in dict_simboluri.keys():
                                        content[0] = content[0].replace(simbol, dict_simboluri[simbol])
                                    pars_text.append(content[0])
                                    # pdf.multi_cell(w=190, txt = content[0])
                                    pdf.write_html(text=f'<p class="text_obisnuit2">{content[0]}</p>')
                                    # add an empty line between paragraphs
                                    pdf.ln()
                                    # reset the font
                                    pdf.set_font('Kanit', size=12)
                            else:
                                continue
                # add the source link
                pdf.ln()
                pdf.ln()
                pdf.set_font('Kanit', size=12, style="B")
                pdf.cell(txt="Source:")
                pdf.set_font('Kanit', size=12)
                pdf.set_text_color(0, 102, 204)  # blue
                pdf.cell(w=40, txt="https://neculaifantanaru.com/{}".format(file_name), link="https://neculaifantanaru.com/{}".format(file_name))
                den_fisier = os.path.splitext(file_path)[0] + '.pdf'
                pdf.output(den_fisier)
                # break

# function that merges several PDF files
def merge_pdf_files(directory_path):
    merger = PdfFileMerger()
    for root, dirs, files in os.walk(directory_path):
        for file_name in files:
            if file_name.endswith(".pdf"):
                print("PDF: ", file_name)
                file_path = os.path.join(root, file_name)
                merger.append(file_path)
        merger.write(os.path.join(root, "articles.pdf"))
        merger.close()
        break

save_to_pdf("c:\\Folder5\\")
merge_pdf_files("c:\\Folder5\\")
When you add a bold version of a font, you also need to pass style="B", so try changing
pdf.add_font("Kanit", fname="fonts/Kanit-Bold.ttf")
to
pdf.add_font("Kanit", style="B", fname="fonts/Kanit-Bold.ttf")
Also add pdf.add_page() under pdf = PDF().
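The warning "Core font or font already added 'kanit': doing nothing" and the error "Undefined font: kanitB" in the log above fit together once you notice that fonts are looked up by a key made of the lower-cased family name plus the style letters. The toy model below is NOT fpdf2's implementation (FontRegistry is a made-up class); it only mimics the behavior visible in those two messages, to show why a bold file registered without style="B" is ignored and why "kanitB" is then missing:

```python
import warnings


class FontRegistry:
    """Toy model (not fpdf2 code) of a font table keyed by family + style."""

    def __init__(self):
        self.fonts = {}

    @staticmethod
    def key(family, style=""):
        # Normalize the style so that "IB" and "BI" give the same key.
        return family.lower() + "".join(sorted(style.upper()))

    def add_font(self, family, style="", fname=""):
        k = self.key(family, style)
        if k in self.fonts:
            # Same idea as the "already added ... doing nothing" warning.
            warnings.warn(f"font already added '{k}': doing nothing")
            return
        self.fonts[k] = fname

    def set_font(self, family, style=""):
        k = self.key(family, style)
        if k not in self.fonts:
            raise LookupError(f"Undefined font: {k}")
        return self.fonts[k]


r = FontRegistry()
r.add_font("Kanit", fname="fonts/Kanit-Regular.ttf")
r.add_font("Kanit", fname="fonts/Kanit-Bold.ttf")  # warns: key "kanit" is taken
print(sorted(r.fonts))  # ['kanit']
```

In this model, r.set_font("Kanit", style="B") raises LookupError("Undefined font: kanitB") until the bold file is registered with style="B", which is exactly the change suggested above.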
ALMOST PERFECT !!
Except for one thing: the bold font does not stand out.
In my Python code, I set up <p class="text_obisnuit2"> to be BOLD, but it shows up only as italic. It must be both, BOLD and ITALIC.
The bold font does not stand out, maybe because of the Kanit style font itself?
In html, the first line is like this:
<p class="text_obisnuit2"><em>Pentru a cunoaşte realitatea un lider trebuie să deţină şi arta disimulării – o armă de temut, dar eficientă în cele mai multe situaţii.</em></p>
THE CODE VERSION 5 (almost perfect)
from fpdf import fpdf, html
import os
import re
from PyPDF2 import PdfFileMerger

def read_text_from_file(file_path):
    """
    Returns the contents of a file.
    file_path: path of the file you want to read from
    """
    with open(file_path, encoding='utf8', errors='ignore') as f:
        text = f.read()
    return text

def write_to_file(text, file_path):
    """
    Writes a text to a file.
    text: the text you want to write
    file_path: path of the file you want to write to
    """
    with open(file_path, 'wb') as f:
        f.write(text.encode('utf8', 'ignore'))

dict_simboluri = dict()
dict_simboluri['ă'] = 'ă'
dict_simboluri['â'] = 'â'
dict_simboluri['ã'] = 'ã'
dict_simboluri['â'] = 'â'
dict_simboluri['ă'] = 'ă'
dict_simboluri['â'] = 'a'
dict_simboluri[' '] = ' '
dict_simboluri['î'] = 'î'
dict_simboluri['Î'] = 'Î'
dict_simboluri['î'] = 'î'
dict_simboluri['î'] = 'î'
dict_simboluri['Î'] = 'Î'
dict_simboluri['Î'] = 'Î'
dict_simboluri['î'] = 'î'
dict_simboluri['Î'] = 'i'
dict_simboluri['Î'] = 'Î'
dict_simboluri[' '] = ' '
dict_simboluri['ș'] = 'ș'
dict_simboluri['Ș'] = 'Ș'
dict_simboluri['Ş'] = 'Ş'
dict_simboluri['ș'] = 'ș'
dict_simboluri['ş'] = 'ș'
dict_simboluri['&'] = ''
dict_simboluri['ț'] = 'ț'
dict_simboluri['ţ'] = 'ț'
dict_simboluri['Ţ'] = 'Ţ'
dict_simboluri['ț'] = 'ț'

def save_to_pdf(directory_path):
    for root, dirs, files in os.walk(directory_path):
        for file_name in files:
            if file_name.endswith(".html"):
                file_path = os.path.join(root, file_name)
                file_content = read_text_from_file(file_path)
                # create the PDF file
                class PDF(fpdf.FPDF, html.HTMLMixin):
                    pass
                pdf = PDF()
                pdf.add_page()
                pdf.add_font("Kanit", fname="fonts/Kanit-Regular.ttf")
                pdf.add_font("Kanit", style="B", fname="fonts/Kanit-Bold.ttf")
                pdf.add_font("Kanit", style="I", fname="fonts/Kanit-Italic.ttf")
                pdf.set_font("Kanit", size=24)
                # extract the article title
                den_articol = re.search('<td><h1 class="den_articol" itemprop="name">(.*?)</h1></td>', file_content)
                if den_articol is None:
                    print("Could not find --- article title --- in file --- {} ---.".format(file_path))
                else:
                    den_articol = den_articol.group(1)
                    for simbol in dict_simboluri.keys():
                        den_articol = den_articol.replace(simbol, dict_simboluri[simbol])
                    pdf.set_text_color(204, 0, 0)  # red
                    pdf.set_font('Kanit', size=14, style="B")
                    pdf.multi_cell(w=190, txt=den_articol)
                    pdf.ln()
                    pdf.set_font('Kanit', size=12)
                # extract the date
                date = re.search('<td class="text_dreapta">(.*?), in <a', file_content)
                if date is None:
                    print("Could not find --- date --- in file --- {} ---.".format(file_path))
                else:
                    date = date.group(1)
                    pdf.set_text_color(0, 102, 204)  # blue
                    pdf.set_font('Kanit', size=8, style="B")
                    pdf.cell(txt=date)
                    pdf.ln()
                    pdf.ln()
                    pdf.ln()
                    pdf.ln()
                pdf.set_text_color(0, 0, 0)  # black (default)
                pdf.set_font('Kanit', size=12)
                # extract the article text
                articol = re.search(r'<!-- ARTICOL START -->([\s\S]*?)<!-- ARTICOL FINAL -->', file_content)
                if articol is None:
                    print("Could not find --- ARTICOL START/FINAL --- in file --- {} ---.".format(file_path))
                else:
                    articol = articol.group(1)
                    articol = articol.replace("&quot;", "\"")
                    articol = articol.replace("’", "'")
                    # paragraphs
                    par_regex = re.compile('<p class="text_obisnuit.*?">.*?</p>')
                    pars = re.findall(par_regex, articol)
                    pars_text = list()
                    if len(pars) == 0:
                        print("Could not find -- text_obisnuit paragraphs -- in file --- {} ---.".format(file_path))
                    else:
                        for i in range(0, len(pars)):
                            if '<p class="text_obisnuit">' in pars[i]:
                                # identify the text_obisnuit class and take its text
                                content = re.findall('<p class="text_obisnuit">(.*?)</p>', pars[i])
                                if len(content) == 0:
                                    print("Could not find text in paragraph {}, file {}.".format(pars[i], file_path))
                                else:
                                    # put the text into a multi_cell
                                    for simbol in dict_simboluri.keys():
                                        content[0] = content[0].replace(simbol, dict_simboluri[simbol])
                                    pars_text.append(content[0])
                                    # pdf.multi_cell(w=190, txt = content[0])
                                    pdf.write_html(text=f'<p class="text_obisnuit">{content[0]}</p>')
                                    # add an empty line between paragraphs
                                    pdf.ln()
                            elif '<p class="text_obisnuit2">' in pars[i]:
                                # identify the text_obisnuit2 class and take its text
                                content = re.findall('<p class="text_obisnuit2">(.*?)</p>', pars[i])
                                if len(content) == 0:
                                    print("Could not find text in paragraph {}, file {}.".format(pars[i], file_path))
                                else:
                                    # set the font to bold
                                    pdf.set_font('Kanit', size=12, style="B")
                                    # put the text into a multi_cell
                                    for simbol in dict_simboluri.keys():
                                        content[0] = content[0].replace(simbol, dict_simboluri[simbol])
                                    pars_text.append(content[0])
                                    # pdf.multi_cell(w=190, txt = content[0])
                                    pdf.write_html(text=f'<p class="text_obisnuit2">{content[0]}</p>')
                                    # add an empty line between paragraphs
                                    pdf.ln()
                                    # reset the font
                                    pdf.set_font('Kanit', size=12)
                            else:
                                continue
                # add the source link
                pdf.ln()
                pdf.ln()
                pdf.set_font('Kanit', size=12, style="B")
                pdf.cell(txt="Source:")
                pdf.set_font('Kanit', size=12)
                pdf.set_text_color(0, 102, 204)  # blue
                pdf.cell(w=40, txt="https://neculaifantanaru.com/{}".format(file_name), link="https://neculaifantanaru.com/{}".format(file_name))
                den_fisier = os.path.splitext(file_path)[0] + '.pdf'
                pdf.output(den_fisier)
                # break

# function that merges several PDF files
def merge_pdf_files(directory_path):
    merger = PdfFileMerger()
    for root, dirs, files in os.walk(directory_path):
        for file_name in files:
            if file_name.endswith(".pdf"):
                print("PDF: ", file_name)
                file_path = os.path.join(root, file_name)
                merger.append(file_path)
        merger.write(os.path.join(root, "articles.pdf"))
        merger.close()
        break

save_to_pdf("c:\\Folder5\\")
merge_pdf_files("c:\\Folder5\\")
If you want both bold and italic, you need to add the corresponding font.
So add pdf.add_font("Kanit", style="BI", fname="fonts/Kanit-BoldItalic.ttf") under pdf.add_font("Kanit", style="I", fname="fonts/Kanit-Italic.ttf").
Also, calling pdf.set_font('Kanit', size=12, style="B") does not make the text bold here; you need to add the HTML tag, e.g. <b>...</b>.
You could change pdf.write_html(text=f'<p class="text_obisnuit2">{content[0]}</p>') into pdf.write_html(text=f'<p class="text_obisnuit2"><b>{content[0]}</b></p>')
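The two paragraph branches in the loop differ only in the class name and the <b> wrapper, so they can be folded into one helper. A sketch (paragraph_to_html is a hypothetical name) whose result would be passed to pdf.write_html(...), with the Kanit BI font registered as described above; since fpdf2 ignores CSS classes, the bold comes only from the <b> tag:

```python
import re

# One paragraph of either class; captures (class name, inner HTML).
PARAGRAPH_RE = re.compile(r'<p class="(text_obisnuit2?)">(.*?)</p>', re.DOTALL)


def paragraph_to_html(paragraph):
    """Return the HTML to hand to write_html for one extracted paragraph,
    or None if it is neither text_obisnuit nor text_obisnuit2."""
    match = PARAGRAPH_RE.fullmatch(paragraph)
    if match is None:
        return None
    css_class, inner = match.groups()
    if css_class == "text_obisnuit2":
        # write_html does not apply CSS, so bold has to come from a <b> tag.
        inner = f"<b>{inner}</b>"
    return f'<p class="{css_class}">{inner}</p>'


print(paragraph_to_html('<p class="text_obisnuit2"><em>My Name is Prince</em></p>'))
# <p class="text_obisnuit2"><b><em>My Name is Prince</em></b></p>
```

In the loop, each pars[i] would go through paragraph_to_html, and a non-None result through pdf.write_html(text=...). The dict_simboluri replacements could still be applied to the result first.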
ok, works.
One more thing. I also have another kind of tag inside paragraphs: a <span class="text_obisnuit2"></span> inside the paragraph starting with <p class="text_obisnuit"></p>, as below:
Example:
<p class="text_obisnuit"><span class="text_obisnuit2">My name is James:</span> and I want to go home by Night.</p>
It must look like this in the PDF ("My name is James:" in BOLD and the rest of the words in normal text):
My name is James: and I want to go home by Night.
Please tell me where and how to change my code so that it works.
Currently, as written in the documentation, fpdf2 doesn't support CSS, so in this case you may want to replace <span class="text_obisnuit2"></span> with <b>...</b>.
For example, you could use file_content = re.sub(r'<span class="text_obisnuit2">(.*?)</span>', r'<b>\g<1></b>', file_content) to do the replacement in the entire HTML file.
Please tell me where and how to change my code so that it works.
I would put that line before everything else, just after opening the file, because I view it as pre-processing the file before using it with fpdf2.
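The same pre-processing step as a small function (spans_to_bold is a hypothetical name). The non-greedy (.*?) matters when one paragraph holds more than one such span: a greedy (.*) would stretch from the first opening tag to the last </span> on the line, leaving the tags in between unconverted. re.DOTALL additionally lets a span's text continue over a line break.

```python
import re

# Non-greedy (.*?) so that two spans in one paragraph are converted separately;
# DOTALL lets a span's text continue over a line break.
SPAN_RE = re.compile(r'<span class="text_obisnuit2">(.*?)</span>', re.DOTALL)


def spans_to_bold(html_text):
    """Turn every <span class="text_obisnuit2">...</span> into <b>...</b>."""
    return SPAN_RE.sub(r"<b>\1</b>", html_text)


print(spans_to_bold('<p class="text_obisnuit"><span class="text_obisnuit2">My name is James:</span> and I want to go home by Night.</p>'))
# <p class="text_obisnuit"><b>My name is James:</b> and I want to go home by Night.</p>
```

In save_to_pdf it would go right after file_content = read_text_from_file(file_path), as file_content = spans_to_bold(file_content).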
Brilliant. Thanks.
I made a short tutorial with my code, which you helped me finish. Thanks for your help.
Maybe someone needs complete code for the fpdf library.
Thank you for sharing your tutorial, @me-suzy! And thank you very much, @RedShy, for assisting here.
I'm closing this issue now, as things seem resolved.
<p class="text_obisnuit">Intr-un articol precedent, <a href="https://neculaifantanaru.com/dupa-toate-regulile-artei.html"><em>Dupa toate regulile artei</em></a>, v-am povestit despre tanarul print Hamlet</p>
should look like this in the PDF:
Intr-un articol precedent, Dupa toate regulile artei, v-am povestit despre tanarul print Hamlet
Instead, this is how it looks in the PDF (also, as you can see below, the `://` signs of `href=https` disappeared in the PDF): ![image](https://user-images.githubusercontent.com/2770489/186769126-d514e2ce-dc84-4974-be84-c7ad26d2a79e.png)