TOC output problem: all headings with one word on each line

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Get a .chm with a TOC
2. Do a chm2pdf --webpage on it
3. The TOC has links to the right places, but all whitespace has been replaced
with newlines.

What is the expected output? What do you see instead?
The TOC should transfer as it is. Instead, each word in a heading ends up on a
separate line.

What version of the product are you using? On what operating system?
chm2pdf 0.9, OS is MacOS X 10.5.2 on Intel x86.

Please provide any additional information below.

It is not yet clear which .chm files produce this broken TOC output. Every file
I have been able to get my hands on breaks when converted.

Original issue reported on code.google.com by sugor...@gmail.com on 4 Apr 2008 at 8:56

GoogleCodeExporter commented 9 years ago

When you run chm2pdf, you will see the command-line invocation it uses for
htmldoc. This is usually a huge line with all the filenames and options to be
passed to htmldoc. You could copy it and run it yourself. If it still produces
such a TOC, then it is probably an htmldoc problem.

On the other hand, chm2pdf runs quite a few corrections on the HTML files
before it passes them to htmldoc to get the PDF. Maybe one of those corrections
causes the problem. (The general procedure is: chm2pdf extracts the HTML files
from the CHM and puts them in /tmp/chm2pdf/orig/basename-of-file-to-convert. It
corrects the HTML files and puts the corrected versions in
/tmp/chm2pdf/work/basename-of-file-to-convert. It then passes all the HTML
files, along with the user-specified options etc., to htmldoc to get the PDF
from the HTML collection.)
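
To see which files the correction step actually touched, the two directories
can be compared. The following is only a rough sketch, not part of chm2pdf:
the function name changed_files is made up for illustration, and the two
directories are passed in as arguments because the exact directory name
(shown above as basename-of-file-to-convert) depends on your CHM file.

```python
import filecmp
from pathlib import Path


def changed_files(orig_dir, work_dir):
    """Return relative paths of files that exist in both trees but differ.

    Hypothetical helper: orig_dir and work_dir are assumed to be the
    orig/... and work/... directories described in the comment above.
    """
    orig_dir, work_dir = Path(orig_dir), Path(work_dir)
    changed = []
    for orig_file in sorted(orig_dir.rglob("*")):
        if not orig_file.is_file():
            continue
        rel = orig_file.relative_to(orig_dir)
        work_file = work_dir / rel
        # shallow=False forces a byte-by-byte comparison of the contents.
        if work_file.is_file() and not filecmp.cmp(orig_file, work_file, shallow=False):
            changed.append(rel.as_posix())
    return changed
```

Calling changed_files() on the orig and work directories of one conversion
gives a list of candidates. If the TOC page is among them, diffing its two
versions should show whether a chm2pdf correction introduced the broken
layout or whether the original HTML already had it.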

Without an example .chm file it is hard to investigate further.

Original comment by chriskar...@googlemail.com on 5 May 2008 at 3:42

GoogleCodeExporter commented 9 years ago

I believe I've already tried that and the output was the same. I can't confirm
it right now, though, as I am on Windows and rather busy.

What is rather strange is that there are actually two TOCs: one that is
correct, provided the CHM had one as an extra page, and one that seems to be
generated. I'm not sure what chm2pdf does behind the scenes. Looking at the
files in /tmp/chm2pdf/work/filename, the HTML they contain defines both TOCs.
To be honest, though, I haven't looked into the orig/ folder.

This problem seems to manifest itself in all the CHM files I have tried to
convert. They are, however, e-books bought online, and I am not inclined to
upload any of them as a sample. I will try to obtain a file with no
restrictions on distribution and upload it in CHM form together with the
corresponding converted PDF.

Original comment by sugor...@gmail.com on 5 May 2008 at 4:40

GoogleCodeExporter commented 9 years ago

Please see if it works with htmldoc or not. If it's an htmldoc problem, there's
nothing we can do.

Original comment by devicera...@gmail.com on 5 May 2008 at 11:14

GoogleCodeExporter commented 9 years ago

You passed a CHM that contains errors/problems. See

http://www.karakas-online.de/forum/viewtopic.php?t=10965

To correct the problem, you should use the --extract-only and --dontextract
options (not together, but in sequence!) as outlined in

http://www.karakas-online.de/forum/viewtopic.php?t=10969
http://www.karakas-online.de/forum/viewtopic.php?t=10275

to correct the CHM, then pass the corrected CHM to chm2pdf.

Original comment by chriskar...@googlemail.com on 25 Nov 2008 at 10:41

Changed state: Invalid

GoogleCodeExporter commented 9 years ago

I have been having the same problem with one of the eBooks I am trying to
convert. I did some digging and found that the issue is with HTMLDOC, not
chm2pdf. However, there is a way to fix it.

Most toc.html files use HTML tables to hold the table of contents. What I have
seen, though, is that if the table that contains the TOC does not specify the
"width" attribute, HTMLDOC messes up the layout of rows and columns as
described by sugoruyo.

To fix this, find the offending <table> and make sure it has the attribute
width="100%". Use the --extract-only option and fix the HTML file in the
/tmp/chm2pdf/work/<chm_file_name>/ directory. Then run chm2pdf again with the
--dontextract option, and that will give you the corrected TOC.
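
If there are many tables, or you do not want to edit the file by hand, the
manual fix above can be scripted. This is a minimal sketch, not something
chm2pdf provides: the names add_table_width and patch_file are made up, and
the regular expression only handles ordinary <table ...> opening tags, so
check the result before running chm2pdf again.

```python
import re
from pathlib import Path

# Matches an opening <table ...> tag, capturing everything before the final ">".
TABLE_TAG = re.compile(r"(<table\b[^>]*)>", re.IGNORECASE)
# Detects an existing width attribute inside the captured tag text.
HAS_WIDTH = re.compile(r"\bwidth\s*=", re.IGNORECASE)


def add_table_width(html, width="100%"):
    """Add width="..." to every <table> tag that has no width attribute."""
    def fix(match):
        opening = match.group(1)
        if HAS_WIDTH.search(opening):
            return match.group(0)  # already has a width, leave it alone
        return f'{opening} width="{width}">'
    return TABLE_TAG.sub(fix, html)


def patch_file(path):
    """Rewrite one HTML file in place; return True if it was changed."""
    path = Path(path)
    # latin-1 maps every byte to one character, so the round trip leaves
    # the rest of the file untouched whatever its real encoding is.
    original = path.read_bytes().decode("latin-1")
    patched = add_table_width(original)
    if patched != original:
        path.write_bytes(patched.encode("latin-1"))
        return True
    return False
```

The idea is to call patch_file() on the TOC page inside the work directory
after the --extract-only run and before the --dontextract run. Tables that
already declare a width are left alone, so running it twice is harmless.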

Original comment by gupta.pa...@gmail.com on 6 May 2010 at 4:15

shurain / chm2pdf

TOC output problem: all headings with one word on each line #9