michaelrsweet / htmldoc

HTML Conversion Software
https://www.msweet.org/htmldoc
GNU General Public License v2.0

table handling error #437

Closed BobPekarske closed 3 years ago

BobPekarske commented 3 years ago

I have been using 1.8.29 forever to convert HTML from my website to PDF for printing. When I tried to migrate to a newer server having 1.9.7 (I think), htmldoc handled tables very differently. I have an image, a br, and a title under the image within the td of the table, and then I end the table. When the table ends below about mid-page, the title line is pushed to the top of the next page in the PDF. This happens with single and double column tables. I reduced the size of the image to make sure there was plenty of room on the page, but the caption still kicked to the next page. This seems wrong to me!

michaelrsweet commented 3 years ago

@BobPekarske There have been some bug fixes since 1.9.7 - is it possible for you to test with the current (1.9.12) release?

Also, it would be helpful for you to attach a ZIP/tar file with sample HTML and image file that reproduces the problem...

Thanks!

BobPekarske commented 3 years ago

Rather than send the files that caused the error report, I re-ran the script and now cannot reproduce the result.

I am so very sorry to have wasted your time.

I very much appreciate your timely response!

I am struggling with the upgrade from 1.8.29. Can you recommend a forum or other method by which I could get assistance with my issues? (Mostly seem to be font related.)

Live Long and Prosper,

Bob Pekarske

-- "Every day above ground is a good day!" "Any day on the Arizona Trail is a great day!" www.azpotlatch.org

michaelrsweet commented 3 years ago

@BobPekarske Right now there isn't a forum for HTMLDOC per se. I've enabled the "Discussions" feature here on GitHub to serve as a sort-of mailing list for HTMLDOC - only time will tell whether it will be widely used...

BobPekarske commented 2 years ago

I have been using htmldoc to convert my web pages to .pdf files to be printed.

Within the last few months I am seeing a change in results and I do not understand why.

I would be very grateful if you could suggest any reason.

Table elements that contain an image, a <br>, and a line of caption text are now often being broken across a page boundary!!!

There is room on the first page for the caption, but it appears instead on the top of the following page.

Attached please find an example of the .html and the .pdf for that .html.

Thank you for any guidance!

Bob Pekarske

BobPekarske commented 2 years ago

Please forgive me if I am not presenting this request properly.

You have helped me in the past and I am using one of those replies as the basis of this request.

If you require some other method of posting this request - please advise.

I am a long time user of htmldoc for publishing my genealogy data books and their periodic updates.

I am NOT a student of the software but I know that converting "scrolling based" html to "page based" pdf must involve a high level decision regarding where to insert page breaks.

Somehow, in the past several months those page breaks have moved in my publishing.

It may have been something I have done, but I could use some advice regarding what that might have been!

michaelrsweet commented 2 years ago

@BobPekarske The best thing you can do is, using the current release of the software, reproduce with a specific HTML file and file a bug with that file, the options you used to generate it, and the specific issue you are seeing. Sometimes that is the only way I have a chance of addressing formatting issues since, as you note, HTML is not specifically a page-based description language and HTMLDOC does need to make formatting/page break decisions on the fly...
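As a rough illustration of the kind of bundle described above, the following sketch gathers a small HTML file, the exact command line, and the HTMLDOC version into one archive. The file names (`repro/`, `sample.html`, `photo.png`, `repro.tar`) and the option list are hypothetical placeholders, not taken from the reporter's setup; the HTML only mirrors the image, br, and caption layout described earlier in this thread, and a real image would need to be copied in as `photo.png`.

```shell
#!/bin/sh
# Sketch: collect a minimal reproduction for a page-break report.
# Every file name here is a hypothetical placeholder.
set -eu

REPRO_DIR="repro"
mkdir -p "$REPRO_DIR"

# Minimal page: one table cell holding an image, a line break, and a caption.
# Copy a real image to $REPRO_DIR/photo.png before converting.
cat > "$REPRO_DIR/sample.html" <<'EOF'
<html>
<body>
<table>
<tr><td><img src="photo.png" alt="photo"><br>Caption text</td></tr>
</table>
</body>
</html>
EOF

# Record the exact options so the same command can be rerun by someone else.
OPTIONS="--webpage --quiet"
printf 'htmldoc %s -f sample.pdf sample.html\n' "$OPTIONS" > "$REPRO_DIR/command.txt"

# Only run the conversion when htmldoc is installed on this machine.
if command -v htmldoc >/dev/null 2>&1; then
    htmldoc --version > "$REPRO_DIR/version.txt" 2>&1 || true
    (cd "$REPRO_DIR" && htmldoc $OPTIONS -f sample.pdf sample.html) || true
fi

# Bundle everything for attaching to the bug report.
tar -cf repro.tar "$REPRO_DIR"
```

The resulting `repro.tar` would then hold the HTML, the command used, and (when htmldoc is available) the version string and the generated PDF.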

BobPekarske commented 11 months ago

Hello Michael Sweet,

Can you suggest some way in which I can learn more about how htmldoc uses fonts?

I am a good computer scientist, but ignorant of how font systems work.

I use htmldoc to produce hardcopy collections of genealogy web pages.

I have three different compute engines (of different vintages) accessing the same web server and using the same publishing program.

I get three different results on those pages where htmldoc calls for monospaced/Courier fonts.

One is not even monospaced, one is very light, and one is as expected.

Any guidance will be greatly appreciated.

Live Long and Prosper,

Bob Pekarske

BobPekarske commented 11 months ago

The reason I am not able to provide debug info using the latest 1.9 version is that I am struggling with two distinct problems.

With the newer versions, I have a problem with image captions being pushed to the following page when there is plenty of room on the current page.

This does not occur with 1.8.29 - so I have installed 1.8.29 on all my compute engines.

I want to understand font handling in general, so I can get all three compute engines to get the same result as the one that works.

BobPekarske commented 11 months ago

The attached screen dumps show the same HTML file converted via htmldoc on three different systems (shadowfax, beso2, and beso4).

The script is:

CHARTARGS="--webpage --quiet --linkstyle plain --no-links --bodyfont Monospace --embedfonts --fontsize 10 --bottom 5 --top 5 --header \"...\" --footer \"...\" --left 36"

CHARTARGS="--webpage --quiet --linkstyle plain --no-links --bodyfont Courier --embedfonts --fontsize 10 --bottom 5 --top 5 --header \"...\" --footer \"...\" --left 36"

htmldoc $CHARTARGS -f $(basename $0 .bash).$(hostname -s).pdf $(basename $0 .bash).html

One set uses the Monospace CHARTARGS, the other uses Courier.

All systems use htmldoc 1.8.29.
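One possible way to compare the three results, sketched here as an assumption rather than anything HTMLDOC itself provides, is to list the fonts recorded in each host's PDF with `pdffonts` from poppler-utils. The `chart` base name is a hypothetical stand-in for the script name; the host suffixes follow the `$(hostname -s)` naming in the script above.

```shell
#!/bin/sh
# Sketch: list the fonts recorded in each host's PDF so they can be compared.
# Assumes pdffonts (poppler-utils) is installed; it is not part of HTMLDOC.

list_fonts() {
    # $1 = path to a PDF; prints a header line, then the font table if possible.
    pdf="$1"
    if [ ! -f "$pdf" ]; then
        echo "missing: $pdf"
        return 0
    fi
    echo "== $pdf"
    if command -v pdffonts >/dev/null 2>&1; then
        pdffonts "$pdf" || echo "pdffonts could not read $pdf"
    else
        echo "pdffonts not installed"
    fi
}

# Hypothetical file names following the <script>.<host>.pdf pattern.
for host in shadowfax beso2 beso4; do
    list_fonts "chart.$host.pdf"
done
```

If the font names or the `emb` column differ between hosts, that would suggest the PDFs do not carry the same font data, which may help narrow down where the rendering diverges.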

michaelrsweet commented 11 months ago

@BobPekarske I really can't provide any support for 1.8.x - it is far too old and I do this for free...