n-t-roff / heirloom-doctools

The Heirloom Documentation Tools: troff, nroff, and related utilities
http://n-t-roff.github.io/heirloom/doctools.html

Paragraph adjust "badness" calculation #22

Open reffort opened 9 years ago

reffort commented 9 years ago

I'd like to propose a change to the way the line breaking "badness" value is calculated in paragraph adjust mode. The current calculation occasionally produces a line or cluster of lines that have obviously loose spacing followed by one or more lines that have obviously tight spacing (or vice versa). That's the sort of thing that paragraph adjustment is supposed to minimize. The change I suggest corrects this behavior.

It is a minor change to line 1765 in function penalty() in n7.c.

The current line is:

t = t >= 0 ? t * 5 / 3 : -t;

The proposed new line is:

t = t >= 0 ? t * 2 : t * -2 ;

The current calculation is heavily skewed to favor really tight lines that have a space size at or near the value defined by the .minss request if the space size for the line being considered cannot be near the normal size. These small word spaces (which fall into the "very tight" class in TeX parlance) are actually favored more than word spaces that are on the looser side of the "normal" class. When the input text needs help the most, the badness curve acts much like a word processor and the result is uneven word spacing.

The factors I'm proposing produce badness values that are symmetrical about the normal space size and favor neither loose nor tight lines. The curve assigns a much higher penalty to tighter-than-normal spaces than does the current curve, and a modestly higher penalty to looser-than-normal spaces. The penalties assigned to looser and tighter space sizes are evenly balanced, and this significantly reduces the occurrence of obvious loose-tight lines.
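To make the difference between the two lines concrete, here is a small standalone sketch. It assumes t is a signed integer deviation from the normal space size; the exact scale and sign convention used inside penalty() in n7.c are not shown in this thread, and the function names below are made up for illustration.

```c
#include <stdio.h>

/* Current line 1765: positive and negative t are weighted differently. */
int badness_current(int t)
{
    return t >= 0 ? t * 5 / 3 : -t;
}

/* Proposed line: the same weight on both sides of zero. */
int badness_proposed(int t)
{
    return t >= 0 ? t * 2 : t * -2;
}

/* Print both curves for a few sample deviations (arbitrary values). */
void print_badness_table(void)
{
    static const int samples[] = { -30, -15, 0, 15, 30 };
    size_t i;

    for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
        printf("t=%4d  current=%3d  proposed=%3d\n", samples[i],
               badness_current(samples[i]), badness_proposed(samples[i]));
}
```

With these definitions, t = 30 and t = -30 score 50 and 30 under the current line but 60 and 60 under the proposed one. Going by the description above, the cheaper negative side would be the tight one, which is why tight lines win out under the current calculation.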

n-t-roff commented 7 years ago

The original repository did not contain any generated documents. They were all accessed through the project's web page http://heirloom.sourceforge.net/doctools.html . That page has now been copied to http://n-t-roff.github.io/heirloom/doctools.html , where the PDFs can be accessed. I'd just like to be consistent here. It is better not to put them in the source repository; merging will be challenging enough anyway ;) I encourage you to fork https://github.com/n-t-roff/heirloom , put the PDFs in the doctools folder and make any changes you like to doctools.html.

BTW: Your first two commits can (and should) be merged already. (I am not sure if a pull request can be started for a selected commit, but I hope so.)

Changing .letadj's default unit from ens to ems may really make sense, but don't we have a compatibility issue then?

reffort commented 7 years ago

Yes, the maximum dynamic letter spacing applied to existing documents would be half of what they would have now. I don't know how widely this feature is used, which is the reason I didn't make any modifications to the units.

n-t-roff commented 7 years ago

Would it make sense to keep the old .letadj default unit when .letcalc is set to 0 and use ems when .letcalc is not 0?

aksr commented 7 years ago

Why not create a new extension level (e. g. 4) for all the new incompatibilities? .do xflag 4

That way whenever in the future big changes (i. e. incompatibilities) happen, this number could be increased.

Not sure if it's feasible and/or if it complicates too much the code. (But it's more elegant IMHO.)

reffort commented 7 years ago

That is certainly possible. If there is a large base of existing documents that make use of .letadj, that would definitely be a very important consideration.

Another option that just came to mind would be to add a .letadj2 request, where the values would be specified as +/- percent of an em, for example, and then converted into the current internal units when the request is parsed. That wouldn't require any modifications, just the addition of the request.

n-t-roff commented 7 years ago

xflag 4 would actually be redundant. There are already .wscalc 0 and .letcalc 0, which set true Heirloom mode. Having xflag 4 in addition may be inconsistent (users may not understand the necessity) and hence confusing.

Also, for .letadj we have no compatibility issue if the default unit stays ens when .letcalc is 0. If it is clearly documented that the default unit changes with .letcalc > 0, it should be ok. On the other hand, there is no other request where the default unit is mode dependent, so .letadj2 could make sense (although I would not prefer it). Breaking compatibility by generally changing .letadj (also for letcalc 0) is not a good idea IMHO.

aksr commented 7 years ago

xflag 4 was a suggestion for all the current (and the new) incompatibilities which could be introduced.

n-t-roff commented 7 years ago

Ok, makes sense.

But how do we handle the incompatibilities introduced in the last 2.5 years? If we now encapsulate them in xflag 4 (and use xflag 5 for @reffort's changes), then this is an incompatibility in itself, since the tool will change its behavior at once.

And how do we handle bug fixes? Maybe they should also be enabled with a higher xflag, since users may use workarounds to compensate for bugs. But this means duplicating code--keeping the old code with the bug and a "copy" where the bug is fixed.

aksr commented 7 years ago

We have ignored every incompatibility until now. But why now? .letadj is a core heirloom request and its behavior is changed. As for the bugs: bugs aren't features, they should be used/treated like that. Or after every fixed bug there would need to be a way to reverse the fix. I doubt there are decades-old bugs. In short, bug workarounds shouldn't concern us.

We should see incompatibilities for what they are. Every intentional incompatibility usually improves on something. For example:

xflag 0 is for the traditional troff (not exactly, but close)
xflag 1 extensions (first improvement)
...
xflag 4 new extensions and/or changes

xflag 5 and so on should only be used if some incompatibility appears in the future.

I like xflag because it could be used to handle future improvements and incompatibilities very easily.

I don't want this to become a rant. This is all I have to say about this. This was only a suggestion. What I said appeared to me to be agreeable with the nature of the xflag request.

reffort commented 7 years ago

With an alternate .letadj2 request, though, there would be no incompatibilities at all: just select the version that corresponds to the units you want to use. Existing documents would use the current .letadj like they do now. The use of .letcalc 0 is not restricted to .wscalc 0, so having the meaning of the numbers change with the value of .letcalc would just add confusion and unpredictability because .letcalc and .wscalc can be changed at any time in the document.

Another request that would benefit from an alternate syntax is .track. The current version defines tracking for a range of point sizes, with the extremes expressed as n points (or other units, such as "m") of tracking at p point size. This is useful for defining tracking that automatically adjusts with the type size, but it is cumbersome for typical uses, such as spacing small caps, tweaking a paragraph, or adjusting the title page. An alternate syntax whose only arguments are the font and number of em units is easier to use and more communicative. I use a macro to do this; however, tracking is a basic functionality, so a macro is really the wrong way to implement it.

> I doubt there are decades-old bugs.

I've run across several of them, actually. There are two (at least) in the sample user's manual I formatted with the new paragraphing functions. One of them is obvious on several pages, the other (on the first page) I deliberately hid; both are cases that were probably never addressed. There are also bugs in eqn and tbl that were probably introduced along with revisions made in the late 1980s.

> bug-fix incompatibility

My view is that actual bugs should be fixed. The original description for this project included the phrase "for fixing bugs" or something similar; I'm all for that. Changing something because groff does it that way, not so much.

> xflag nn

The things I added won't do anything at all unless they are specifically requested, so I'm bamboozled as to why that would warrant a new compatibility level. At least, that's how they are supposed to act; if they don't actually behave that way, something needs to be fixed.

aksr commented 7 years ago

> The use of .letcalc 0 is not restricted to .wscalc 0, so having the meaning of the numbers change with the value of .letcalc would just add confusion and unpredictability because .letcalc and .wscalc can be changed at any time in the document.

This is why I was against change to .letadj if .letcalc is something other than 0.

As for .letadj2, at least is there a better name?

reffort commented 7 years ago

> As for .letadj2, at least is there a better name?

Sure. Pick one.

For the tracking alternate, for example, I use .trackfu where 'f' is the font and 'u' stands for em units (1/1000 em).
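As a rough illustration of why em units are convenient here: a tracking value given in thousandths of an em scales with the type size automatically. The sketch below only shows that arithmetic, taking 1 em to equal the current point size; it is not the .trackfu macro itself (which is not shown in this thread), and the function name is hypothetical.

```c
/*
 * Tracking in points for `units` thousandths of an em at point size
 * `ptsize`. Assumes 1 em equals the point size.
 */
double track_points(double units, double ptsize)
{
    return units * ptsize / 1000.0;
}
```

For example, 50 units is half a point at 10 point and a full point at 20 point, so the same request stays visually consistent across sizes.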

reffort commented 7 years ago

I'd like to make a few changes in light of a few months' additional use. A couple of these might be a bit disruptive, so I thought I would put them out for comment first. None of these changes will affect the Heirloom (default) mode.

  1. .wscalc : Word space calculation method
  1. .letcalc : letter adjustment calculation method

     Remove all of the methods except for 0 (Heirloom) and 4 (distributed). Method 4 is easily the best and most versatile of the bunch. The additional methods were originally intended to allow the user more control over the page texture, but their performance just doesn't seem to warrant the code clutter. There also are several adjustments now that will modify texture when used with Method 4.
  1. .overrunpenalty p1 t1 (t2) Change the reference penalty p1 so it defines the penalty at the threshold distance t1 instead of half of t1 (the requested penalty value will be cut in half). With the curves I was using at first, it made sense to define the penalty at the halfway point because there was no way to mentally estimate the value of the penalty at any particular point. However, with the current curve the penalty is easy to estimate, because the multiplier is the inverse of the distance (half the distance = twice the penalty).

  2. When lastlinestretch is in effect, calculate a penalty for the last line if it will be stretched to full measure. This will occasionally cause a character to move down onto the last line.

  3. .adjpenalty (adjacent line penalty) : minor revision to TeX compatibility mode due to a misinterpretation of the TeX behavior for the first line of a paragraph. When in TeX mode, it will then behave similar to the other modes, but with the TeX thresholds.

  4. .hypp p1 p2 p3 : Generalize p3 so it is applied to the penultimate line in the paragraph (the current behavior is to apply it to the last word of the paragraph). This is purely cosmetic: the paragraph just looks better when the last full line isn't hyphenated.
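The "half the distance = twice the penalty" rule from the .overrunpenalty item can be sketched as follows. This assumes a purely inverse-proportional curve anchored at the point (t1, p1); the actual curve in the code, its units, and the role of the optional t2 are not shown here, and the function name is hypothetical.

```c
/*
 * Hypothetical helper: penalty for an overrun at distance d, where p1 is
 * the reference penalty defined at the threshold distance t1.
 * Returns 0 for non-positive d to avoid dividing by zero (an assumption
 * made for this sketch; the real code may treat that case differently).
 */
double overrun_penalty(double p1, double t1, double d)
{
    if (d <= 0.0)
        return 0.0;
    return p1 * t1 / d;
}
```

With p1 defined at t1, the penalty at t1/2 comes out as 2 * p1. That matches the note that a value previously requested for the halfway point is effectively cut in half once p1 refers to t1 itself.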

n-t-roff commented 7 years ago

To 1. c): I think there is no such thing as letter adjustment in TeX? So if this is not an issue when letter adjustment is not used, I'd prefer to be as compatible as possible with TeX in this mode. Even if other modes you have introduced are superior to TeX, some users may like to use TeX mode anyway.

To 4.: What is the change in this case? The change may cause a (single?) character to move to the next line? How can this be acceptable? I think I didn't get what you mean here ;)

The other suggested changes are okay IMHO.

reffort commented 6 years ago

I replied to this by e-mail way back when, but the message apparently did not make it. I appended it from my Sent folder below.

The only significant difference between what I described and what I did is that the last line hyphenation penalty was added as a fourth argument to .hypp instead of replacing the third argument. The third argument (last word) still acts as it always did, and it takes precedence over the last line penalty. This allows one to penalize a hyphenated last word more than a hyphenated last line. It would be easy to add the last line penalty to the default Heirloom code, but I am reluctant to start bloating it.

The TeX curve with its peculiar penalties was retained as .wscalc 10. A modified TeX curve was added as .wscalc 12. It mimics the default TeX curve, but applies the line penalty and current-line hyphenation penalty the same as the normal curves, with no interactions or unnecessary squaring--the scales for the line penalty and current-line hyphenation penalty match all the other penalties.

User selected curves can now range from r^2 to r^9 (and the two-stage versions, too).

The documentation is at https://github.com/reffort/typo-tests-documents


1(c): Plain TeX, of course, does not have support for letter adjustment, but pdfTeX does. The use of letter adjustment in pdfTeX is optional, as it is here. In my opinion, the basic pdfTeX letter adjustment method doesn't look all that good, but it requires very little additional code.

4: Yep. Lastlinestretch stretches the last line a maximum of one en, which is a little larger than the width of one average character in most text fonts. This change just treats the last line the same as any other justified line in the special case where it is justified by the action of lastlinestretch. Common single-letter words in English are "a" and "I" (conceivably, a very short two-letter word might also fit), and there are words where a hyphenation point can be shifted by one character.


n-t-roff commented 6 years ago

I would agree to your improvements. It may even work (hopefully) for you to start a pull request so that merging can be kept simple. (I'd be surprised if it works because of the many changes to this repo...) Your GitHub repo must be in sync with your local data first (better do a git push or at least git status on your machine before starting the pull request).

reffort commented 6 years ago

It looks like I'm going to have to stop trying to reply via email, it just isn't working for me any more.

The only thing left on my list at this point is to put the variables into the environment. This would also be a good time to fix any problems that have been identified.

One question: Is troff supposed to run with a zero line length or should it abend? It appears to run for me, so if it's supposed to do that I'll need to add a few divide-by-zero checks.

n-t-roff commented 6 years ago

Crashes should be avoided if at all possible. If someone unintentionally sets the line length to zero through an error in a calculation, it would be useful to have an appropriate error message to debug the issue. If the tool crashes, the user learns nothing (assuming most users would not start a debugger). So I'd prefer the divide-by-zero checks (preferably with a short error message that the line length is zero).
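A guard of the kind discussed here might look like the sketch below. The function name, message text, and return convention are illustrative assumptions; troff has its own error-reporting routine, which this standalone example replaces with a plain fprintf to stderr.

```c
#include <stdio.h>

/*
 * Hypothetical guard: divide num by the line length ll, but report a
 * zero (or negative) line length instead of dividing.
 * Returns 0 on success and stores the quotient in *out; returns -1 and
 * prints a short message to stderr otherwise, leaving *out untouched.
 */
int divide_by_line_length(int num, int ll, int *out)
{
    if (ll <= 0) {
        fprintf(stderr, "troff: line length is zero or negative\n");
        return -1;
    }
    *out = num / ll;
    return 0;
}
```

Returning a status rather than aborting leaves it to the caller to decide whether to skip the calculation or stop formatting.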

reffort commented 6 years ago

It seems the minimum line length is limited to 0.1 inch, which accounts for all the behaviors I saw as well as sidestepping division problems.

It might be worth while to document this and other limits in the user's manual. At one point long ago, a maximum line length of "about 7.54 inches" was documented.

n-t-roff commented 6 years ago

We should only specify limits for which we know the exact technical cause. Of course "> 0" is a line length requirement--at least for fill mode (line length should not be used in "nofill" mode).

Where had the 7.54 inches been documented? It appears too small to me, since normal letter format (without offset and margin) already has a width of 8.5 inches--not to mention landscape mode.

reffort commented 6 years ago

I agree that known limits should be documented somewhere. The 0.1" minimum line length is one, but there are others: horizontal motion, for example, is a 20 bit number. With the high resolution PS driver that is about 14.5" or 37 cm--not very restrictive, but it is a limit.
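The 14.5" figure follows from the field width. A quick check, assuming the high-resolution PS device uses 72000 units per inch (a value inferred here because it reproduces the quoted numbers, not one stated in this thread) and treating the 20 bits as an unsigned magnitude:

```c
/*
 * Largest distance, in inches, representable by an unsigned field of
 * `bits` bits at `units_per_inch` device units per inch.
 * Valid for bits < 32 in this sketch.
 */
double max_motion_inches(unsigned bits, double units_per_inch)
{
    return (double)((1UL << bits) - 1) / units_per_inch;
}

/* Convert inches to centimetres. */
double inches_to_cm(double inches)
{
    return inches * 2.54;
}
```

Under those assumptions, max_motion_inches(20, 72000.0) gives a little over 14.5 inches, or about 37 cm.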

Some of the preprocessors also have limits. tbl and eqn, for instance, support only 9 fonts. They are supposed to support 99, but the conversion, done in another century, is incomplete (this is a trivial fix, they just didn't finish the job). Troff can support up to 255 fonts, but unless eqn and tbl are modified to recognize when long names are in use, they will always be limited to 99 maximum. 99 fonts shouldn't pose a problem in the vast majority of cases, but it's something to be aware of.

That 7.54" maximum line length (actually ll + po) came from the 1987 Unix Programmer's Manual, Vol. 4; there's an old 1978 Troff Tutorial that quotes the same number. My reason for mentioning it was not to say that it is a currently valid limit, but to show that at one time some effort was made to document the program's limits at the user level.

n-t-roff commented 6 years ago

It would be good to have these limits documented. It would be better to have a separate chapter for this, to avoid redundantly mentioning the same limits for similar requests.

For the preprocessors there are currently just the manpages to document these values.

As a first step, the maximum of 9 fonts could be documented and the modification tracked in separate issues.

Would you like to do these modifications to doc/troff/doc.tr, eqn/eqn.d/eqn.1 and tbl/tbl.1?

reffort commented 6 years ago

Collecting the limits separately is a good idea. That would allow documenting the source of each limit, too, as they are discovered.

For tbl and eqn, I think it would be easier to just fix them and skip documenting the 9 font maximum. If I'm the only one who has complained about it in the last 30 years, there aren't going to be very many people who have even noticed it. On the other hand, 99 fonts is the intended limit, so that should be documented.

But I don't think that man pages are necessarily the right place to document something like that. Man pages are very useful for getting command-line options and so forth, but when they try to become a detailed instruction book they are not very useful any more (at least that's how it is for me).

It probably would not take too much effort to include updated documentation for the preprocessors in the user's manual. That would allow better organization, presentation, and more detail. Adapted man pages might make a convenient starting point.

n-t-roff commented 6 years ago

The preprocessors (especially the standard preprocessors) can be documented in the user's manual. For minimal initial effort the preprocessor limits can be put in the new limits chapter of the user's manual too. So the man pages need not be changed at first.

reffort commented 6 years ago

The test repository has been updated to move the typo variables to the troff environment. This was the last thing on my list, although I have since noticed a few local variables that should be renamed so they are consistent with similar names, and some particulars of the .elpchar request need to be modified. Now that enough time has passed that I've forgotten how the code is supposed to work, I want to go through it to make sure I haven't made any unwarranted assumptions.

There were also some changes made to n-t-roff that affect this code, but I strongly disagree with them and I don't intend to support them; most fall into the category of random changes to variable types.

n-t-roff commented 6 years ago

Ah, now I understand the issue with the type changes. I have no idea why @bapt chose size_t in this case. I'll ask him. These changes should indeed be withdrawn.

n-t-roff commented 6 years ago

It is likely that commit 9d3c9d451264d412cd570e1b84bd03ed9f70df6e (which introduced these type changes) will be reverted in the near future.

reffort commented 6 years ago

I'll hold off, then, on making any appreciable changes that would affect these.

reffort commented 5 years ago

Would I be correct to assume that there are no issues with any of the advanced paragraphing or letter adjustment code? If that is the case, should I add a section to the user's manual to document the new requests?

n-t-roff commented 5 years ago

For me there are no issues with the new features. I would appreciate it if you would update the user's manual accordingly.

reffort commented 5 years ago

Should I add a new section or intermingle the new requests with the existing ones? I'm partial to a new section, but if it is added onto the end of the document there would be a lot of unrelated material in between the existing requests and the new ones.

n-t-roff commented 5 years ago

The usual way has been to insert new requests into their respective sections. But we could just as well add a new section. The end of the document is not a good place, of course. How about inserting the new section after (or before) a related section, i.e. section 4 "Text Filling, Adjusting and Centering"? Then users searching for this topic will also find the new features.

reffort commented 5 years ago

OK. I'll add the request details at the end of section 4, and a short overview paragraph in the section 4 text.

reffort commented 5 years ago

The updated User's Manual is (at long last) ready to review. There were several unrelated issues with the document I wanted to resolve (or at least understand), and I spent way too much time digging into their history.

There are three phases of updates: (1) Updates to add the new requests, (2) A number of other revisions, and (3) a few more tweaks.

The other revisions contain fixes for a couple of significant errors in the document that probably wouldn't bother users already familiar with the workings, but would likely cause new users some frustration until they found the correct information elsewhere. Numerous typos were also corrected. A number of minor issues still remain. The user's manual could still benefit from the services of a good editor with an endless supply of red ink pens.

The repository is located at https://github.com/reffort/troffmanual

The pertinent files are:

In the groff_char(7) character list, the Zapf Dingbats font (ZD) seems to be a source of some confusion. It is one of the standard fonts, but I could not find a readily-available source of complete information, so I made one and attached it to the changes.pdf document. Some of the glyphs could be useful; this list could be appended to the user's manual if desired. One significant point of confusion is that all but one of the special characters attributed to the ZD font by the groff_char(7) man page are not actually sourced from the ZD font.

A note on changes.pdf, page 4: The GhostScript installation on my machine does not handle the .notdef character correctly. It displays correctly on every viewer I tried, but when printed using GhostScript, the glyph prints with zero width. GhostScript also fails to embed the glyph in the PDF. (The PDF was made with Acrobat Pro, and it prints correctly using Acrobat and Windows.)

n-t-roff commented 5 years ago

Looks good to me, excellent work. Thank you also for the corrections of older errors in the document.

reffort commented 5 years ago

Thanks for taking the time to review that and provide feedback. Do you have any preference for the best way to handle those links in the Predefined Number Registers section (pages 15--17)? They should probably all be the same, whichever way works best.

n-t-roff commented 5 years ago

What do you mean by "all be the same"?

reffort commented 5 years ago

For most of the rows in the list of escape sequences and number registers, the three links for the section, escape/register name, and the description point to the same anchor, which is the detailed description for that escape/register name.

The lines I added or modified do not have the two redundant links (section and description) that the existing entries do. I did that primarily to demonstrate the different appearance: it is easier to read without all the redundant links (that's my opinion, anyway). To be consistent, all entries in the lists should either have the redundant links or none of them should have the redundant links.

In comparison, the list of requests, pages 6-10, does not have links for the explanations, only the request name. (The section link points to the beginning of the section text.)

n-t-roff commented 5 years ago

I agree to remove the redundant links. Only the identifier name column (escape or register name) should have links.

reffort commented 5 years ago

These changes are done and the updated document is now available in the same repo (http://github.com/reffort/troffmanual).

As before, troffdoc.pdf was made directly from the source.

The marked-up document is troffdoc-20190222-an.pdf. Changes from the January version are highlighted and annotated instead of being flagged with gdiffmk.

The January markup now has the changed areas highlighted so they are easier to find. It has been renamed troffdoc-20190127-mk-an.pdf.

changes.pdf has been updated to include the February changes.


There is one item that no longer makes sense to me (assuming it did at one time) and probably needs to be changed in the code:

If there are no objections, I'll make that change and update the user's manual to reflect it.

I'm also not happy with the names for the read-only number registers .adjlapenalty and .adjlathreshold, so would appreciate any suggestions for new names. The names should somewhat resemble .adjpenalty and .adjthreshold (their word-space counterparts), and not be confused with .letpen and .letthresh (which do different things). The only alternate names I've come up with so far are .adjletpenalty and .adjletthreshold; they may be the best choice, but I would like some other viewpoints because I fear the names are getting to be as long as GNU command-line options.

n-t-roff commented 5 years ago

I have no objection to the suggested changes. The new names are ok. (If the length is the issue--how about an abbreviation like .adjletpen, .adjletthr or .adjletthld?)