tfbf / Bible-Punjabi-Pavitr-Bible-1945

Punctuation and spacing in Gurmukhi orthography? #64

Open DavidHaslam opened 7 years ago

DavidHaslam commented 7 years ago

In the concatenated USFM file, there are:

NB. For the last observation, many of these are cases in which an exclamation mark appears where a vertical line (danda) might be expected.

These inconsistencies in spacing of punctuation become apparent when you compare the list with:

Each instance in both lists should be reviewed in light of the standard rules for punctuation and spacing in Gurmukhi orthography.

DavidHaslam commented 7 years ago

In Unicode we are not limited to choosing between space or no space.

There are also code points for

NB. Many of the above special space characters are not supported by the Raavi font. They are supported by the fonts Arial Unicode MS, Code2000 and FreeSans.

In some languages (e.g. French) there are typographical rules regarding the spacing of quotation marks. This makes particular sense when text contains nested quotations. It may be worth adopting such a rule for Gurmukhi.

I have also observed that when there is no space between a punctuation mark and a Punjabi word, the punctuation mark becomes very small for some font engines. This is particularly a problem here in GitHub.

DavidHaslam commented 7 years ago

I have just updated my TextPipe filter to implement the following:

Insert a space after ending punctuation marks

Systematic changes restricted to within verse text:

NB. This enhancement does not address the issue of a space before such a punctuation mark.
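
For anyone without TextPipe, the same idea can be sketched in Python. This is not the actual filter, only an illustration under stated assumptions: the set of ending marks (danda, double danda, question mark, exclamation mark) is my guess, "verse text" is taken to mean the remainder of a line that starts with the USFM `\v` marker, and the name `space_after_end_punct` is made up for this example. A space is inserted only where an ending mark is followed directly by a Gurmukhi character.

```python
import re

# Assumed set of ending marks: danda, double danda, '?' and '!'.
END_PUNCT = "\u0964\u0965?!"

# An ending mark followed directly by a character in the Gurmukhi block.
_MISSING_SPACE = re.compile(
    "([" + re.escape(END_PUNCT) + "])(?=[\u0A00-\u0A7F])"
)

# A USFM verse line: the \v marker, the verse number, then the verse text.
_VERSE_LINE = re.compile(r"^(\\v\s+\S+\s+)(.*)$", re.DOTALL)


def space_after_end_punct(line: str) -> str:
    """Insert a space after ending punctuation, within verse text only."""
    match = _VERSE_LINE.match(line)
    if match is None:
        # Not a verse line (e.g. \p, \c, \s): leave it untouched.
        return line
    marker, text = match.groups()
    return marker + _MISSING_SPACE.sub(r"\1 ", text)
```

Like the filter described above, this sketch does nothing about a space before the punctuation mark, and it does not treat footnotes or other inline markers inside a verse specially.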

DavidHaslam commented 7 years ago

After the above processing there are still:

Spacing rules need to be reviewed, especially for comma, question mark & exclamation mark. Some of the exclamation marks might still be typos for a danda.

DavidHaslam commented 7 years ago

Yet to be analysed:

Space or no space after left quotation mark or before right quotation mark?

Action: Need the statistics for this, and to decide on what type of space (if any) is most appropriate.
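
One way to gather these statistics is a short Python script along the following lines. It is a rough sketch only: it assumes the text uses the curly quotation marks U+201C/U+201D and U+2018/U+2019, and the function names are invented for this example. It tallies which kind of character (no space, or a named Unicode space) sits immediately inside each quotation mark.

```python
import unicodedata
from collections import Counter

# Assumption: the text uses curly double and single quotation marks.
LEFT_QUOTES = "\u201C\u2018"
RIGHT_QUOTES = "\u201D\u2019"


def _classify(ch: str) -> str:
    """Return the Unicode name of a whitespace character, else 'no space'."""
    if ch.isspace():
        # Control characters such as newline have no name; fall back to U+XXXX.
        return unicodedata.name(ch, "U+%04X" % ord(ch))
    return "no space"


def quote_spacing_stats(text: str) -> Counter:
    """Count what follows each left quote and what precedes each right quote."""
    stats = Counter()
    for i, ch in enumerate(text):
        if ch in LEFT_QUOTES and i + 1 < len(text):
            stats[("after-left", _classify(text[i + 1]))] += 1
        elif ch in RIGHT_QUOTES and i > 0:
            stats[("before-right", _classify(text[i - 1]))] += 1
    return stats
```

Running `quote_spacing_stats` over the concatenated USFM file and printing the counter would show at a glance how often each spacing variant occurs.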

DavidHaslam commented 7 years ago

Following this further, I just made a simple TextPipe filter to implement this:

Notes:

  1. Any existing spaces used immediately inside the quotation marks were removed.
  2. Afterwards there were 1435 thin space characters (U+2009) in the concatenated USFM file.
  3. I confirm that we must switch away from using the Raavi font.
  4. With FreeSans font the results look pleasant enough when viewed in BabelPad.
  5. There were some peculiar rendering issues in Notepad++ at certain locations, presumably down to the Scintilla editor upon which it's based.

Summary: It's a real improvement to have such consistency in the way that quotation marks are spaced.
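
For reference, the behaviour described in the notes can be approximated outside TextPipe. The Python below is a sketch, not the filter itself. It assumes that the rule is "one thin space (U+2009) immediately inside every quotation mark", that the quotation marks are the curly ones (U+201C/U+201D and U+2018/U+2019), and that only ordinary spaces and thin spaces need removing; the function name is made up.

```python
import re

THIN_SPACE = "\u2009"


def thin_space_inside_quotes(text: str) -> str:
    """Put exactly one thin space just inside each curly quotation mark."""
    # Left quotes: drop any spaces that follow, then add one thin space.
    text = re.sub("([\u201C\u2018])[ \u2009]*", r"\1" + THIN_SPACE, text)
    # Right quotes: drop any spaces that precede, then add one thin space.
    text = re.sub("[ \u2009]*([\u201D\u2019])", THIN_SPACE + r"\1", text)
    return text
```

Because existing spaces and thin spaces next to the quotation marks are removed before the thin space is added, running the function a second time leaves the text unchanged, so `text.count("\u2009")` afterwards gives a stable figure comparable to the count in note 2.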

DavidHaslam commented 7 years ago

This issue remains open after the merge of pull request #96

Further analysis and discussion are in order.