tfbf / Bible-Punjabi-Pavitr-Bible-1945


Multiple whitespace #20

Closed DavidHaslam closed 7 years ago

DavidHaslam commented 7 years ago

A search of the concatenated USFM files for the regex pattern \x20{2,} gave 1678 matches.

Most of these instances of "multiple whitespace" are at the end of a line. However, there are some instances where it occurs "mid-verse". Some occur in section headings, between the \s tag and the text.
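For anyone wanting to reproduce this kind of breakdown, here is a rough Python sketch. It is only an illustration, not the search I actually ran: the function name `classify_runs` and the three category names are made up for this example, and it assumes section headings start with \s optionally followed by one digit.

```python
import re

# Same pattern as in the search above: two or more consecutive spaces.
MULTI_SPACE = re.compile(r"\x20{2,}")

# Assumption: a section heading marker is \s, optionally followed by a digit.
SECTION_TAG = re.compile(r"\\s\d?")


def classify_runs(text):
    """Count runs of multiple spaces by where they occur in each line."""
    counts = {"end_of_line": 0, "after_s_tag": 0, "mid_line": 0}
    for line in text.splitlines():
        for match in MULTI_SPACE.finditer(line):
            if match.end() == len(line):
                # The run reaches the end of the line.
                counts["end_of_line"] += 1
            elif SECTION_TAG.fullmatch(line[:match.start()]):
                # The run sits directly between the \s tag and the heading text.
                counts["after_s_tag"] += 1
            else:
                counts["mid_line"] += 1
    return counts


sample = "\\s  Heading\n\\v 1 In the  beginning  \n\\v 2 text\n"
print(classify_runs(sample))
```

The total over all three categories should equal the number of regex matches in the same text.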

Many Unicode text editors have a facility to replace each instance of "multiple whitespace" with a single space. This should also get rid of the 9 spurious tab characters.
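If no suitable editor is at hand, the same replacement could be scripted. The following is a minimal Python sketch under a few assumptions: the function name `collapse_whitespace` and the `strip_trailing` option are hypothetical, tabs are treated as whitespace to be replaced, and the files are assumed to use LF line endings.

```python
import re


def collapse_whitespace(text, strip_trailing=False):
    """Replace each run of spaces and/or tabs with a single space.

    strip_trailing is an optional extra (an assumption, not part of the
    suggestion above): when True, a space left at the end of a line is
    removed as well.
    """
    cleaned = []
    for line in text.split("\n"):
        # Any run of spaces and tabs, including a lone tab, becomes one space.
        line = re.sub(r"[ \t]+", " ", line)
        if strip_trailing:
            line = line.rstrip(" ")
        cleaned.append(line)
    return "\n".join(cleaned)


sample = "\\s  Heading\n\\v 1 In\tthe  beginning  \n"
print(collapse_whitespace(sample))
```

Working line by line keeps the line breaks untouched, so USFM markers stay at the start of their lines.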

DavidHaslam commented 7 years ago

I have just fixed this systematically in my fork of master.

I have a TextPipe filter that can process all 66 files systematically.

See also issue #26

DavidHaslam commented 7 years ago

After merges such as the recent #81, I generally run my TextPipe filter locally to apply this fix before I build my concatenated USFM file or convert the USFM files to OSIS XML.

DavidHaslam commented 7 years ago

This was fixed by merging pull request #96