paddymcall / SARIT-pdf-conversions

XML to PDF for SARIT texts
https://github.com/sarit/SARIT-corpus

Commands within words break hyphenation #23

Closed: ppasedach closed this issue 8 years ago

ppasedach commented 8 years ago

For example, in pramanavarttikavrtti.pdf, p. 12:

[screenshot: selection_078]

Line 1112 of the tex file:

\pstart {\color{DodgerBlue3}“एतेन”} शब्दसामान्यमात्रस्यार्थशून्यस्याहेतुत्वकथनेन यत्का{\color{DodgerBlue3}“पिलादे”}र्ब्बुद्धिसुखा{\color{DodgerBlue3}“दीना\edlabel{pvv.15-3}\footnote{\label{pvv.15-3} ३ रूपादिवत् ।}म”}नित्यत्वोत्पत्तिमत्त्वादिहेतुतोऽचेतनत्वमिष्टं तथा दि ग म्ब रा णां चैतन्यं तरूणां सर्व्वस्याः {\color{DodgerBlue3}“त्वचोऽपोह”}तोऽपगते{\color{DodgerBlue3}“र्मर”}णादभिमतं {\color{DodgerBlue3}“तच्चिन्तितं”} वेदितव्यं (।) यथा ह्यप्रच्युतप्राच्यरूपस्य तिरोधानमनित्यत्वं सां\edlabel{pvv.15-4}\footnote{\label{pvv.15-4} ४ आसर्गप्रलयान्नित्यैका बुद्धिर्न वेदना, प्रकृतिर्भोग्या भोक्ता पुरुषः सांख्यस्य (।) सांख्यः स्वस्वभावाच्युतस्य तिरोधानमनित्यतामाहातिरोधानं बौद्धस्यासिद्धं । निरन्वयनाशः सांख्यस्य । आत्मनः सञ्चरन्तो वृक्षाद्यवस्था भवन्तीति क्षपणः । अनित्यता सामान्या सिद्धिर्व्विनिश्चयेस्ति ।}ख्य स्येष्टं बौ द्ध स्य तु निरन्वयविनाशित्वं । तस्य यथाक्रममुपादाने प्रतिवाद्यसिद्धता वाद्यसिद्धता च ।

Here I believe the \color is the culprit, but the \footnote just before it would have the same effect. One solution would be to avoid such commands inside words in the XML as far as possible; that may work in some places, but surely not in all. A LaTeX-side workaround might be possible in theory, but as far as I know none exists yet.

paddymcall commented 8 years ago

This won't really work in LaTeX. See your question here: http://tex.stackexchange.com/questions/299896/edtext-breaks-hyphenation-of-a-word

The only real option is to write a preprocessor that converts the XML into something LaTeX can break, for example by inserting explicit "break allowed here" markers.
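A minimal sketch of what such a preprocessor step might look like, operating on the generated tex line rather than on the XML. Everything here is an assumption for illustration: the function name `mark_breaks`, the choice of TeX's discretionary hyphen `\-` as the marker, and the rule that a break is allowed wherever a simple `{\color{...}...}` group touches a word. It only handles groups without nested braces, so the `\footnote` case from the example line is left untouched, and it does not check whether the boundary is a linguistically valid break point.

```python
import re
import unicodedata

# Assumed marker: TeX's discretionary hyphen. Whether a visible hyphen or an
# invisible break is wanted for Devanagari is left open here.
BREAK = r"\-"

# A colour group as emitted in the tex file: {\color{Name}text}, no nested braces.
COLOR_GROUP = re.compile(r"\{\\color\{[^{}]*\}[^{}]*\}")


def _category(ch):
    """Return the major Unicode category of ch ('L' letter, 'M' mark, ...)."""
    return unicodedata.category(ch)[0]


def mark_breaks(line, marker=BREAK):
    """Insert marker where a colour group sits directly inside a word.

    Hypothetical helper: it looks only at the characters adjacent to the
    group and does not verify syllable boundaries (e.g. a preceding virama).
    """
    out = []
    pos = 0
    for m in COLOR_GROUP.finditer(line):
        start, end = m.span()
        out.append(line[pos:start])
        # The word runs into the group if the previous character is a
        # letter or a combining mark (vowel signs are marks in Devanagari).
        if start > 0 and _category(line[start - 1]) in ("L", "M"):
            out.append(marker)
        out.append(m.group(0))
        # The word continues after the group only if a letter follows;
        # a combining mark would mean we are in the middle of a syllable.
        if end < len(line) and _category(line[end]) == "L":
            out.append(marker)
        pos = end
    out.append(line[pos:])
    return "".join(out)


if __name__ == "__main__":
    print(mark_breaks(r"abc{\color{Red}def}ghi"))
```

Groups that are already separated from the surrounding text by spaces or punctuation are passed through unchanged, so the step could be run over the whole file. Note that this only marks the command boundaries themselves; it does not restore hyphenation inside the word fragments on either side.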

ppasedach commented 8 years ago

I'm now hopeful this can be solved with relative ease in latex. I'll test it a bit and let you know.

paddymcall commented 8 years ago

Please reopen when you have more feedback or a proper LaTeX solution. As it stands, I don't think this is something we can fix in either the XSLT or the XML sources.