Closed funderburkjim closed 1 year ago
Now I understand why smaller corrections are preferred here: they are easier for those concerned to track!
Let's see how it goes.
Gave additional rights.
My procedure was to add @Andhrabharati to the Corrections team, and then in MWS to give write privileges to the Corrections team.
Mine was more radical. Hope it works.
You could undo yours, and then we can see if mine works.
Undone mine. Made Andhrabharati a member.
are there any 'secret-teams' here? just curious!!
You can see the teams, including the corrections team, at https://github.com/orgs/sanskrit-lexicon/teams. The page shows 'secret' beside 'owners' and 'researchers'; I don't know the significance of 'secret'.
I've used the corrections team recently for working with @AnnaRybakovaT and @sanskritisampada and now @Andhrabharati.
This corrections team seems to be useful so far; I hope you also find it useful.
I started looking at the data all over again today.
As a first step, I started putting the file into Excel format, which makes tracing the issues easy and effective. [I shall be posting them one by one, if it's of any interest.]
§1. While trying to rearrange the records to tally the no. of lines with the <L> count (287443 of them now), seen that
(a) 14 blank lines are in the file.
(b) about 2500 Lines starting with a space etc. at 200+ entries (listed below).
Other than splitting at a semicolon, no other logic is seen in these; these need to be looked at to combine or split appropriately.
Some of these are "candidates" for compound words, some are just explanatory material to be within ( ).
Some of these are dhātu or verbal type, and could have been split with a <div> tag, as done at many others of that category.
For ease of locating these entries, I have clubbed the lines with $ marking.
<L>25130<pc>144,2<k1>āprī<k2>ā-prī́<e>2A
<L>25890<pc>148,3<k1>āyurveda<k2>āyur—veda<e>3
<L>28747<pc>165,1<k1>iti<k2>íti<h>2<e>1
<L>28801<pc>165,2<k1>idam<k2>idám<h>1<e>1
<L>29223<pc>167,2<k1>indriya<k2>indriyá<e>2B
<L>30689.2<pc>175,2<k1>uta<k2>utá<e>1A
<L>38737<pc>225,1<k1>ṛgveda<k2>ṛg—vedá<e>3
<L>38960<pc>226,2<k1>ṛbhu<k2>ṛbhu<e>1B
<L>40181.2<pc>232,2<k1>eva<k2>evá<e>1A
<L>40194<pc>232,2<k1>evam<k2>evám<e>1
<L>40704<pc>235,3<k1>om<k2>óm<e>1
<L>41227<pc>239,3<k1>aurva<k2>aurva<e>1A
<L>41336.1<pc>240,3<k1>ka<k2>ká<e>1A
<L>43073<pc>249,1<k1>kanyakubja<k2>kanya—kubja<e>3
<L>45277<pc>258,2<k1>karman<k2>kárman<e>1A
<L>45834<pc>261,1<k1>kalā<k2>kalā́<e>1A
<L>46708<pc>265,2<k1>kaśyapa<k2>kaśyápa<e>1B
<L>48295.6<pc>273,1<k1>kāmam<k2>kā́mam<e>2A
<L>48778<pc>275,3<k1>kārttikeya<k2>kārttikeya<e>2
<L>52133<pc>291,1<k1>kuntī<k2>kuntī<e>2
<L>54148<pc>300,3<k1>kṛ<k2>kṛ<h>1<e>1
<L>54860<pc>305,1<k1>kṛpa<k2>kṛ́pa<e>2B
<L>55154<pc>306,2<k1>kṛṣṇa<k2>kṛṣṇá<e>1B
<L>62631<pc>343,2<k1>gaṇeśa<k2>gaṇeśa<h>a<e>3
<L>63278<pc>346,1<k1>gandharva<k2>gandharvá<e>1
<L>63409<pc>346,3<k1>gam<k2>gam<h>1<e>1
<L>63787<pc>348,3<k1>garuḍa<k2>garuḍá<e>1
<L>64493<pc>352,1<k1>gā<k2>gā<h>1<e>1
<L>64621<pc>352,3<k1>gāyatrī<k2>gāyatrī́<e>2B
<L>66884<pc>363,3<k1>go<k2>gó<e>1
<L>68591<pc>371,2<k1>grah<k2>grah<e>1
<L>70283<pc>380,1<k1>ca<k2>ca<h>2<e>1
<L>71807<pc>387,2<k1>candravaṃśa<k2>candrá—vaṃśa<e>3
<L>72173<pc>389,1<k1>car<k2>car<e>1
<L>73297<pc>394,2<k1>ci<k2>ci<h>1<e>1
<L>74913.2<pc>401,3<k1>ced<k2>céd<e>1A
<L>75907<pc>406,2<k1>chid<k2>chid<h>1<e>1
<L>78427<pc>417,1<k1>jāgṛ<k2>jāgṛ<e>1
<L>80318<pc>425,2<k1>jñā<k2>jñā<h>1<e>1
<L>81602<pc>431,3<k1>takṣ<k2>takṣ<h>1<e>1
<L>82046<pc>434,1<k1>tad<k2>tád<e>2
<L>82226<pc>435,1<k1>tan<k2>tan<h>3<e>1
<L>84445<pc>444,3<k1>tārkṣya<k2>tā́rkṣya<e>2
<L>86817<pc>454,2<k1>tṝ<k2>tṝ<e>1
<L>88314<pc>460,3<k1>triśaṅku<k2>trí—śaṅku<e>3A
<L>88580<pc>461,3<k1>trita<k2>tritá<e>2
<L>89118<pc>464,1<k1>tvaṣṭṛ<k2>tváṣṭṛ<e>2A
<L>89331<pc>465,1<k1>dakṣa<k2>dákṣa<e>2B
<L>89992<pc>468,2<k1>dadhyac<k2>dadhy—ác<h>a<e>3
<L>91212<pc>473,2<k1>dā<k2>dā<h>1<e>1
<L>92231<pc>478,2<k1>div<k2>dív<h>3<e>2
<L>92592<pc>480,1<k1>diś<k2>diś<h>2<e>2
<L>94848<pc>489,2<k1>duh<k2>duh<h>2<e>1
<L>95263<pc>491,1<k1>dṛś<k2>dṛś<h>1<e>1
<L>99005<pc>507,1<k1>dvīpa<k2>dvīpá<e>1A
<L>100607<pc>513,2<k1>dhā<k2>dhā<h>1<e>1
<L>102621<pc>523,1<k1>na<k2>ná<h>2<e>1
<L>102830<pc>524,2<k1>nakṣatra<k2>nákṣatra<e>2B
<L>103414<pc>526,2<k1>nanu<k2>na-nú<h>b<e>1
<L>103825<pc>528,1<k1>nam<k2>nam<e>1
<L>111189<pc>565,1<k1>nī<k2>nī<h>2<e>1
<L>111758.3<pc>567,1<k1>nu<k2>nú<e>1A
<L>115215<pc>583,1<k1>pada<k2>padá<e>2A
<L>120795<pc>612,3<k1>pā<k2>pā<h>1<e>1
<L>125989<pc>635,1<k1>purāṇa<k2>purāṇá<e>2B
<L>126428<pc>637,1<k1>purūravas<k2>purū—rávas<e>3B
<L>142512<pc>720,1<k1>bandh<k2>bandh<e>1
<L>145575<pc>733,2<k1>buddha<k2>buddha<e>2B
<L>147326<pc>741,2<k1>brāhmaṇa<k2>brā́hmaṇa<e>2B
<L>147598<pc>743,1<k1>bhaj<k2>bhaj<e>1
<L>150969<pc>758,1<k1>bhī<k2>bhī<h>1<e>1
<L>151348<pc>759,2<k1>bhuj<k2>bhuj<h>3<e>1
<L>151456<pc>760,1<k1>bhū<k2>bhū<h>1<e>1
<L>152494<pc>764,3<k1>bhṛ<k2>bhṛ<h>1<e>1
<L>155187<pc>777,1<k1>math<k2>math<h>1<e>1
<L>155187.1<pc>777,1<k1>manth<k2>manth<h>a<e>1
<L>156626<pc>783,1<k1>man<k2>man<h>b<e>1
<L>156776<pc>783,3<k1>manas<k2>mánas<e>2
<L>156778<pc>783,3<k1>manas<k2>mánas<e>2A
<L>156872<pc>784,2<k1>manu<k2>mánu<e>2B
<L>158299<pc>790,2<k1>marut<k2>marút<e>1
<L>159183<pc>794,2<k1>maharṣi<k2>maha—rṣi<e>3
<L>161686.05<pc>804,1<k1>mā<k2>mā́<e>1A
<L>162243<pc>807,1<k1>mātṛ<k2>mātṛ́<e>1A
<L>164354<pc>817,1<k1>mithyā<k2>mithyā́<h>a<e>2
<L>167121<pc>829,2<k1>mṛj<k2>mṛj<h>1<e>1
<L>169114<pc>838,3<k1>yaj<k2>yaj<h>1<e>1
<L>169189<pc>839,2<k1>yajurveda<k2>yajur—vedá<e>3
<L>170138<pc>844,1<k1>yadā<k2>yadā́<h>a<e>2
<L>170148<pc>844,2<k1>yad<k2>yád<e>1
<L>170219<pc>844,3<k1>yadi<k2>yádi<e>2
<L>170258<pc>845,2<k1>yam<k2>yam<e>1
<L>170391<pc>846,1<k1>yama<k2>yáma<e>2B
<L>171018<pc>849,1<k1>yā<k2>yā<h>1<e>1
<L>171731<pc>853,2<k1>yuj<k2>yuj<h>1<e>1
<L>176054<pc>871,3<k1>rākṣasa<k2>rākṣasa<e>1B
<L>177272<pc>877,1<k1>rāma<k2>rāma<e>1B
<L>177657<pc>878,3<k1>rāmāyaṇa<k2>rāmāyaṇa<e>2B
<L>177723<pc>879,1<k1>rāvaṇa<k2>rāvaṇa<e>2B
<L>177899<pc>879,3<k1>rāhu<k2>rāhú<e>1
<L>178518<pc>883,1<k1>rudra<k2>rudrá<e>2B
<L>178755<pc>884,1<k1>rudh<k2>rudh<h>2<e>1
<L>180526<pc>892,3<k1>lakṣmī<k2>lakṣmī́<e>2A
<L>182034<pc>899,2<k1>lāsya<k2>lāsya<e>2
<L>183233<pc>906,1<k1>loka<k2>loká<e>2A
<L>183326<pc>906,3<k1>lokapāla<k2>loká—pālá<e>3
<L>184615<pc>912,1<k1>vac<k2>vac<e>1
<L>184795<pc>913,1<k1>vajra<k2>vájra<e>2
<L>186668<pc>921,2<k1>varuṇa<k2>váruṇa<h>a<e>2
<L>186836<pc>922,2<k1>varadarāja<k2>vará—dá—rāja<e>4
<L>188317<pc>928,3<k1>vallabhācārya<k2>vallabhācārya<e>3
<L>188675<pc>930,2<k1>vasiṣṭha<k2>vásiṣṭha<e>2B
<L>188709<pc>930,3<k1>vasu<k2>vásu<e>2A
<L>189238<pc>933,2<k1>vah<k2>vah<h>1<e>1
<L>189435<pc>934,2<k1>vā<k2>vā<h>1<e>1
<L>190382<pc>938,2<k1>vājasaneyisaṃhitā<k2>vājasaneyi—saṃhitā<e>3
<L>191229<pc>942,2<k1>vāyu<k2>vāyú<e>1A
<L>192212<pc>946,3<k1>vālmīki<k2>vālmīki<e>2
<L>193954<pc>954,2<k1>vikaraṇa<k2>vi-karaṇa<h>2<e>2
<L>194227<pc>955,3<k1>vikramāditya<k2>vikramāditya<h>b<e>2
<L>195626<pc>963,2<k1>vid<k2>vid<h>1<e>1
<L>195671<pc>963,3<k1>vidura<k2>vidura<e>2B
<L>195690<pc>963,3<k1>vidyā<k2>vidyā́<e>2
<L>195939<pc>964,3<k1>vid<k2>vid<h>3<e>1
<L>197343<pc>972,2<k1>vindhya<k2>vindhya<e>1
<L>198376<pc>978,2<k1>vibhīṣaṇa<k2>vi-bhī́ṣaṇa<e>2B
<L>199214<pc>982,3<k1>virāj<k2>vi-rā́j<e>2B
<L>200996<pc>992,2<k1>viśva<k2>víśva<e>1B
<L>201490<pc>994,2<k1>viśvakarman<k2>viśva-karman<e>2B
<L>201542<pc>994,3<k1>viśvāmitra<k2>viśvā́-mitra<h>b<e>2
<L>202398<pc>999,1<k1>viṣṇu<k2>víṣṇu<e>1
<L>203978<pc>1007,1<k1>vṛ<k2>vṛ<h>1<e>1
<L>204000<pc>1007,2<k1>vṛtra<k2>vṛtrá<e>2B
<L>204309<pc>1008,3<k1>vṛj<k2>vṛj<h>1<e>1
<L>204340<pc>1009,1<k1>vṛt<k2>vṛt<h>1b<e>1
<L>204573<pc>1010,2<k1>vṛdh<k2>vṛdh<e>1
<L>204736<pc>1011,1<k1>vṛddhi<k2>vṛddhi<e>2A
<L>205648<pc>1015,1<k1>veda<k2>veda<e>1A
<L>205937<pc>1016,3<k1>vedāṅga<k2>vedāṅga<h>b<e>2
<L>205944<pc>1017,1<k1>vedānta<k2>vedā́nta<e>2A
<L>206547<pc>1019,3<k1>vai<k2>vaí<h>2<e>1
<L>208013<pc>1027,2<k1>vaiṣṇava<k2>vaiṣṇavá<e>1B
<L>209343<pc>1035,2<k1>vyāsa<k2>vy-āsa<e>2A
<L>210905<pc>1044,2<k1>śakti<k2>śákti<e>2A
<L>211201<pc>1045,3<k1>śaka<k2>śaka<h>3<e>1
<L>211383<pc>1046,3<k1>śakuntalā<k2>śakuntalā<e>2
<L>211813<pc>1048,3<k1>śata<k2>śatá<e>1
<L>211989<pc>1049,2<k1>śatapathabrāhmaṇa<k2>śatá—patha—brāhmaṇa<e>4
<L>213186<pc>1055,1<k1>śaṃkarācārya<k2>śaṃkarācārya<h>b<e>2
<L>214958<pc>1062,3<k1>śākhā<k2>śā́khā<e>2A
<L>216290<pc>1068,3<k1>śās<k2>śās<h>1<e>1
<L>217079<pc>1072,2<k1>śiras<k2>śíras<e>1
<L>217501<pc>1074,1<k1>śiva<k2>śivá<e>1B
<L>218048<pc>1076,2<k1>śiśupāla<k2>śíśu—pāla<e>3
<L>218170<pc>1077,1<k1>śī<k2>śī<h>1<e>1
<L>219579<pc>1082,3<k1>śunaḥśepa<k2>śúnaḥ-śépa<e>3
<L>220889<pc>1088,3<k1>śeṣa<k2>śeṣa<e>1B
<L>221821<pc>1093,1<k1>śaunaka<k2>śaúnaka<e>1
<L>222707<pc>1097,3<k1>śrāddha<k2>śrāddha<e>1B
<L>222870<pc>1098,3<k1>śrī<k2>śrī<e>1B
<L>223393<pc>1100,3<k1>śru<k2>śru<h>1<e>1
<L>223543<pc>1101,3<k1>śruti<k2>śrúti<e>2A
<L>224940<pc>1108,1<k1>ṣaṣ<k2>ṣáṣ<e>1
<L>225180<pc>1109,2<k1>ṣaḍja<k2>ṣaḍ—ja<e>3
<L>225593<pc>1111,2<k1>sa<k2>sá<h>6<e>1
<L>227464<pc>1123,1<k1>saṃhitā<k2>saṃ-hitā<e>2A
<L>227913<pc>1125,1<k1>sagara<k2>sa-gara<e>1B
<L>232431<pc>1149,1<k1>saptan<k2>saptán<e>1
<L>232987<pc>1152,1<k1>sam<k2>sám<h>2<e>1
<L>234794<pc>1164,1<k1>samaya<k2>sam-ayá<e>2A
<L>237579<pc>1182,2<k1>sarasvatī<k2>sárasvatī<e>2A
<L>239474<pc>1190,2<k1>savitṛ<k2>savitṛ́<e>2A
<L>239967<pc>1192,3<k1>sah<k2>sah<h>1<e>1
<L>240499<pc>1195,2<k1>sahasra<k2>sahásra<h>b<e>1
<L>241225<pc>1199,1<k1>sāṃkhya<k2>sāṃkhya<e>1B
<L>241754<pc>1202,1<k1>sādhya<k2>sādhyá<e>2B
<L>244551<pc>1216,1<k1>siddhānta<k2>siddhānta<e>2A
<L>245039<pc>1218,2<k1>sītā<k2>sītā<e>2
<L>245241<pc>1219,2<k1>su<k2>su<h>3<e>1
<L>245331<pc>1219,3<k1>su<k2>sú<h>5<e>1
<L>250739<pc>1239,3<k1>sū<k2>sū<h>2<e>1
<L>251143<pc>1241,2<k1>sūta<k2>sūtá<h>3<e>1
<L>251193<pc>1241,3<k1>sūtra<k2>sū́tra<e>2A
<L>251460<pc>1243,1<k1>sūrya<k2>sū́rya<h>a<e>2
<L>251838<pc>1244,3<k1>sṛ<k2>sṛ<e>1
<L>251924<pc>1245,1<k1>sṛj<k2>sṛj<h>1<e>1
<L>252716<pc>1249,3<k1>soma<k2>sóma<h>1<e>1
<L>254632<pc>1259,1<k1>stu<k2>stu<h>1<e>1
<L>254842<pc>1260,1<k1>stṛ<k2>stṛ<h>1<e>1
<L>254842.1<pc>1260,1<k1>stṝ<k2>stṝ<h>1<e>1
<L>257773<pc>1275,1<k1>sva<k2>svá<h>1<e>1
<L>259531<pc>1283,1<k1>svastika<k2>svastika<e>2A
<L>259801<pc>1284,3<k1>svid<k2>svid<h>1<e>1
<L>259990<pc>1285,3<k1>svarita<k2>svarita<e>2B
<L>260297<pc>1287,2<k1>han<k2>han<h>1<e>1
<L>260490<pc>1288,1<k1>hanumat<k2>hanu-mat<h>b<e>2
<L>261235.1<pc>1290,3<k1>hariścandra<k2>hári—ścandra<e>3B
<L>263443<pc>1300,3<k1>hu<k2>hu<h>1<e>1
<L>264003<pc>1303,3<k1>hetu<k2>hetú<h>b<e>2
<L>264480<pc>1306,1<k1>hotṛ<k2>hótṛ<h>b<e>1
<L>264875<pc>1308,3<k1>hve<k2>hve<e>1
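The §1 tallies (number of <L> lines, blank lines, lines starting with a space) can be reproduced with a short script. This is only a sketch: the function name `scan_lines` is made up for illustration, and it assumes the usual layout of one `<L>...` metaline per record. It is not the procedure actually used to get the figures above.

```python
def scan_lines(lines):
    """Tally metalines, blank lines and lines that begin with whitespace.

    `lines` is any iterable of text lines (e.g. an open file).
    Assumption: every record starts with a line beginning with "<L>".
    """
    stats = {"records": 0, "blank": 0, "leading_space": 0}
    for line in lines:
        text = line.rstrip("\r\n")
        if text.startswith("<L>"):
            stats["records"] += 1
        elif text.strip() == "":
            stats["blank"] += 1
        elif text[0].isspace():
            stats["leading_space"] += 1
    return stats


if __name__ == "__main__":
    # Tiny made-up sample, not real MW data.
    sample = [
        "<L>1<pc>1,1<k1>a<k2>a<e>1\n",
        "a ¦ body text\n",
        " continuation starting with a space\n",
        "<LEND>\n",
        "\n",
    ]
    print(scan_lines(sample))
```

For the real file one would pass an open file handle, e.g. `open("mw_iast.txt", encoding="utf-8", newline="")`, instead of the sample list.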
-----------------------------
§2. Three records have two '¦' in them.
<L>116599<pc>588,3<k1>parameśvarī<k2>parameśvarī<e>3B
¦ of <s1 slp1="sItA">Sītā</s1>, <ls>RāmatUp.</ls> (¦ <ab>N.</ab> of <ab>wk.</ab>)<info phwchild="116611.1"/><info lex="inh"/>
<LEND>
;; except the space after the opening brace ( N. of wk.), no other impact is seen in this.
<L>124485<pc>628,2<k1>piśācaka<k2>piśācaka<e>2B
¦ (sc. ¦ = <s>piśāca-bh°</s>, <ls>L.</ls><info lex="inh"/>
<LEND>
;; bhāṣā is missing after sc., and so is the closing brace. [(sc. bhāṣā) intending to say piśācikā bhāṣā = piśāca bhāṣā] ;; incidentally this "sc." (total count = 97) is not expanded; could it be the same as "scil."? (total count = 560+)
<L>139159<pc>703,3<k1>prājāpatyā<k2>prājāpatyā<e>2B
¦ (with ¦ with `<s>śakaṭa</s>`, <ls>MW.</ls><info lex="inh"/>
<LEND>
;; two 'with's are shown here.
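A check like §2 can be automated by counting '¦' per record. The sketch below assumes the line-oriented layout (a `<L>...` metaline, body lines, then `<LEND>`); the function name `records_with_extra_bars` is hypothetical.

```python
def records_with_extra_bars(lines, bar="¦"):
    """Return (metaline, count) for every record whose body has more than one bar.

    Assumption: a record runs from a line starting with "<L>" up to "<LEND>".
    """
    hits = []
    meta, count = None, 0
    for line in lines:
        text = line.rstrip("\r\n")
        if text.startswith("<L>"):
            meta, count = text, 0
        elif text.startswith("<LEND>"):
            if meta is not None and count > 1:
                hits.append((meta, count))
            meta = None
        elif meta is not None:
            count += text.count(bar)
    return hits


if __name__ == "__main__":
    # Made-up records for illustration only.
    sample = [
        "<L>1<pc>1,1<k1>a<k2>a<e>1\n", "¦ one (¦ two)\n", "<LEND>\n",
        "<L>2<pc>1,1<k1>b<k2>b<e>1\n", "¦ only one\n", "<LEND>\n",
    ]
    for meta, n in records_with_extra_bars(sample):
        print(n, meta)
```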
Pushed the Excel file to the MWS\mwtranscode folder.
Though @funderburkjim may not be directly able to use this Excel file, @gasyoun and @drdhaval2785 should be in a position to do so.
Though I am supposed to do the supplement portion for now, I have started with other issues in the data, given that I am now part of the TEAM.
[The actual point is that I am currently on the proofing work, so it will take some time to start posting changes in the Supplement portion.]
§3. texts within <s>...</s>
<s>- 18508
-<s> 386
</s>- 6340
-</s> 2
to keep all these '-' within <s>...</s>
------------------------
<s>° 12171
°<s> 94
°</s> 9057
</s>° 130
to keep all these '°' within <s>...</s>
------------------------
</s>. 9619
.</s> 1
to remove the dot in this single case (and which is not in the print!)
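Counts like those in §3 can be obtained by plain substring counting. This is a sketch, not necessarily how the numbers above were produced; `count_patterns` and `PATTERNS` are illustrative names.

```python
# Substrings from §3: '-', '°' and '.' sitting just inside or outside <s>...</s>.
PATTERNS = [
    "<s>-", "-<s>", "</s>-", "-</s>",
    "<s>°", "°<s>", "°</s>", "</s>°",
    "</s>.", ".</s>",
]


def count_patterns(text, patterns=PATTERNS):
    """Count non-overlapping occurrences of each substring in `text`."""
    return {p: text.count(p) for p in patterns}


if __name__ == "__main__":
    # Made-up fragment for illustration only.
    fragment = "<s>a</s>-<s>b</s>. <s>c°</s>"
    for pattern, n in count_patterns(fragment).items():
        print(pattern, n)
```

Note that a sequence such as `</s>-<s>` is counted under both `</s>-` and `-<s>`, so the individual totals overlap.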
------------------------
§4. Badly split lines
The following 45 records need to be re-looked at, either they are part of the prev. line (record), or a part of a group, or have a NULL string, ... ... In any case, they cannot be a new record.
<L>121 <pc>1,2 <k1>aṃhaspati <k2>áṃhas—pati <e>3
<L>3294 <pc>16,1 <k1>atisṛjya <k2>ati-sṛjya <e>2 A
<L>4074 <pc>20,1 <k1>adhonābham <k2>adho-nābham <e>3
<L>15905 <pc>90,3 <k1>arthadatta <k2>ártha—datta <e>3 A
<L>16664 <pc>94,3 <k1>alasa <k2>a-lasa <e>1 A
<L>19298 <pc>111,3 <k1>avyabhicāra <k2>a-vyabhicāra <e>1 A
<L>25651 <pc>147,3 <k1>āmra <k2>āmra <e>1 B
<L>27756 <pc>159,1 <k1>āśvatthika <k2>āśvatthika <e>2 A
<L>35779.1 <pc>206,3 <k1>upoḍha <k2>upoḍha <e>2 A
<L>46658 <pc>265,1 <k1>kaśika <k2>kaśika <e>1
<L>48456 <pc>274,1 <k1>kāya <k2>kāyá <e>1 A
<L>59492 <pc>328,3 <k1>kṣiti <k2>kṣití <h>a <e>3
<L>59553 <pc>329,1 <k1>kṣipra <k2>kṣiprá <e>2 A
<L>64772 <pc>353,2 <k1>gāna <k2>gāna <h>b <e>2
<L>64773 <pc>353,2 <k1>gāninī <k2>gāninī <h>b <e>2
<L>64774 <pc>353,2 <k1>gānīya <k2>gānīya <h>b <e>2
<L>75288 <pc>403,3 <k1>cyavāna <k2>cyávāna <e>2 A
<L>75682 <pc>405,2 <k1>chandya <k2>chándya <e>2 A
<L>81281 <pc>430,1 <k1>ṭuṇṭuka <k2>ṭuṇṭuka <e>1 B
<L>82052.1 <pc>434,1 <k1>tad <k2>tad <e>2 B
<L>84402 <pc>444,2 <k1>tāriṇītantra <k2>tāriṇī—tantra <e>3 A
<L>85069 <pc>447,2 <k1>tiraskāra <k2>tirás—kāra <e>3 A
<L>94437 <pc>487,2 <k1>durgataraṇī <k2>durgá—taraṇī <e>3
<L>94571.2 <pc>487,3 <k1>duṣkṛta <k2>dúṣ—kṛtá <e>3 B
<L>102655 <pc>523,2 <k1>namuca <k2>ná—muca <h>a <e>3
<L>106757 <pc>540,2 <k1>nirāmarṣa <k2>nir—āmarṣa <e>3
<L>116092 <pc>586,3 <k1>paratattva <k2>pára—tattva <e>3
<L>116197 <pc>587,1 <k1>parabhū <k2>pára—bhū <e>3
<L>116528 <pc>588,2 <k1>paramāgama <k2>paramāgama <e>3
<L>122469 <pc>619,3 <k1>pāraṇa <k2>pāraṇa <e>2 B
<L>122659 <pc>620,3 <k1>pārāvati <k2>pārāvati <e>2 A
<L>148702 <pc>747,2 <k1>bharaṇḍa <k2>bharaṇḍa <e>2 A
<L>149286 <pc>750,1 <k1>bhaṣaṇa <k2>bhaṣaṇa <e>2 B
<L>152428 <pc>764,2 <k1>bhūriśreṣṭhaka <k2>bhū́ri—śreṣṭhaka <e>3
<L>154721 <pc>775,1 <k1>maṇibandhana <k2>maṇí—bandhana <e>3 A
<L>154994 <pc>776,1 <k1>maṇḍalipattrikā <k2>maṇḍali-pattrikā <e>2 A
<L>164731 <pc>819,1 <k1>mīvara <k2>mīvara <e>1 B
<L>167489 <pc>831,2 <k1>mekhalā <k2>mékhalā <e>2 A
<L>178858 <pc>884,3 <k1>ropa <k2>ropa <h>a <e>2
<L>179010 <pc>885,3 <k1>roḍhṛ <k2>roḍhṛ <h>a <e>2
<L>183114 <pc>905,2 <k1>lulāpakāntā <k2>lulāpa—kāntā <e>3 A
<L>200997 <pc>992,3 <k1>viśva <k2>víśva <e>1 B
<L>209874 <pc>1038,1 <k1>vyāmana <k2>vyāmana <e>2 A
<L>214007 <pc>1058,2 <k1>śarman <k2>śárman <e>1 A
<L>257408 <pc>1273,1 <k1>syada <k2>syáda <e>2 A
So are these 159 records.
[Rule: a number within a <ls> cannot be at the start of a new record, it goes with the previous one as a continuation.]
Noticed that in one of these entries n. (Neuter) is taken as ii. This particular one should be a new record, appropriately changed.
<L>18212 <pc>104,3 <k1>avasa <k2>avasá <e>2 A
<L>21051 <pc>121,1 <k1>asura <k2>ásura <e>2 B
<L>23290 <pc>134,1 <k1>āḍhya <k2>āḍhyá <e>1 A
<L>23368 <pc>134,2 <k1>ātapat <k2>ā-tápat <e>2 A
<L>23750 <pc>136,2 <k1>ādaṣṭa <k2>ā-daṣṭa <e>2 A
<L>29966 <pc>171,2 <k1>īṣā <k2>īṣā́ <e>1 A
<L>35754 <pc>206,3 <k1>upavāsa <k2>upa-vāsa <e>2 A
<L>38569 <pc>224,2 <k1>ṛtupati <k2>ṛtú—páti <e>3 A
<L>39058 <pc>227,1 <k1>ṛṣi <k2>ṛ́ṣi <e>1 A
<L>39376 <pc>228,2 <k1>ekanemi <k2>éka—nemi <e>3 A
<L>40455 <pc>234,1 <k1>aindrāgna <k2>aindrāgná <e>2 A
<L>42290 <pc>245,1 <k1>kaṇa <k2>káṇa <e>2 A
<L>44254 <pc>254,1 <k1>karaṇaprayoga <k2>káraṇa—prayoga <e>3 A
<L>46139 <pc>262,3 <k1>kalpavallī <k2>kálpa—vallī <e>3 A
<L>46142 <pc>262,3 <k1>kalpaviṭapin <k2>kálpa—viṭapin <e>3 A
<L>46917 <pc>266,2 <k1>kāṃsya <k2>kāṃsya <e>2 B
<L>47253 <pc>268,1 <k1>kākṣaseni <k2>kākṣaseni <e>1 A
<L>50359 <pc>283,1 <k1>kiṃkāryatā <k2>kiṃ—kārya-tā <e>3 A
<L>52192 <pc>291,2 <k1>kupya <k2>kupya <e>2 B
<L>54918 <pc>305,2 <k1>kṛpāṇikā <k2>kṛpāṇikā <e>2 B
<L>54919 <pc>305,2 <k1>kṛpāṇikā <k2>kṛpāṇikā <e>2 B
<L>55578 <pc>308,3 <k1>kḷpta <k2>kḷptá <e>2 A
<L>56136 <pc>311,1 <k1>keśari <k2>keśari <e>2 A
<L>57009 <pc>315,2 <k1>kauṭasākṣya <k2>kauṭa—sākṣya <e>3 A
<L>57121 <pc>316,1 <k1>kautukāgāra <k2>kautukāgāra <e>3 A
<L>57187 <pc>316,2 <k1>kaupīna <k2>kaupīna <e>2 A
<L>57458 <pc>318,1 <k1>kauśāmbī <k2>kauśāmbī <e>1 B
<L>58113 <pc>321,2 <k1>krayya <k2>kráyya <e>2 A
<L>58411 <pc>323,1 <k1>kreṅkāra <k2>kreṅ-kāra <e>1 A
<L>58485 <pc>323,2 <k1>krauñca <k2>krauñcá <e>1 B
<L>58815 <pc>325,1 <k1>kṣatra <k2>kṣatrá <e>1 A
<L>59371 <pc>328,1 <k1>kṣitīśvara <k2>kṣitīśvara <e>3 A
<L>59558 <pc>329,1 <k1>kṣipra <k2>kṣiprá <e>2 B
<L>59565 <pc>329,1 <k1>kṣipra <k2>kṣiprá <e>2 B
<L>59566 <pc>329,1 <k1>kṣipra <k2>kṣiprá <e>2 B
<L>59657 <pc>329,2 <k1>kṣībatā <k2>kṣība—tā <e>3 A
<L>59819 <pc>330,1 <k1>kṣīraudana <k2>kṣīraudaná <e>3 A
<L>59820 <pc>330,1 <k1>kṣīraudana <k2>kṣīraudaná <e>3 A
<L>60109 <pc>331,2 <k1>kṣudhālu <k2>kṣudhālu <e>2 A
<L>60187 <pc>331,3 <k1>kṣurapra <k2>kṣurá—pra <e>3 B
<L>60866 <pc>335,1 <k1>khacita <k2>khacita <e>2 A
<L>60972.25 <pc>335,2 <k1>khaṭvāṅga <k2>khaṭvā—ṅga <e>3 B
<L>61356 <pc>337,1 <k1>khara <k2>khára <e>1 B
<L>61647 <pc>338,2 <k1>khalina <k2>khalina <e>1 A
<L>62124 <pc>341,1 <k1>khyāti <k2>khyāti <e>2 A
<L>62237 <pc>341,3 <k1>gaṅgādvāra <k2>gáṅgā—dvāra <e>3 A
<L>62238 <pc>341,3 <k1>gaṅgādvāra <k2>gáṅgā—dvāra <e>3 A
<L>62385 <pc>342,2 <k1>gajayodhin <k2>gaja—yodhin <e>3 A
<L>62610.1 <pc>343,2 <k1>gaṇaśas <k2>gaṇá—śás <e>3 A
<L>62658 <pc>343,3 <k1>gaṇeśvara <k2>gaṇeśvara <e>3 A
<L>62793 <pc>344,1 <k1>gaṇḍalekhā <k2>gaṇḍa—lekhā <e>3 A
<L>63104 <pc>345,2 <k1>gandhamādana <k2>gandhá—mādana <e>3 A
<L>63105 <pc>345,2 <k1>gandhamādana <k2>gandhá—mādana <e>3 A
<L>63305 <pc>346,2 <k1>gandharvanagara <k2>gandharvá—nagara <e>3 A
<L>64368 <pc>351,2 <k1>gavāyus <k2>gavāyus <e>3 A
<L>64390 <pc>351,2 <k1>gavaya <k2>gavayá <e>2 A
<L>64502 <pc>352,1 <k1>gātuvid <k2>gātú—víd <e>3 A
<L>65345 <pc>356,2 <k1>guggula <k2>guggula <e>1 A
<L>65986 <pc>359,2 <k1>gūrti <k2>gūrtí <e>2 A
<L>66101 <pc>359,3 <k1>gurubha <k2>gurú—bha <e>3 A
<L>67909 <pc>368,2 <k1>gopānasī <k2>gopānasī <e>3 A
<L>69065 <pc>374,1 <k1>grāvastut <k2>grāva—stút <e>3 A
<L>69133 <pc>374,2 <k1>graiṣmika <k2>graiṣmika <e>2 B
<L>69913 <pc>378,2 <k1>ghoṣila <k2>ghoṣila <e>2 A
<L>70714 <pc>382,2 <k1>cakṣus <k2>cákṣus <e>2 A
<L>72117 <pc>388,3 <k1>camūṣad <k2>camū́—ṣád <e>3 A
<L>72653 <pc>391,2 <k1>calācala <k2>calācalá <e>2 A
<L>72817 <pc>392,1 <k1>cāturāśramya <k2>cāturāśramya <e>2 A
<L>72943 <pc>392,3 <k1>cāndrāyaṇa <k2>cāndrāyaṇa <e>2 B
<L>74114 <pc>398,1 <k1>cintanīya <k2>cintanīya <e>2 A
<L>74177 <pc>398,2 <k1>cipiṭaghrāṇa <k2>cipiṭa—ghrāṇa <e>3 A
<L>74194 <pc>398,2 <k1>cipya <k2>cipya <e>1 B
<L>74580 <pc>400,2 <k1>codaka <k2>codaka <e>2 B
<L>74747 <pc>401,1 <k1>cūḍākarman <k2>cūḍā—karman <e>3
<L>75243 <pc>403,2 <k1>caula <k2>caula <e>1 A
<L>75483 <pc>404,2 <k1>chadis <k2>chadís <e>2 A
<L>75710 <pc>405,2 <k1>chardis <k2>chardís <e>1 A
<L>75786 <pc>405,3 <k1>chāgaleya <k2>chāgaleya <e>2 B
<L>76020 <pc>407,1 <k1>cheda <k2>cheda <e>2 B
<L>76091 <pc>407,2 <k1>chorita <k2>chorita <e>2 A
<L>76576 <pc>409,2 <k1>jaṭāla <k2>jaṭāla <e>2 A
<L>76686 <pc>409,3 <k1>jaḍāśaya <k2>jaḍāśaya <e>3 A
<L>76972 <pc>411,1 <k1>janiman <k2>jániman <e>2 A
<L>76976 <pc>411,1 <k1>janiman <k2>jániman <e>2 A
<L>77161 <pc>412,1 <k1>japahoma <k2>jápa—homa <e>3 A
<L>77592 <pc>413,3 <k1>jayus <k2>jayús <e>2 A
<L>77594 <pc>413,3 <k1>jayya <k2>jáyya <e>2 A
<L>78234 <pc>416,1 <k1>jaleśvara <k2>jaleśvara <e>3 A
<L>78257 <pc>416,1 <k1>jalaukas <k2>jalaukas <e>3 B
<L>78419 <pc>416,3 <k1>jāṃhāgiranagara <k2>jāṃhāgira—nagara <e>3 A
<L>78734 <pc>418,2 <k1>jātīphala <k2>jātī—phala <e>3 A
<L>78878 <pc>419,1 <k1>jābālopaniṣad <k2>jābālopaniṣad <e>3 A
<L>78953 <pc>419,2 <k1>jāmbudvīpaka <k2>jāmbudvīpaka <e>1 A
<L>78954 <pc>419,2 <k1>jāmbudvīpaka <k2>jāmbudvīpaka <e>1 A
<L>79569 <pc>422,2 <k1>jīraka <k2>jīraka <e>3 A
<L>79649 <pc>422,3 <k1>jīvakośa <k2>jīvá—kośa <e>3 A
<L>79748 <pc>423,1 <k1>jīvaśarman <k2>jīvá—śarman <e>3 A
<L>80064 <pc>424,2 <k1>juhoti <k2>juhoti <e>2 A
<L>80080 <pc>424,2 <k1>jūjuvas <k2>jūjuvás <e>2 A
<L>80173 <pc>424,3 <k1>jeya <k2>jeya <e>2 A
<L>80180 <pc>424,3 <k1>jenyāvasu <k2>jenyā-vasu <e>3 A
<L>80678 <pc>427,2 <k1>jyotiḥśāstra <k2>jyotiḥ—śāstra <e>3 A
<L>80695 <pc>427,2 <k1>jyotirgarga <k2>jyotir—garga <e>3 A
<L>80701 <pc>427,2 <k1>jyotirnirbandha <k2>jyotir—nirbandha <e>3 A
<L>81269 <pc>430,1 <k1>ṭīṭibhī <k2>ṭīṭibhī <e>1 B
<L>82100 <pc>434,2 <k1>tadguṇa <k2>tád—guṇa <e>3 B
<L>82101 <pc>434,2 <k1>tadguṇa <k2>tád—guṇa <e>3 B
<L>82204 <pc>434,3 <k1>tanmātra <k2>tan—mātra <e>3 B
<L>82394 <pc>435,3 <k1>tanūpāna <k2>tanū́—pā́na <e>3 B
<L>82556 <pc>436,2 <k1>tantrin <k2>tantrin <e>2 B
<L>82703 <pc>437,1 <k1>tapas <k2>tápas <e>2 A
<L>82768 <pc>437,2 <k1>tapojā <k2>tapo—jā́ <e>3 A
<L>82946 <pc>438,2 <k1>tamisrapakṣa <k2>támisra—pakṣa <e>3 A
<L>83179 <pc>439,1 <k1>tarutṛ <k2>tarutṛ́ <e>2 A
<L>83264 <pc>439,2 <k1>tarucchāyā <k2>taru—cchāyā <e>3 A
<L>83535 <pc>440,3 <k1>talātala <k2>talātala <e>3 A
<L>83551 <pc>440,3 <k1>talin <k2>talin <e>2 A
<L>83622 <pc>441,1 <k1>taviṣa <k2>taviṣá <e>2 B
<L>83650 <pc>441,2 <k1>taṣṭṛ <k2>táṣṭṛ <e>2 A
<L>83716 <pc>441,3 <k1>tāḍāvacara <k2>tāḍāvacara <e>3 A
<L>84019 <pc>443,1 <k1>tāmasakīlaka <k2>tāmasa—kīlaka <e>3 A
<L>84324 <pc>444,1 <k1>tārakāmaya <k2>tārakā—maya <e>3 A
<L>84328 <pc>444,1 <k1>tārakārāja <k2>tārakā—rāja <e>3 A
<L>85398 <pc>448,3 <k1>tīkṣṇa <k2>tīkṣṇá <e>1 A
<L>85560 <pc>449,1 <k1>tīrṇapratijña <k2>tīrṇa—pratijña <e>3 A
<L>85731 <pc>449,3 <k1>tugākṣīrī <k2>tugā—kṣīrī <e>3 A
<L>85735 <pc>449,3 <k1>tugra <k2>túgra <e>1 A
<L>85819 <pc>450,1 <k1>tujya <k2>tújya <e>2 A
<L>85876 <pc>450,2 <k1>tutthaka <k2>tutthaka <e>2 A
<L>86045 <pc>451,1 <k1>turaṇyu <k2>turaṇyú <e>3 A
<L>86254 <pc>452,1 <k1>tuviṣvan <k2>tuví—ṣván <e>3 A
<L>87172 <pc>455,3 <k1>tomaragraha <k2>tomara—graha <e>3 A
<L>87248 <pc>456,1 <k1>toyikā <k2>toyikā <e>2 A
<L>87659 <pc>458,2 <k1>trigaṅga <k2>trí—gaṅga <e>3 A
<L>87911 <pc>459,1 <k1>triparivarta <k2>trí—parivarta <e>3 A
<L>87944 <pc>459,2 <k1>tripiṭa <k2>trí—piṭa <e>3 A
<L>88271 <pc>460,2 <k1>triviṣṭabdhaka <k2>trí—viṣṭabdhaka <e>3 A
<L>88611 <pc>462,1 <k1>truṭi <k2>truṭi <e>2 A
<L>88664 <pc>462,1 <k1>traigartaka <k2>traigartaka <e>3 A
<L>88828 <pc>463,1 <k1>tryaṅgula <k2>try—aṅgulá <e>3 A
<L>89146 <pc>464,2 <k1>tviṣ <k2>tvíṣ <e>2 A
<L>89305 <pc>465,1 <k1>daṃsiṣṭha <k2>dáṃsiṣṭha <e>2 A
<L>89386 <pc>465,2 <k1>dakṣas <k2>dákṣas <e>2 A
<L>89575 <pc>466,3 <k1>daṇḍakamaṇḍalu <k2>daṇḍá—kamaṇḍalu <e>3 A
<L>90549 <pc>470,3 <k1>darman <k2>darmán <e>3 A
<L>90801 <pc>471,2 <k1>davīyas <k2>dávīyas <e>3 A
<L>90839 <pc>471,3 <k1>daśagrīva <k2>daśa—grīva <e>3 A
<L>91100 <pc>473,1 <k1>daśā <k2>daśā <e>1 A
<L>91946 <pc>477,1 <k1>dāsabhārya <k2>dāsá—bhārya <e>3
<L>92882 <pc>481,2 <k1>dīpta <k2>dīpta <e>2 A
<L>106268 <pc>538,2 <k1>nāsikaṃdhama <k2>nāsika—ṃ-dhama <e>3 A
<L>108172 <pc>546,2 <k1>nighuṣṭa <k2>ni-ghuṣṭa <e>1
<L>132910 <pc>669,2 <k1>pratiyatna <k2>prati-yatna <e>3 A
<L>145399 <pc>732,2 <k1>bījakāṇḍaruha <k2>bī́ja—kāṇḍa-ruha <e>3
<L>166218 <pc>825,3 <k1>mūtrapurīṣa <k2>mū́tra—purīṣa <e>3 A
<L>185668 <pc>917,1 <k1>vadhū <k2>vadhū́ <e>1 A
<L>205101 <pc>1012,3 <k1>vṛṣabhaṣoḍaśā <k2>vṛṣabhá—ṣoḍaśā <e>3 B
<L>213636 <pc>1057,1 <k1>śaryaṇāvat <k2>śaryaṇā-vat <e>2 A
<L>254890 <pc>1260,2 <k1>steyakṛt <k2>stéya—kṛ́t <e>3 A
Like this, plenty of issues are seen in the data. Hope they would be resolved sooner (now that @Andhrabharati is there to point out those).
The interest is to make the data as much correct and uniform (consistent in structure) as possible, keeping the intent of the book (Author) in mind.
@Andhrabharati I see the excel file you submitted. This makes life more difficult. sigh.
AFAIK, git treats Excel files (.xlsx) as binary. This means the normal command-line tools (diff, 'git diff') are not applicable.
I'll try to find a way to make use of the file, by
Then, finally, maybe, I'll be able to review and make use of the changes you have made, and will be making further.
.xlsx is treated as a binary file -- confirmed by viewing the commit at GitHub:
.xlsx is treated as a binary file
No good, I guess?
finally, maybe, I'll be able to review and make use of the changes you have made, and will be making further.
Sounds like a big plan.
Hope they would be resolved sooner (now that @Andhrabharati is there to point out those).
Exactly, now he's part of the Cologne core team. Thanks, it's good to have you around.
The interest is to make the data as much correct and uniform (consistent in structure) as possible, keeping the intent of the book (Author) in mind.
One can't have anything against it.
n. (Neuter) is taken as ii.
Good small catch. Have not seen such cases recently.
@funderburkjim Don't spend any of your time in converting this excel file.
It is just a split of the data into meaningful parts, for better/easier identification of issues.
And there are no modifications made in it.
I was just listing the observations looking at this file.
I've already spent an hour trying to convert to tsv with google docs.
Will stop.
When I actually do some modification, will be pushing the mw_iast file you prepared.
The pushing of excel file is just a trial to see the operation.
@funderburkjim, https://superuser.com/questions/1359699/force-excel-to-save-files-with-tsv-file-extension/1359703#1359703 doesn't help in getting tsv from xlsx?
I was thinking of inserting the HW endings [when time comes for it] before the gender info (about which I was talking aloud all these days), within flower brackets { }.
So looked if these characters were already used in the data. Found 4 such records.
§5. { } used extraneously: '{' in 3 records and '}' in 4 records.
<L>19258<pc>111,2<k1>avyaṅga<k2>avyaṅga<h>2<e>1 <hom>2.</hom> <s>avyaṅga</s> ¦ <lex>mn.</lex> the girdle of the <s1 slp1="maga">Maga</s1> priests, <ls>BhavP. i</ls>; (<s>viyaṅga</s> or <s>viyāṅga</s>), <ls>VarBṛS.</ls> <info lex="m:n"/> <LEND>
;; removed the { } around ';' as per the book.
<L>40647<pc>235,2<k1>ojodā<k2>ojo—dā́<e>3 <s>ojo—dā́</s> ¦ <lex>mfn.</lex> granting power, strengthening, <ls>RV. viii, 3, 24</ls>; <ls>TS. v</ls>; <info lex="m:f:n"/> <LEND>
;; removed the { } around ';' as per the book.
<L>90352<pc>470,1<k1>dayāvat<k2>dayā́—vat<e>3 <s>dayā́—vat</s> ¦ <lex>mfn.</lex> pitiful, taking pity on (<ab>gen.</ab> <ls>MBh. xiii</ls>; <ab>loc.</ab>, ii; <ls>R. ii</ls>)<info lex="m:f:n"/> <LEND>
;; removed the { } around ';' & changed ',' to ';' between ii and R. as per the book.
<L>98507<pc>505,1<k1>dvinārāśaṃsa<k2>dvi—nārāśaṃsa<e>3 <s>dvi—nārāśaṃsa</s> ¦ <lex>mf(<s>ī</s>)n.</lex> twice furnished with the vessels called <ab n="Nārāśaṃsa" slp1="nArASaMsa">N°</ab>, <ls>AitBr.</ls><info lex="m:f#I:n"/> <LEND>
;; changed '}' to a ',' as per the book.
Now these { } are available for me to insert the HW-endings into this MW data.
-------------------
The corrections phase has started now!!
I am shortly pushing the mw_iast_AB_1.txt file, with the above 4 records (as in §5) changed and also 70k+ 'double space's replaced with 'single space's.
[mw_iast_AB_0.txt file is the merging of 3 lines of each <L> record mw_iast.txt file into a single line (\r\n replaced with \t), as was done earlier in the corrections_9.txt file from you. And I thought, this may not be of any interest to give. Of course, it was posted in alternate form (MW_IAST_0.xlsx) already.]
This also has the following corrections done afterwards:
</s>-<s> changed to -, which is part of §3 above; the other cases mentioned in §3 are also corrected.
</s><s> (1 occurrence) and </s> <s> (200+ occurrences) are also corrected. [These are not always a simple replacement: some are comma, some are semicolon, some are +; ... !!] During this process, seen that <L>22291, <L>22291.11, <L>22292 and <L>22293 needed changes (having wrong application of the annexure data), and hence they were marked as such. Only two of </s> <s> are left (<L>81346 & <L>81979.4), though with the marked corrections there, these also could be done with.
</s> √ <s> (little under 200 occurrences) are replaced with √ , as are many thousands of them throughout the data. [However, isolated single <s>...</s> words under the root √ are left as is, as they are not preceded by any other <s>...</s> word(s).]
There are ;; remarks wherever I felt they were needed. [This is how I would be doing the rest of the work, as I go on.]
A big feast to your mind (to go through and verify) for some time!!
You may notice that I have added the actual Annexure text and Main text, as R(p,c) and O(p,c) at the ;; lines. This is to make the reference to book content 'local' (without the need to open the scan file and navigate thereupon).
With this, I now await your feedback on whether this way of working is alright for you.
Now the file is available for your perusal.
I do not wish to know the tagging process (to do it myself), as I have no use of it for my other activities. Would leave it to you (and the other members) to do that piece of work (anyway you'd be reviewing those lines) in the revised/added lines, looking at my remarks.
Another variety of issue!
§6. Some examples of the extremely rare cases where the HW endings were "typed" [but very badly applied/treated!]
<L>40093.1 <pc>231,3 <k1>ema <k2>ema <e>2 E
<L>40093 <pc>231,3 <k1>ema <k2>éma <h>a <e>2
<L>40094 <pc>231,3 <k1>eman <k2>éman <e>2
The book has it thus-
These two are to be grouped- as they are comma separated and also as indicated thus at-
एम (Nom. एमम्), एमन् (Nom. एमन) are the two entries as a group here.
Guess this example is clear enough to prove my understanding, that these endings denote the nominative forms as well as variant forms (if comma separated).
------------------------
<L>28372 <pc>162,3 <k1>āhuka <k2>āhuka <e>1 B
The book content for this is-
The pl. entry should have been āhukā [even if the last s indicating the visarga (आहुकाः) is ignored, as is the case everywhere].
The word ending if in braces is to be applied to the HW invariably as seen here, which I was mentioning all along.
These were immediately identified looking at the excel file (with different ordering/sorting ways, an inherent tool here)-
The word ending if in braces is to be applied to the HW invariably as seen here, which I was mentioning all along.
I seem to start loving your love to details and your start to understand how to document them the proper way.
Looks like I should not use # to refer to my points, as it has some other purpose here at Github.
Shall be using the character § henceforth.
I seem to start loving your love to details
We strive to understand the author well enough, before venturing to work on his work (at Andhrabharati). [A simple & fair enough rule in our working.]
your start to understand how to document them the proper way
There is nothing new for me to understand "how to document" now; from my childhood I was doing this way and being appreciated by one and all.
First look at mw_iast_AB_1.txt.
diff mw_iast_AB.txt mw_iast_AB_1.txt | wc -l
=> 1167572
Same result with diff -w (ignoring white-space differences).
Conclusion: There is some systematic difference in formatting.
wc -l mw_iast_AB_1.txt
=> 287500 (number of lines)
wc -l mw_iast.txt
=> 880079
wc -c mw_iast.txt
=> 53839145
wc -c mw_iast_AB_1.txt
=> 53747329
AB_1 has about 92,000 fewer characters (53839145 - 53747329 = 91816). I actually would have expected more.
Looking at mw_iast_AB_1.txt in emacs suggests it is AB's tab format, rather than the line format of mw_iast.txt. Perhaps the tab-format can be undone by a program. Will try.
mw_iast_AB_0.txt file is the merging of 3 lines of each <L> record mw_iast.txt file into a single line (\r\n replaced with \t), as was done earlier in the corrections_9.txt file from you. And I thought, this may not be of any interest to give. Of course, it was posted in alternate form (MW_IAST_0.xlsx) already.
This is my way of arrangement, which you had also seen earlier!
How do I recover the original format?
You just need to do the reverse replacement, tab with crlf, in any editor or through any script.
I did not do any major correction, to have more reduction in character count.
All the corrections done are reported in my above message. You might go through it once, so as to understand the changes easily.
There are now 862384 lines, compared to 880079 in mw_iast.txt (17,695 fewer lines than in the original).
For example, in the entry <hom>1.</hom> <s>aṃh</s>, the original mw_iast.txt has several lines between the <L>... line and the <LEND> line:
< <hom>1.</hom> <s>aṃh</s> ¦ (<ab>cf.</ab> √ <s>aṅgh</s>) <ab>cl.</ab> 1. <ab>Ā.</ab> <s>aṃhate</s>, to go, set out, commence, <ls>L.</ls>;
< <div n="to"/>to approach, <ls>L.</ls>;
< <div n="to"/> <ab>cl.</ab> 10. <ab>P.</ab> <s>aṃhayati</s>, to send, <ls>Bhaṭṭ.</ls>;
< <div n="to"/>to speak, <ls>Bhaṭṭ.</ls>;
< <div n="to"/>to shine, <ls>L.</ls><info verb="genuineroot" cp="1Ā,10P"/>
but these line breaks are lost in mw_iast_AB_1.txt -- which joins all these together:
> <hom>1.</hom> <s>aṃh</s> ¦ (<ab>cf.</ab> √ <s>aṅgh</s>) <ab>cl.</ab> 1. <ab>Ā.</ab> <s>aṃhate</s>, to go, set out, commence, <ls>L.</ls>;<div n="to"/>to approach, <ls>L.</ls>;<div n="to"/> <ab>cl.</ab> 10. <ab>P.</ab> <s>aṃhayati</s>, to send, <ls>Bhaṭṭ.</ls>;<div n="to"/>to speak, <ls>Bhaṭṭ.</ls>;<div n="to"/>to shine, <ls>L.</ls><info verb="genuineroot" cp="1Ā,10P"/>
Can you do something about this? -- e.g., when you construct your tabbed file from mw_iast.txt, you could add text <LB> where there are line breaks.
Also, leave the 15 blank lines in your tabbed file (such as the blank lines in mw_iast.txt preceding <L>16764.2<.)
The reason for these requests is so that I can identify what 'real' changes you are making.
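A minimal sketch of the <LB> idea, for illustration only. It assumes each record is a list of lines with the <L> metaline first, <LEND> last, and at least one body line in between, and that the data itself contains no tab characters and no literal <LB>. The function names join_record and split_record are hypothetical and not part of any existing script.

```python
def join_record(lines, marker="<LB>"):
    """Join one <L>...<LEND> record into a single tab-separated line.

    lines: the record's lines, metaline first and <LEND> last.
    Line breaks inside the body are kept as `marker` so that the
    original layout can be restored later.
    """
    meta, body, end = lines[0], lines[1:-1], lines[-1]
    return "\t".join([meta, marker.join(body), end])


def split_record(line, marker="<LB>"):
    """Inverse of join_record: recover the original list of lines."""
    meta, body, end = line.split("\t")
    return [meta] + body.split(marker) + [end]
```

With such a marker the round trip is lossless, so a diff against mw_iast.txt would show only the real changes.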
I have talked about this line count difference in my §1. above clearly.
I do not understand why you need those 15 blank lines back. I would instead suggest you to remove those lines in the Koeln file(s) themselves!
For the other 17k+ lines to be reverted, two replacements are to be done- (i) $ to be replaced with crlf. (2500+ lines) (ii) "<div" lines to be got by inserting "crlf" before the "<div". (15k+ lines)
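The two replacements, together with the tab-to-crlf step mentioned earlier, could be sketched as below. This is only an illustration under stated assumptions: that "$" and the tab character occur in the tabbed file only as line-break placeholders, and that every "<div" starts a new line in the original. The name restore_lines is hypothetical.

```python
import re


def restore_lines(tabbed_line):
    """Turn one tab-format record back into a list of lines.

    Assumptions (not verified against the real file): tab separates the
    metaline, body and <LEND>; "$" marks a lost line break; every "<div"
    began a new line in the original.
    """
    text = tabbed_line.replace("\t", "\n")          # tab -> line break
    text = text.replace("$", "\n")                  # (i) "$" -> line break
    text = re.sub(r"(?<!\n)<div", "\n<div", text)   # (ii) break before "<div"
    return text.split("\n")
```

The lines are kept with "\n" in memory; writing them through open(path, "w", newline="\r\n") would produce crlf endings.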
Simple process.
Looks like you are NOT reading my messages, but just trying to directly look at the files (being) sent.
I'll try using your suggestions (i) and (ii) tomorrow.
Yes, I am trying to process the mw_iast_AB_1.txt file.
Recall that you said
When I actually do some modification, will be pushing the mw_iast file you prepared.
So I was not expecting to have to do so many steps to get a file comparable to the mw_iast.txt file.
I thought the messages (detailing the changes) and the file (with incorporated changes) would go together!
And I recall that when I initially posted the change details alone, you had tried to look into the (Excel) file to "see" them.
Also there are not that "many steps" involved to get a comparable file- just three simple find/replace operations, not taking more than a couple of seconds' time (if only my messages were really "read" once)!!
(if only my messages were really "read" once)
It's not about reading. You have given so many files - it's rather hard to understand what is what for and what has been done in each.
I have given just two files, one Excel file before- without any "modifications" done, which I said Jim might not be able to use; and one mw_iast_AB_1.txt file now- with "modifications" done, which is meant for Jim's perusal.
And I clearly mentioned about both the files- of what they contain.
If this remark is regarding my postings/messages, I will STOP those henceforth. (No point spending my time on such use-less/mis-leading/confusing things.)
If this remark is regarding my postings/messages, I will STOP those henceforth.
Please do not run so fast. Give us time to understand. What you write is of highest interest, but there are too many aspects involved at once.
@Andhrabharati
Have been able to derive a file comparable to mw_iast.txt from mw_iast_AB_1.txt file.
And have begun comparisons so as to understand the changes you have made. The bulk of your changes are quite reasonable.
The line counts in comments below pertain to file in format of mw_iast.txt.
-<s> => <s>- and </s>- => -</s> : 5692 lines
[…] </s>- in mw_iast_AB_1.txt. Would you correct?
</s></s> removed : 227 lines
°<s> => <s>° and </s>° => °</s> : 208 lines
[…] <s> that you made. I'll get to those eventually.

At this point I had reduced the number of unknown differences considerably, from about 100k to 1,300. Now I begin to examine some of the remaining differences.
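For reference, the hyphen and ° moves listed above can be reproduced with plain string replacements. This is a sketch only; move_into_s is a hypothetical name, and it assumes the marks should always move inside the <s> element whenever they touch the tag.

```python
def move_into_s(text):
    """Move a '-' or '°' that touches an <s> tag to the inside of the tag."""
    for mark in ("-", "°"):
        text = text.replace(mark + "<s>", "<s>" + mark)    # -<s>  => <s>-
        text = text.replace("</s>" + mark, mark + "</s>")  # </s>- => -</s>
    return text
```

Applying the same function to mw_iast.txt before diffing is one way to cancel out these systematic differences, so that only the remaining ones show up.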
The first thing that happened to catch my eye involves the metaline; this is the line that begins each entry. Its format is
<L>x<pc>y<k1>z<k2>w<h>u<e>v
with the <h> parameter being optional for MW.
There are several errors in mw_iast_AB_1.txt that have been introduced. By my count, there are 593 erroneous metalines. A list of them is in file metaline_prob_ab_1.txt
The problems fall into 5 categories:
<k2[^>] i.e. the <k2> tag is missing > (145)
<k> the <k2> tag is missing the '2' (79)
<2> the <k2> tag is missing the k (41)
<> the <e> tag is missing the e (75)
<k2> tag is missing the first letter (253), for example:
<L>201921<pc>996,3<k1>viṣama<k2>i-ṣama<h>b<e>1
Would you correct these?
I hope you will make the corrections requested above to mw_iast_AB_1.txt and then push the new file up to Github. Then I will continue the initial analysis.
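A rough sketch of how such metalines might be checked, based only on the format <L>x<pc>y<k1>z<k2>w<h>u<e>v stated above. The names METALINE and classify_metaline are hypothetical. The test for a truncated k2 (k2 without hyphens being a proper tail of k1) is an assumption taken from the viṣama example and may miss cases where k2 carries accents.

```python
import re

# <h> is optional; every field is assumed to contain no '<'.
METALINE = re.compile(
    r"^<L>(?P<L>[^<]+)<pc>(?P<pc>[^<]+)<k1>(?P<k1>[^<]+)"
    r"<k2>(?P<k2>[^<]+)(?:<h>(?P<h>[^<]+))?<e>(?P<e>[^<]+)$"
)


def classify_metaline(line):
    """Return 'ok', 'bad-tags' or 'k2-truncated' for one metaline."""
    m = METALINE.match(line)
    if m is None:
        return "bad-tags"          # e.g. <k2 without '>', <k>, <2>, <>
    k1 = m.group("k1")
    k2 = m.group("k2").replace("-", "")
    if len(k2) < len(k1) and k1.endswith(k2):
        return "k2-truncated"      # e.g. k1=viṣama, k2=i-ṣama
    return "ok"
```

Running classify_metaline over every line that starts with <L> and counting the results would give a tally comparable to the five categories above, though the two tag-level categories are merged here into 'bad-tags'.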
problems in metaline
I had already noticed & corrected these and updated the file.
Probably you happened to use the initial file.
Anyway, will update the file again with the other ones pointed by you.
This time, I think I should start giving the file directly in your "comparable form" itself, a simple task for me, saving much of our times (yours in interpreting & mine in explaining).
will update the file again
Thanks!
file directly in your "comparable form"
No need. If you maintain current formatting details of your 'tab' file. I now have a program that can convert your 'tab' file to the 'comparable form'. But thanks for the offer.
No need. If you maintain current formatting details of your 'tab' file.
But what about the 17k+ lines with non-<L> beginnings?
Probably I should take that your (pre-processor) program handles those as well- looking at your statement "No need".
So, I will stick to my format only.
Added another piece of change-
§7. &c. to be with a space before.
&c. (and also &) is always to be preceded by a space (except at the start of a braced content): ~50 occurrences
&c. inside <ls>...</ls> is taken out to be after </ls>: ~3900 occurrences
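The two §7 rules could be expressed as regular-expression substitutions along these lines. This is a sketch under assumptions, not the procedure actually used: it treats "&c." as a literal string (not an XML entity), handles only "&c." and not a bare "&", takes "braced content" to mean text after an opening (, [ or {, and assumes "&c." sits at the end of the <ls> element. The name fix_etc is hypothetical.

```python
import re


def fix_etc(text):
    """Apply the two '&c.' rules to one line of text."""
    # Move a trailing "&c." out of <ls>...</ls>, to just after </ls>.
    text = re.sub(r"<ls>([^<]*?)\s*&c\.</ls>", r"<ls>\1</ls> &c.", text)
    # Insert a space before "&c." unless it already follows whitespace,
    # an opening bracket, or the start of the line.
    text = re.sub(r"(?<=[^\s(\[{])&c\.", " &c.", text)
    return text
```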
Now the file is mw_iast_AB_2.txt (with this §7 correction & the corrections as mentioned by @funderburkjim implemented). [Sorry that I am progressing further, before the 1st file content is accepted.] [Reminder: §4 and §6 are yet to be taken up]
This issue continues #83.
The changes begin!
@Andhrabharati
Suggest you git add, git commit -m "....", git push as a trial run.
Once we're sure the git process works, suggest you commit and push often, so we can comfortably follow your changes.