Closed shreevatsa closed 1 year ago
Maybe we can refactor the telang_html_page_from_regions.py file to do this, and then repeat it with the earlier version of the .py file.
I think I need to think through the data format with some examples. That's what is slowing me down here.
The "input" file telang-regions.json is an array of dicts that basically amounts to this:
slug | type | name | xmin | ymin | width | height | text |
---|---|---|---|---|---|---|---|
51 | lgVerse | 001 | 584 | 1381.5 | 1714 | 360 | ... |
52 | lgFootnote | 005 | 356 | 3327.5 | 2423 | 280 | ... |
Or, with:
cat telang-regions.json | sqlite-utils insert delitos.db delitos - --pk id
after which:
sqlite> select * from delitos limit 5;
┌────┬──────┬──────────┬──────┬──────┬────────┬───────┬────────┬──────────────────────────────────────────────────────────────┐
│ id │ slug │ type │ name │ xmin │ ymin │ width │ height │ text │
├────┼──────┼──────────┼──────┼──────┼────────┼───────┼────────┼──────────────────────────────────────────────────────────────┤
│ 1 │ 51 │ lgHeader │ │ 984 │ 1115.5 │ 1263 │ 266.0 │ ["॥ अथ नीतिशतकम् ॥"] │
├────┼──────┼──────────┼──────┼──────┼────────┼───────┼────────┼──────────────────────────────────────────────────────────────┤
│ 2 │ 51 │ lgVerse │ 001 │ 584 │ 1381.5 │ 1714 │ 360.0 │ ["दिक्कालाद्यनवच्छिन्नानन्तचिन्मात्रमूर्तये ।", "स्वानुभूत्य │
│ │ │ │ │ │ │ │ │ ेकसाराय नमः शान्ताय तेजसे ॥ १ ॥"] │
├────┼──────┼──────────┼──────┼──────┼────────┼───────┼────────┼──────────────────────────────────────────────────────────────┤
│ 3 │ 51 │ lgVerse │ 002 │ 568 │ 1741.5 │ 1675 │ 588.5 │ ["यां चिन्तयामि सततं मयि सा विरक्ता", "साप्यन्यमिच्छति जनं स │
│ │ │ │ │ │ │ │ │ जनोग्यसक्तः ।", "अस्मत्कृते च परितुष्यति काचिदन्या", "धिक्त │
│ │ │ │ │ │ │ │ │ ां च तं च मदनं च इमां च मांश्च ॥ २ ॥"] │
├────┼──────┼──────────┼──────┼──────┼────────┼───────┼────────┼──────────────────────────────────────────────────────────────┤
│ 4 │ 51 │ lgVerse │ 003 │ 559 │ 2330.0 │ 1783 │ 327.0 │ ["अज्ञः सुखमाराध्यः सुखतरमाराध्यते विशेषज्ञः ।", "ज्ञानलवदुर │
│ │ │ │ │ │ │ │ │ ्विदग्धं ब्रह्मापि नरं न रज्जयति ॥ ३ ॥"] │
├────┼──────┼──────────┼──────┼──────┼────────┼───────┼────────┼──────────────────────────────────────────────────────────────┤
│ 5 │ 51 │ lgVerse │ 004 │ 551 │ 2657.0 │ 1680 │ 671.0 │ ["प्रसद्य मणिमुदरन्मकरवचदंष्ट्राकुरा", "समुद्रमापे संतरेत्य │
│ │ │ │ │ │ │ │ │ चलर्मिमालाकुलम् ।", "भुजङ्गमपि कोपितं शिरासे पुष्पवद्धारये . │
│ │ │ │ │ │ │ │ │ ", "न तु प्रतिनिविष्टमूर्खजनचित्तमाराधयेत् ॥ ४ ॥"] │
└────┴──────┴──────────┴──────┴──────┴────────┴───────┴────────┴──────────────────────────────────────────────────────────────┘
and
sqlite> select * from delitos order by random() limit 5;
┌─────┬──────┬────────────┬──────┬──────┬────────┬───────┬────────┬──────────────────────────────────────────────────────────────┐
│ id │ slug │ type │ name │ xmin │ ymin │ width │ height │ text │
├─────┼──────┼────────────┼──────┼──────┼────────┼───────┼────────┼──────────────────────────────────────────────────────────────┤
│ 513 │ 112 │ lgVerse │ 093 │ 563 │ 771.5 │ 2172 │ 376.0 │ ["आत्माराम फलाशी गुरुवचनरवस्त्वत्प्रसादात्स्मरारे", "दुःखाम् │
│ │ │ │ │ │ │ │ │ मोक्ष्ये कदाहं तव चरणरतो ध्यानमार्गेकप्रश्नः ॥ ९ ३ ॥"] │
├─────┼──────┼────────────┼──────┼──────┼────────┼───────┼────────┼──────────────────────────────────────────────────────────────┤
│ 266 │ 81 │ lgVerse │ 116 │ 548 │ 2923.5 │ 1971 │ 302.5 │ ["कमठकुला चल दिग्गजफणिपतिविधृतापि चलति वसुधेयम् ।", "प्रतिपन │
│ │ │ │ │ │ │ │ │ ्नममलमनसां न फलति पुंसां युगान्तपि ॥ ६ ॥"] │
├─────┼──────┼────────────┼──────┼──────┼────────┼───────┼────────┼──────────────────────────────────────────────────────────────┤
│ 375 │ 94 │ lgHeader │ │ 1160 │ 1140.0 │ 886 │ 190.0 │ ["निर्ममतास्वरूपमाह |"] │
├─────┼──────┼────────────┼──────┼──────┼────────┼───────┼────────┼──────────────────────────────────────────────────────────────┤
│ 619 │ 124 │ lgFootnote │ 146 │ 344 │ 3939.5 │ 2379 │ 596.0 │ ["XXXIII . ( 4 ) भजचपळा : ; भोगचपळा :. J. तुल्यतरला : K. भङ् │
│ │ │ │ │ │ │ │ │ गतरला : N.", "( b ) सुखम् ; मुखम , I ' . सुख N. प्रीतिः प्रि │
│ │ │ │ │ │ │ │ │ यवस्थिरा ; फूर्तिः क्रियासु स्थिता G. K", "T. Bo . ( where त │
│ │ │ │ │ │ │ │ │ िँ for ति :) N. ( where मि for क्रि ) ( c ) संसा ; \" जन्सा │
│ │ │ │ │ │ │ │ │ C", "A. Bo.n. नि ॰; ° म ° . A. Bon . बुद्धा ; मला B. P. बुधा │
│ │ │ │ │ │ │ │ │ बीका बुन्नोभये ,", "• M. बुधान्योधने A. Bon . बुधा यौवने P. │
│ │ │ │ │ │ │ │ │ R."] │
├─────┼──────┼────────────┼──────┼──────┼────────┼───────┼────────┼──────────────────────────────────────────────────────────────┤
│ 596 │ 121 │ lgFootnote │ 135 │ 352 │ 4447.5 │ 2327 │ 252.5 │ ["XXII . ( 4 ) मम महारामरचिते ; पथि मठारामसरितः B. M. ( b ) │
│ │ │ │ │ │ │ │ │ ° द्विपमृग", "सु ' ; ' गवि , B. विटविमृग ● M."] │
└─────┴──────┴────────────┴──────┴──────┴────────┴───────┴────────┴──────────────────────────────────────────────────────────────┘
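For reference, here is a rough standard-library sketch of what that sqlite-utils step amounts to. The two sample records are made up in the shape of the small table above (with the text elided), the table name `regions` and the helper `load_regions` are hypothetical, and the `id` column is auto-assigned here, which may differ from how the real file supplies it. The list of text lines is stored as a JSON string, which appears to be what the tables above show.

```python
import json
import sqlite3

# Made-up sample in the assumed shape of telang-regions.json; text is elided.
sample = json.loads("""
[
  {"slug": 51, "type": "lgVerse", "name": "001",
   "xmin": 584, "ymin": 1381.5, "width": 1714, "height": 360, "text": ["..."]},
  {"slug": 52, "type": "lgFootnote", "name": "005",
   "xmin": 356, "ymin": 3327.5, "width": 2423, "height": 280, "text": ["..."]}
]
""")

COLUMNS = ["slug", "type", "name", "xmin", "ymin", "width", "height", "text"]


def load_regions(records):
    """Insert region dicts into an in-memory SQLite table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE regions (id INTEGER PRIMARY KEY, slug, "type", name, '
        'xmin, ymin, width, height, "text")'
    )
    for rec in records:
        row = [rec.get(col) for col in COLUMNS]
        # SQLite has no list type, so keep the list of lines as a JSON string.
        row[-1] = json.dumps(rec.get("text"), ensure_ascii=False)
        conn.execute(
            'INSERT INTO regions (slug, "type", name, xmin, ymin, width, height, "text") '
            "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            row,
        )
    return conn


conn = load_regions(sample)
rows = conn.execute('SELECT "type", name, "text" FROM regions ORDER BY id').fetchall()
```

Each entry of `rows` is a (type, name, text) tuple, and the text can be turned back into a list of lines with `json.loads`.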
For simplicity, let's say the input is a list like:
The "output" we want is a list, of something like:
So for example:
n | name | map |
---|---|---|
1 | header -> [{atha nītiśatakam}] | |
2 | N001 | verse -> [{dikkālā…}] footnote -> [{I. ... }] note -> [{Stanza 1. ...}, {along with times ...}] |
3 | N002 | verse -> [{...}] footnote -> [{II. ...}] note -> [{St. II ...}, {to king...}] |
The input is a list of (name, type, Region)
First group by name, to get for each name a list of (type, Region).
Then within each name, group by type, to get a map from type to list of Regions.
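The two grouping steps above can be sketched roughly like this. It is only a sketch: `Region` is a hypothetical stand-in that holds just the text lines (the real regions also carry xmin, ymin, width and height), `group_regions` is a made-up name, and the sample rows are abbreviated from the example table above. It does both groupings in a single pass, which should give the same result as grouping by name first and then by type.

```python
from collections import defaultdict
from typing import NamedTuple


class Region(NamedTuple):
    # Hypothetical stand-in for a region record: only the text lines are kept.
    text: list


def group_regions(rows):
    """Turn (name, type, Region) triples into {name: {type: [Region, ...]}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for name, type_, region in rows:
        grouped[name][type_].append(region)
    # Convert to plain dicts; names and types keep their order of first appearance.
    return {name: dict(by_type) for name, by_type in grouped.items()}


# Illustrative input only, abbreviated from the example table.
rows = [
    ("", "header", Region(["atha nītiśatakam"])),
    ("001", "verse", Region(["dikkālā…"])),
    ("001", "footnote", Region(["I. ..."])),
    ("001", "note", Region(["Stanza 1. ..."])),
    ("001", "note", Region(["along with times ..."])),
]
result = group_regions(rows)
```

Here `result["001"]` maps each of verse, footnote and note to its list of Regions, with two Regions under note, and the header ends up under the empty name.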
This is taking me a long time for some reason, so let's break it down further.
Creating the separate issue #14 helped unblock me; now the alignment data is stored in https://github.com/shreevatsa/bhartrhari/blob/d689efc022c994d8ce678388f32fd24f6b5ec6e2/data/regions/telang-regions-out.json with names matching those in https://github.com/shreevatsa/bhartrhari/blob/d689efc022c994d8ce678388f32fd24f6b5ec6e2/data/alignment/Telang-Tawney.csv.
Need to do the same for Kosambi now.
Deleted some earlier files in commit 034689df90d96f035d60f3e6a4714cc4a5bf4976.
I need to figure out how I was generating the Kosambi HTML file in my other repo back then:
python get_proof_lg_regions.py > out.html
but I may need the earlier version of the file.
Got the file!
git show f1c8c4a14f23c95766423963b9724f0e275ede79:get_proof_lg_regions.py > get_proof_lg_regions_kosambi.py
and then:
python3 get_proof_lg_regions_kosambi.py > out.html && diff -s out.html ~/w/shreevatsa.net/website/static/tmp/2023-03/kosambari.html
get_proof_lg_regions_kosambi.py is like the concatenation of:
ambuda/get_proof_lg_regions.py
telang-regions-dump.py
telang_html_page_from_regions.py
Except that it has names like 001f, which won't do.
I think I have it in this file now: https://github.com/shreevatsa/bhartrhari/blob/11cc32ab5653746faedeedd8aae4c7153e17261d/data/regions/kosambi-regions-out.json added in commit 11cc32ab5653746faedeedd8aae4c7153e17261d.
For Kosambi and Telang.