Python deva_iast_comp step1

funderburkjim commented 2 years ago

This continues the programmatic analysis of differences between two 'things':

the Devanagari spelling and
IAST spelling of headwords in the MD dictionary.

See the discussion at https://github.com/sanskrit-lexicon/csl-orig/issues/628.

Review the comment regarding analysis of the case hasa:hás-a ({%or%} á).

Before applying Python tools, we need to informally answer two questions:

In the example, what are the two things?
Our data is a Python string object representing the text hasa:hás-a ({%or%} á). What detail of this text can we use to identify those two things?

@AnnaRybakovaT What is your answer to these two questions?

AnnaRybakovaT commented 2 years ago

Dear Jim, Thanks for the Step 1! I will focus on this and will try to find correct answers tomorrow.

AnnaRybakovaT commented 2 years ago

in the example, what are the two things?

1) The 1st thing is "hasa", the slp1 spelling of the Devanagari spelling in the MD dictionary. The 2nd thing is "hás-a ({%or%} á)", the IAST spelling in the MD dictionary.

2) To identify those two things in the text we can use ":".

funderburkjim commented 2 years ago

Perfect! 👍

Now, what is the Python way to use ':' to split the given text into those two pieces?

Answer: there are 2 straightforward ways to do this string manipulation in Python:

the 'split' method on python strings: text.split(':')
the regular expression split method: re.split(':',text)
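Here is a quick check of both approaches on the example text (a small sketch; `text`, `parts1` and `parts2` are just illustrative names):

```python
import re

text = 'hasa:hás-a ({%or%} á)'

# str.split: split the string at every ':'
parts1 = text.split(':')

# re.split: same result here, because ':' has no special meaning in a regex
parts2 = re.split(':', text)

print(parts1)
print(parts1 == parts2)  # True
```

Both give a list of two strings. If a line ever contained a second ':', both would return more than two pieces; text.split(':', 1) splits only at the first ':'.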

There are many good tutorial resources online to explain basic concepts in Python. One resource that may be helpful here is https://www.w3schools.com/python/.
This website explains simple concepts and provides a 'playground' where you can experiment.

Some top-level topics relevant to splitting strings are:

Python String (there is a sub-sub-topic just about 'split')
Python RegEx (there are lots of things to do with regex, don't try to understand everything at once - just re.split)
Python lists -- since splitting a string results in an answer which is a list of strings, you should gain a little familiarity with lists now. There is much to say about lists, but for now just understand elementary things like:
- x = [1,'apple','pie'] # define a list
- What is x[0], what about x[1], what about x[2], what about x[3]?
- What is len(x) ?
Python tuples -- similar to lists
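The list questions above can be tried directly. This sketch also shows what happens with x[3] and how a tuple compares:

```python
x = [1, 'apple', 'pie']   # define a list

print(x[0])    # 1
print(x[1])    # apple
print(x[2])    # pie
print(len(x))  # 3

# x[3] does not exist: indexing past the end of a list raises IndexError
try:
    x[3]
except IndexError as err:
    print('IndexError:', err)

# A tuple looks similar, but uses parentheses and cannot be modified
t = (1, 'apple', 'pie')
print(t[1], len(t))  # apple 3
```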

@AnnaRybakovaT So experiment some with these topics. When you feel comfortable with using Python to split our text into a list with two parts, we'll then think how to fit this into a step1/readwriteA2.py program.

AnnaRybakovaT commented 2 years ago

One resource that may be helpful here is https://www.w3schools.com/python/.

Dear Jim, Thank you very much! This resource is amazing!!!

Regarding x = [1,'apple','pie']: this is a list which consists of 3 items: x[0] = 1, x[1] = 'apple', x[2] = 'pie'. As I understand it, x[3] doesn't exist in this list.

len(x) is a function which returns the number of items in an object. For our list "x" the result of this function is 3.

funderburkjim commented 2 years ago

Looks like you've got the idea of a list. When you've got the idea of split(), go ahead and:

make a new directory 'step1' under deva_iast_comp (so step1 will be on the same level as 'step0', i.e., step1 will be a sibling of step0). Do this via a command in the Git Bash terminal. Use the Unix command 'mkdir' to make the step1 directory.
copy step0/readwriteA1.py to step1/readwriteA2.py. Use the Git Bash terminal and the Unix 'cp' command to do this.
Now modify readwriteA2.py so it outputs the split lines. There will be some choices regarding what should go into 'newlines'.
Run your program -- is the output what you anticipated?
When you're ready, be sure to update the documentation (in two readme files, one old and one new, and in the program .py file).
Push results. I'll review and make further comments.

AnnaRybakovaT commented 2 years ago

Dear Jim, I need your help, please!

1) I made a new directory 'step1' (as I thought). After this command, a folder "Step 1" was created.

2) But when I wanted to copy the file readwriteA1.py, I got the message "step 1 is not directory".

Where is my mistake?

funderburkjim commented 2 years ago

The 'cp' (copy) command normally takes 2 arguments, not 3: cp path-to-old-file path-to-copy-file.

But, as with many unix commands, there are various ways to use the command, so your particular usage was interpreted as cp first-old-file second-old-file target-directory.

So, reformulate your command to use the normal 2 arguments.

That's the first point. Now, the second point is related to details regarding how to specify the 'path-to-old' and 'path-to-new'. These paths are generally relative to your 'current directory'.

The location of your current directory is shown in the Git Bash prompt as ~/Documents/sanskrit-lexicon/MD/deva_iast_comp/step0, or, informally, as step0.

[This can also be found by the Unix command pwd; "pwd = print working directory"].

In the next comments, I'll assume that your current directory is step0. Where is 'old-file' relative to current directory? Well, that's easy -- readwriteA1.py is in step0. You can check this by the ls command ("ls" = "list directory contents."). Try it!

So the first part of the desired cp command is as you wrote it: cp readwriteA1.py path-to-new.

Now, suppose you used the command cp readwriteA1.py readwriteA2.py -- what would this do? Give it a try, and then do an ls.
You don't really want readwriteA2.py to be in the step0 directory. Use the 'rm' command to delete the unwanted file ("rm" = "remove").

Recall the discussion in #3 about '../'. Try the 'cp' command again with 'path-to-new' as <something>readwriteA2.py.

Do an ls. Do you find readwriteA2.py ? If not, where is it? Do a 'cd' command to get your current directory to be 'step1'. Then do an 'ls' -- Have you found readwriteA2.py now?

further experiments.

Try various 'cp' and 'ls' commands and 'cd' commands. Remember you can use 'rm' commands to get rid of unwanted copies.

resource for unix commands

You can do Google searches such as 'unix cp command' to get further information (sometimes more than you want!). One source that turned up for me was https://www.tutorialspoint.com/unix_commands/cp.html, with similar pages for other commands. But you really won't need more than a few Unix commands, most of which are mentioned above.

AnnaRybakovaT commented 2 years ago

Dear Jim, Thank you so much!

cp readwriteA1.py readwriteA2.py -- what would this do

This command will create the file readwriteA2.py in the current directory. I understood this from the beginning, and for this reason I was thinking about how to create a new file in another directory. Thanks to your explanations, I have found the solution now. The command is: cp readwriteA1.py ../step1/readwriteA2.py

AnnaRybakovaT commented 2 years ago

Now modify readwriteA2.py so it outputs the split lines. There will be some choices regarding what should go into 'newlines'

Dear Jim, I was trying to modify this file in different ways, unfortunately still without result. I have to learn more about Python, and I hope I will find a solution in a while. Otherwise I will explain my ideas to you and you can guide me to the correct path.

funderburkjim commented 2 years ago

Your revised cp command is just right!

I'll wait until you request another hint for readwriteA2.py.
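One way to double-check where a relative path such as ../step1/readwriteA2.py points is to resolve it by hand with Python's standard posixpath module (which uses forward slashes, as Git Bash does). The directory names are taken from the prompt shown earlier; this is only an illustration and not something the cp command needs:

```python
import posixpath

current_dir = 'deva_iast_comp/step0'   # where the shell currently is
target = '../step1/readwriteA2.py'     # 'path-to-new', relative to current_dir

# join the two, then let normpath collapse 'step0/..' into the parent directory
resolved = posixpath.normpath(posixpath.join(current_dir, target))
print(resolved)  # deva_iast_comp/step1/readwriteA2.py
```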

AnnaRybakovaT commented 2 years ago

Dear Jim, I have just finished analyzing words_mw_noneng.txt, and now I can focus more on Python commands. I hope tomorrow or the day after tomorrow I will be able to continue with Step 1.

AnnaRybakovaT commented 2 years ago

Now modify readwriteA2.py so it outputs the split lines. There will be some choices regarding what should go into 'newlines'.

Dear Jim, finally I need your help. I guess I should start by updating the function adjustlines(lines). We can split every line with newline = line.split(":") or newline = re.split(":", line). But here we receive not a line but a list of 2 new lines. In this case it should probably be newlines = line.split(":") or newlines = re.split(":", line).

In this case, the updated part of the program looks like this:

def adjustlines(lines):
 newlines = []
 for line in lines:
  newlines = line.split(":")
  newlines.append(newlines)
 return newlines

Probably I am missing something important regarding lists and strings/lines (see the error message).

If I run this program, the result is a file readwriteA2.txt, but with only the 1st split line.
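Why the version above goes wrong: inside the loop, `newlines = line.split(":")` rebinds the name `newlines` to a brand-new two-item list, throwing away the accumulator created by `newlines = []`; the following `newlines.append(newlines)` then appends that list to itself. A minimal sketch of the difference, using a separate name `parts` for the split result and two made-up sample lines (`adjustlines_buggy` and `adjustlines_fixed` are hypothetical names):

```python
def adjustlines_buggy(lines):
    newlines = []
    for line in lines:
        newlines = line.split(':')   # rebinds newlines: the accumulator is lost
        newlines.append(newlines)    # appends the list to itself
    return newlines

def adjustlines_fixed(lines):
    newlines = []                    # accumulator for all output lines
    for line in lines:
        parts = line.split(':')      # a separate name for the 2-item split result
        newlines.append(parts[0])    # text before ':'
        newlines.append(parts[1])    # text after ':'
    return newlines

lines = ['hasa:hás-a', 'aGnya:ághnya']
buggy = adjustlines_buggy(lines)
print(len(buggy))                # 3: the last line's two parts, plus the list itself
print(adjustlines_fixed(lines))  # all four pieces, in order
```

If the writing loop does something like `f.write(line + '\n')`, as in the temp_translit.py sample further down, the buggy version would also raise a TypeError when it reaches the third item, because that item is a list rather than a string.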

AnnaRybakovaT commented 2 years ago

Dear Jim, maybe I have found a solution. The function now looks like this:

def adjustlines(lines):
 newlines = []
 for line in lines:
  x1 = line.split(":")
  newline1 = x1[0]
  x2 = line.split(":")
  newline2 = x1[1]
  newlines.append(newline1)
  newlines.append(newline2)
 return newlines

And the result is this list:

funderburkjim commented 2 years ago

Your revised form is certainly one possibility, out of many possibilities.
(minor Note: you actually don't need 'x2').

In fact, we don't at this stage know just what will be the best output, because we are only at the beginning stage of analysis.

I'll dream up a couple of other possibilities, just to show you additional useful techniques.

In the meantime,

go ahead and push your solution
and then get started with 're.sub'.

Here's what looks like a useful next step in our analysis: we want to compare the spellings of 'newline1' (the slp1 spelling of the headword) with the IAST spelling that (sometimes) appears at the beginning of newline2. But to do that, we must get rid of the junk at the end of newline2.

Question 1: What is junk here?

Look at the data and describe in words what we need to get rid of in newline2 in order just to be left with the IAST. Go ahead and post your answer in a comment.

Question 2: How do we get rid of the junk?

Python often touts itself as the programming language with batteries included, which means it includes many modules with specialized capabilities to help the programmer solve common problems. One of the modules used in text processing is the 'regular expression' module.

A program that uses the regex module must import it, by import re. You'll see that in our readwrite program, the module has been imported (import sys,re,codecs which imports three modules).

In fact, you've already seen one usage of regex module in 're.split'.

Now, I'm pretty sure that we can use re.sub (regular expression substitution) to get rid of the junk. Essentially we will use something like newline3 = re.sub(JUNK,'',newline2) to replace JUNK in newline2 with an empty string. Here JUNK is a regular expression pattern that describes the portion of the text of newline2 that we want to remove.

So once we have an answer to question 1, our task reduces to translating the answer into a regular expression pattern (or perhaps our problem will require more than one pattern).
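For instance, if JUNK turned out to be "the first space and everything after it" (just one candidate answer to Question 1, used here as an assumption), the pattern could be r' .*':

```python
import re

newline2 = 'ághnya ({%also%} {@-yá@})'

# JUNK (an assumed pattern): a space followed by any characters up to the end
JUNK = r' .*'

newline3 = re.sub(JUNK, '', newline2)
print(newline3)  # ághnya
```

A string with no space character is left unchanged, because the pattern simply finds no match.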

You can get started learning about regular expressions with online tutorials such as https://www.w3schools.com/python/python_regex.asp.

funderburkjim commented 2 years ago

showing Python code in a Github comment

In your example above you showed your revised adjustlines function, but note that the indentation is lost. If you 'edit' the comment, the Python indentation is present. If you precede and follow a chunk of text with triple back-quote, then the indentation is retained. Next I have copy-pasted your function code and put it in triple back-quotes:

def adjustlines(lines):
 newlines = [] 
 for line in lines:
  x1 = line.split(":")
  newline1 = x1[0]
  x2 = line.split(":")
  newline2 = x1[1]
  newlines.append(newline1)
  newlines.append(newline2)
 return newlines

AnnaRybakovaT commented 2 years ago

(in two readme files, one old and one new

Dear Jim, as I understand it, I should create a new file readme.txt in /step1 and update the old file in /deva_iast_comp (not in /step0). Is that correct?

AnnaRybakovaT commented 2 years ago

(minor Note: you actually don't need 'x2')

Out of curiosity, I ran the program readwriteA3_test.py, where "x2" was deleted. The output (readwriteA3_test.txt) contains only the first parts of the lines (before ":"). Could you check this program and identify my mistake, please?

AnnaRybakovaT commented 2 years ago

Question 1: What is junk here?

Here are some examples from newline2:

ághnya ({%also%} {@-yá@})
á-dṛp-ita {%or%} {@-ta
áhas = áhar

First of all, we should delete the data after the main word. For this, it is possible to use the function re.split with some patterns: "(", "{" and " ". Right now I know only this function, but as I see, my next task is re.sub.

In the next step, we should delete the "-" inside our words.

And finally we don't need accent marks.
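A sketch of the re.split idea from this comment, on two of the examples above. The character class [ ({] splits at a space, '(' or '{'; taking item [0] keeps the main word, and a second step removes the hyphens. Accent removal is deliberately left out: a naive approach that strips all Unicode combining marks would also strip the dots and macrons that belong to IAST (as in ṛ or ā).

```python
import re

# two of the examples quoted above
examples = ['ághnya ({%also%} {@-yá@})', 'á-dṛp-ita {%or%} {@-ta']

results = []
for text in examples:
    # split at any space, '(' or '{'; the main word is the first piece
    main = re.split(r'[ ({]', text)[0]
    # second step: remove the hyphens inside the word
    main = main.replace('-', '')
    results.append(main)

print(results)
```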

AnnaRybakovaT commented 2 years ago

Question 1: What is junk here?

There are also some words with junk ("~" and "*") in front: ~naṣ-ṭa, *nāgarī

funderburkjim commented 2 years ago

in two readme files, one old and one ...

Yes, that was the idea.

funderburkjim commented 2 years ago

readwriteA3_test.txt

readwriteA2.py has:

  x1 = line.split(":")
  newline1 = x1[0]
  x2 = line.split(":")  # only this line is extraneous
  newline2 = x1[1]
  # We want to add the new line to our list of new lines.
  # 'append' is the way to do that
  newlines.append(newline1)
  newlines.append(newline2)

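whereas readwriteA3_test.py has only the following (the lines that compute and append newline2 were deleted along with x2, so only the part before ':' is output):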

  x1 = line.split(":")
  newline1 = x1[0]
  # We want to add the new line to our list of new lines.
  # 'append' is the way to do that
  newlines.append(newline1)

funderburkjim commented 2 years ago

what is `builtin` ?

I noticed you changed 'builtin' to 'builting' (or maybe 'building') in a comment (e.g., line 36 of readwriteA2.py): "There is a building way to split strings into a list (separator is ":")" should be "There is a builtin way to split strings into a list (separator is ":")".

'builtin' (or 'built-in') appears to be a somewhat technical word, which should in this context be used in place of 'building' or 'builting' (I don't think 'builting' is in the English lexicon).

In this case, the sense is that we don't have to write a function to split a string into a list of substrings. Instead, the python distribution already has solved this problem -- that is, there is a solution already 'built into Python'. Or equivalently, there is a Python 'builtin' solution. All we have to do is learn how to use one of the string splitting tools built into python.

funderburkjim commented 2 years ago

First of all we should delete data after the main word.

That looks promising. It looks to me that the 'main word' (in newline2) never contains a space character. If this is true for our ../data.txt, then we can say that 'newline3' (which is to contain only the main word from newline2) is defined by removing the first space character plus all subsequent characters in newline2. This looks almost right, but there are some cases where there is NO space character, such as the 'abnormal' lines.

Informally, we can say

if newline2 contains no space character, then newline3 = newline2
else, newline3 consists of newline2 with all the characters from the space to end removed.
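The informal rule above, written directly as an if/else (a sketch; `get_newline3` is a hypothetical helper name, not part of readwriteA2.py):

```python
def get_newline3(newline2):
    """Return the main word of newline2 (hypothetical helper)."""
    if ' ' not in newline2:
        # no space character (e.g. the 'abnormal' lines): newline3 = newline2
        return newline2
    # else: remove the first space and all characters after it
    return newline2[:newline2.index(' ')]

print(get_newline3('abnormal'))
print(get_newline3('ághnya ({%also%} {@-yá@})'))
```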

Actually, from this informal statement, I think we can get newline3 by using another split on newline2. @AnnaRybakovaT do you see how to use split to get newline3? Give it a try. Then we'll try to develop a similar solution using regular expressions. Write a readwriteA3.py (or whatever you want to call the program).

funderburkjim commented 2 years ago

interactive python

Sometimes, when you want to experiment with a small bit of code, it is helpful to use python interactively. To do this, just type 'python' (return) in the terminal. Here's what it looks like:

$ python
Python 3.9.1 (tags/v3.9.1:1e5d33e, Dec  7 2020, 17:08:21) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

There is a blinking vertical line after '>>> ' indicating the program is waiting for you to type something.

The first thing to do is learn how to exit the interactive python session.
One way to do this is by typing 'quit()'.

$ python
Python 3.9.1 (tags/v3.9.1:1e5d33e, Dec  7 2020, 17:08:21) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> quit()

jimfu@DESKTOP-6PTUC6R MINGW64 /c/xampp/htdocs/sanskrit-lexicon/md/deva_iast_comp/step1 (master)
$

Now, you're back to the git bash terminal.

try a split

You can also do 'python -i' instead of 'python'.

Python 3.9.1 (tags/v3.9.1:1e5d33e, Dec  7 2020, 17:08:21) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> x = 'roses are red and violets are blue'
>>> x.split(' ')
['roses', 'are', 'red', 'and', 'violets', 'are', 'blue']
>>> x = 'antidisestablishmentarianism'  # yes, that is a word!
>>> x.split(' ')
['antidisestablishmentarianism']
>>> quit()

interactively get newline3

Try to use some of the lines in ../data.txt as x-es, and try to use split to get newline3. When you've got it perfected in an interactive session, you'll be ready to write readwriteAX.py.

It's optional whether to use the interactive python.
Sometimes, instead, I'll write a short temp.py program as another way to test out an idea.
Or I'll write a test() function in the program under development. Some people might also use Google Colab for similar testing. You'll develop your own preferences as time goes by.

AnnaRybakovaT commented 2 years ago

only this line is extraneous

Dear Jim, Thanks a lot! Now I understood.

AnnaRybakovaT commented 2 years ago

builtin' (or 'built-in'

Thanks! I will correct the files now.

AnnaRybakovaT commented 2 years ago

This looks almost right, but there are some cases where there is NO space character, such as the 'abnormal' lines.

Dear Jim, I was thinking about how to split newline2, but I have no ideas other than using the space character. I ran this test program:

def adjustlines(lines):
 newlines = [] 
 for line in lines:
  x1 = line.split(":")
  newline1 = x1[0]
  x2 = line.split(":")
  newline2 = x1[1]
  x3 = newline2.split()
  newline3 = x3[0]
  newlines.append(newline1)
  newlines.append(newline2)
  newlines.append(newline3)
 return newlines

The output (readwriteA3_test.txt) is not bad, since the newline3 values include:

NO space character lines (abnormal → abnormal)
lines with the junk removed (ághnya ({%also%} {@-yá@}) → ághnya)

funderburkjim commented 2 years ago

comment on triple backquote

In your comment, you should

Put the starting triple backquote on a separate line
Be sure to put an ending triple backquote on a separate line after the intended text.

Why don't you edit the above comment with these changes? Do you see the difference?

AnnaRybakovaT commented 2 years ago

do you see the difference?

Many thanks! I realized my mistake: the triple backquotes must be on separate lines!!!

funderburkjim commented 2 years ago

readwriteA3_test

Your solution re splitting on space looks fine. You used newline2.split(), i.e., split with its default argument. With no argument, split treats any run of 'white space' characters (spaces, tabs, newlines) as a single separator and ignores leading and trailing white space (Ref: https://www.w3schools.com/python/ref_string_split.asp). I would probably have used newline2.split(" "), which splits only on the space character. But in the context of this program, the two probably give the same result.
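A small illustration of where the two forms differ, using made-up strings with a doubled space and a leading space:

```python
s = 'ághnya  ({%also%})'   # two spaces after the word
print(s.split())      # runs of white space count as one separator: 2 pieces
print(s.split(' '))   # each single space is a separator: an empty piece appears

t = ' abnormal'       # a leading space
print(t.split()[0])         # abnormal  (leading white space is ignored)
print(repr(t.split(' ')[0]))  # ''  (the empty string before the first space)
```

For lines with exactly one space between words and none at the ends, as appears to be the case here, both forms give the same pieces.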

suggestion for output

We now have 3 lines of output for each line of input. And the .txt file is getting a bit hard to read. Suggestions:

add a separate spacing line (such as a line of '-') BEFORE adding newline1
then add the original line with a label: newlines.append('original = %s' %line)
And when you append newline1 through newline3, give each of those lines a label.

Do you think these changes make the output a bit easier to read?

what is that `%s` ?

There are many ways to construct strings. One very flexible way involves '%s'. This is currently considered 'old-fashioned' in Python, but I still use it a lot. Try this reference for an introduction: https://www.learnpython.org/en/String_Formatting.

funderburkjim commented 2 years ago

suggestion for output example:

Here is an example of how the output for the 3rd line of data might appear, based on the above 'suggestion for output':

-------------------------------------
orig = aGnya:ághnya ({%also%} {@-yá@})
slp1 = aGnya
rest = ághnya ({%also%} {@-yá@})
iast = ághnya
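One way to produce these labeled lines with '%s' formatting (a sketch; `format_entry` is a hypothetical helper, and taking the IAST as everything before the first space follows the split approach above):

```python
def format_entry(line):
    """Return the labeled output lines for one input line (hypothetical helper)."""
    slp1, rest = line.split(':', 1)   # text before and after the first ':'
    iast = rest.split(' ')[0]         # main word: everything before the first space
    out = []
    out.append('-' * 37)              # separator line of dashes
    out.append('orig = %s' % line)
    out.append('slp1 = %s' % slp1)
    out.append('rest = %s' % rest)
    out.append('iast = %s' % iast)
    return out

sample = 'aGnya:ághnya ({%also%} {@-yá@})'
lines = format_entry(sample)
for x in lines:
    print(x)
```

In newer Python code the same lines are often written as f-strings, e.g. f'slp1 = {slp1}'.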

AnnaRybakovaT commented 2 years ago

And the .txt file is getting a bit hard to read.

I absolutely agree. I will try to update the output on Monday (probably tomorrow I will not be in front of a computer).

AnnaRybakovaT commented 2 years ago

add a separate spacing line (such as a line of '-' )BEFORE adding newline1

Dear Jim,

You can check the updated output (file readwriteA3.txt).

For adding a separate spacing line, I used this command: newlines.append('%s' %"-----------------------")

Regarding this, I have a question. I am curious whether there is a simpler way to put "-" along the whole length of a string, or to just give a number: how many times we want "-" to appear?

funderburkjim commented 2 years ago

readwriteA3.txt looks fine.

newlines.append('%s' %"-----------------------")

This is OK but awkward. In fact, if x is any string and y is the string "%s" % x, then x and y are equal strings. So newlines.append("-----------------------") gives the same result.

$ python -i
Python 3.9.1 (tags/v3.9.1:1e5d33e, Dec  7 2020, 17:08:21) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> x = "abcd"
>>> y = "%s" % x
>>> x == y
True

A string with 15 '-'

Yes, there is a Python way to do this: x*n, where x is a string and n is a positive integer; the result is a string made up of n copies of x.

$ python -i
Python 3.9.1 (tags/v3.9.1:1e5d33e, Dec  7 2020, 17:08:21) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> '-'*10
'----------'
>>> 'ab'*5
'ababababab'
>>> quit()

funderburkjim commented 2 years ago

remove hyphens in iast

e.g., set a new variable newline3a to be newline3 with the '-' characters replaced by the empty string ''.

You can use the 'replace' method for strings. Research this by searching python string replace.

We don't really need to have both newline3 and newline3a, so just have your output use newline3a.
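For example, on one of the words quoted earlier (a minimal sketch):

```python
newline3 = 'á-dṛp-ita'
newline3a = newline3.replace('-', '')   # every '-' replaced by the empty string
print(newline3a)
```

str.replace replaces all occurrences by default and returns a new string; newline3 itself is unchanged.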

funderburkjim commented 2 years ago

count number of abnormals

Ultimately, each of the 'abnormal' items will need to be examined individually in md.txt to see if the slp1 and iast are consistent. So it is of interest to know how many of these there are.

One programming way to count the number of abnormal items is to

use a variable, e.g., nabnormal. We need to initialize this to zero (0) BEFORE the loop in the adjustlines function.
Then, in the loop, at some place we need to use an if clause to test if the line is abnormal, and if it is, then we need to increment our counter nabnormal = nabnormal + 1
- Learn how to use 'if' statements in Python, e.g., https://www.w3schools.com/python/python_conditions.asp
- You may also want to briefly read about Python booleans, since they are part of 'if' statements
finally, after the loop ends, but before the adjustlines function returns, you need to print to the terminal the value of nabnormal, e.g. There are 25 lines marked abnormal.

funderburkjim commented 2 years ago

transliteration

We ultimately want to compare the iast from our file with the slp1. One way to do this is to convert the slp1 to iast (save the result in some variable, such as slpiast). Such a conversion (from the slp1 transcoding of Sanskrit to the iast transcoding of Sanskrit) might be called a transliteration.

Sanskrit transliteration is a specialized functionality that is NOT built into Python. Thus we either need to write the necessary functionality ourselves or use an implementation by someone else.

Luckily, there are already ways to convert slp1 to iast.

Let's use the transliteration library that @drdhaval2785 prefers. We will use the 'pip' tool to install a package (See https://www.w3schools.com/python/python_pip.asp for brief general intro to using pip).

pip install indic_transliteration
## This will print a bunch of information to terminal, which you generally can ignore.

When the installation is done, we can test it out:

$ python -i
Python 3.9.1 (tags/v3.9.1:1e5d33e, Dec  7 2020, 17:08:21) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from indic_transliteration import sanscript
>>> sanscript.transliterate('fziH','slp1','iast')
'ṛṣiḥ'
>>> sanscript.transliterate('rAma','slp1','iast')
'rāma'
>>> quit()

For the next task, generate one more line in the next version of readwrite to compute and show slpiast.

funderburkjim commented 2 years ago

indic_transliteration doc

A search will take you to PyPI; from there, click on the homepage link, which leads to https://github.com/indic-transliteration/indic_transliteration_py.

gasyoun commented 2 years ago

@koleslena and @OlgaSoloveva and @DomiCheck and @VladimirWl - does it make sense to you?

DomiCheck commented 2 years ago

@koleslena and @OlgaSoloveva and @DomiCheck and @VladimirWl - does it make sense to you?

Yes, it does.

koleslena commented 2 years ago

does it make sense to you?

I know all of these except indic_transliteration; I installed it and tried to use it, but had some problems.

drdhaval2785 commented 2 years ago

I can help with indic_transliteration. Please post the question and the expected outcome, and I will be able to show where it goes wrong.

AnnaRybakovaT commented 2 years ago

Yes, there is a Python way to do this.

Dear Jim, Thanks a lot! Now I can update this command.

AnnaRybakovaT commented 2 years ago

newline3a to be newline with the '-' characters replaced by empty string ''.

I updated the iast strings. I also replaced 3 more strings ('~', "*", "[a]") with the empty string. As you can see, I used the replace method step by step several times:

  newline3a = newline3.replace("-", "")
  newline3b = newline3a.replace("~", "")
  newline3c = newline3b.replace("*", "")
  newline3d = newline3c.replace("[a]", "")

Could I do it more easily?

AnnaRybakovaT commented 2 years ago

One programming way to count the number of abnormal items is to

Dear Jim, could you explain, please, where I should place this new adjustlines function? Should it be in our program, or is it a separate program?

AnnaRybakovaT commented 2 years ago

When the installation is done, we can test it out:

Dear Jim, The installation is done but something is wrong.

Probably:

I have to restart
or the problem is in this WARNING which appeared after installation: WARNING: You are using pip version 21.1.1; however, version 22.0.2 is available. You should consider upgrading via the 'c:\users\rybakova\appdata\local\programs\python\python38\python.exe -m pip install --upgrade pip' command.

funderburkjim commented 2 years ago

A re.sub example

In the question above, the aim was to replace several characters with the empty string. Using a sequence of string replacements is one valid way, as shown above.

There is another way, using regular expressions. Here is a silly example, which replaces every character in the string x that matches 'n' or 'c' with 'r'.

python -i
>>> import re
>>> x = 'funny cat'
>>> re.sub(r'[nc]','r',x)  
'furry rat'
>>> quit()

Make a variation of the program using re.sub. Then convince yourself that your new program gives exactly the same output as before: compare the old and new output files using the Unix 'diff' command, e.g. diff <old output file> <new output file>. If the two files are identical, this command gives NO OUTPUT.
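A sketch of such a variation for the four replacements shown earlier, with a quick check that both versions agree. The sample strings are mostly taken from the examples in this thread, and `clean_replace`, `clean_resub` and `JUNK` are hypothetical names:

```python
import re

def clean_replace(s):
    # the step-by-step version from the earlier comment
    s = s.replace('-', '')
    s = s.replace('~', '')
    s = s.replace('*', '')
    s = s.replace('[a]', '')
    return s

# '-', '~' or '*' (a character class), or the three-character string '[a]'
JUNK = re.compile(r'[-~*]|\[a\]')

def clean_resub(s):
    # a single pass that removes every match of JUNK
    return JUNK.sub('', s)

samples = ['~naṣ-ṭa', '*nāgarī', 'á-dṛp-ita', 'x[a]']
for s in samples:
    print(s, clean_replace(s), clean_resub(s), clean_replace(s) == clean_resub(s))
```

The two agree on these samples, but they are not equivalent in every conceivable case: for '[-a]', the step-by-step version first removes '-' and thereby creates a new '[a]', which it then removes, while the single re.sub pass leaves '[a]'. That is exactly the kind of difference the diff check would reveal.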

https://www.w3schools.com/python/python_regex.asp Take a few minutes to review the 'metacharacters' section. In the example above, '[' and ']' are metacharacters. Don't worry about why there is an 'r' in r'[nc]' (this is called a 'raw string'). Try the example with '[nc]' instead. Is there any difference? I suspect there is no difference in this case. 'raw string' usage is somewhat complicated.

Regular expressions are powerful (both for searching and for replacing). There is a steep learning curve, but you don't need to know everything about regexes to use them.
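A short illustration of when the raw-string 'r' does matter (a sketch; the word-boundary pattern \b is just a convenient example):

```python
import re

# For '[nc]' a raw string changes nothing: there are no backslashes
print(r'[nc]' == '[nc]')   # True

# With backslashes it matters. In a normal string '\b' is ONE backspace character,
# while r'\b' keeps TWO characters (backslash + b), which re reads as "word boundary".
print(len('\b'), len(r'\b'))                 # 1 2
print(re.sub(r'\bcat\b', 'rat', 'a cat'))    # a rat
print(re.sub('\bcat\b', 'rat', 'a cat'))     # a cat  (no match: pattern has backspaces)
```

A common habit is to write every regex pattern as a raw string, so backslashes always reach the re module unchanged.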

funderburkjim commented 2 years ago

nabnormal

Just modify the given adjustlines function.

add a line before for line in lines: to initialize nabnormal to zero.
add an if statement (within the for loop) to update nabnormal, as discussed above.
add a print statement for 'nabnormal' before the 'return' statement.
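A sketch of the three changes. ASSUMPTION: an 'abnormal' line is recognized by the text 'abnormal' after the ':', as the readwriteA3_test output described earlier suggests; the real test should be checked against ../data.txt. The sample lines are made up:

```python
def adjustlines(lines):
    newlines = []
    nabnormal = 0                      # initialize the counter BEFORE the loop
    for line in lines:
        x1 = line.split(':')
        newline1 = x1[0]
        newline2 = x1[1]
        # ASSUMPTION: an abnormal line has exactly 'abnormal' after the ':'
        # (and its trailing newline has already been removed)
        if newline2 == 'abnormal':
            nabnormal = nabnormal + 1  # count this line
        newlines.append(newline1)
        newlines.append(newline2)
    # after the loop, before returning: report the count to the terminal
    print('There are %s lines marked abnormal.' % nabnormal)
    return newlines

out = adjustlines(['aaa:abnormal', 'bbb:bbb', 'ccc:abnormal'])
```

With these sample lines the function prints "There are 2 lines marked abnormal." and returns the six split pieces.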

funderburkjim commented 2 years ago

pip warning

The pip install message gave a 'WARNING ...' which suggests that you update pip. You can do this (with the command provided in the WARNING) if you want to. Usually it is not necessary to update pip.

As a general rule, WARNING messages from pip do NOT indicate that anything went wrong with the installation. If something did go wrong, you will see an 'ERROR' message. For instance:

$ pip install abracadabraxxx
ERROR: Could not find a version that satisfies the requirement abracadabraxxx
ERROR: No matching distribution found for abracadabraxxx
WARNING: You are using pip version 21.0.1; however, version 22.0.2 is available.
You should consider upgrading via the 'c:\users\jimfu\appdata\local\programs\python\python39\python.exe -m pip install --upgrade pip' command.

funderburkjim commented 2 years ago

installation ok

Your installation is OK. The funny-looking \u1e5b ... is due to an oddity of Python's print function in conjunction with the Git Bash terminal.

This problem occurs with 'print(x)' when 'x' is a string containing non-ASCII characters (the ASCII characters are the 'usual' Latin alphabet, including digits and punctuation).

If you do the same test with the 'cmd' terminal of windows, you likely won't see those \u representations of unicode characters.

In my Windows installation of Git Bash, I have two things in my '.bashrc' configuration file. You can do this also. In my computer, the .bashrc file is at path c:/Users/jimfu/.bashrc Probably yours is similarly located, but at your Windows user name (Rybakova instead of jimfu). It is possible that this file does not exist; in that case just create one. It is a text file.

Put this line into the .bashrc file. alias python='winpty python.exe'

Save .bashrc, open a new GitBash terminal window and try the example again.
Does this solve the problem with the \u... ?

write to file

Note: When you write to a file opened as in the readwrite program, then you will NOT see this problem. You could make a simple test program:

# coding=utf-8
"""temp_translit.py
   USAGE: python temp_translit.py temp_translit.txt
   Tests indic_transliteration module
"""
from __future__ import print_function
import sys,re,codecs
from indic_transliteration import sanscript

if __name__=="__main__":
 fileout = sys.argv[1] # output file path, e.g. temp_translit.txt
 lines = []
 lines.append(sanscript.transliterate('fziH','slp1','iast'))
 lines.append(sanscript.transliterate('rAma','slp1','iast'))
 with codecs.open(fileout,"w","utf-8") as f:
  for line in lines:
   f.write(line+'\n')

Then run the program, and check the output file. Does the output look right?
