ovalhub / pyicu

The PyICU project repository has moved to https://pyicu.org.
Other
133 stars 49 forks source link

[Feature Request] Add bidi algorithm #134

Closed SafaAlfulaij closed 4 years ago

SafaAlfulaij commented 4 years ago

https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ubidi_8h.html It's massive, but maybe?

ovalhub commented 4 years ago

On Jul 22, 2020, at 22:24, Safa Alfulaij notifications@github.com wrote:

 https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ubidi_8h.html It's massive, but maybe?

Indeed, but it could be done. If you had python wrappers around this API, how would you use it ? In particular, it would help me understand how this API is used. Could you please propose some python pseudo-code using this API that I could then reuse for writing tests or an example ?

Thanks ! Andi..

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

SafaAlfulaij commented 4 years ago

Sorry for the late reply.

I believe we can use something similar to what node-icu-bidi does.

layout = BidiLayout() # ubidi_open and init
layout.setPara("input text goes here") # ubidi_setPara, options at a later stage

level = layout.getParaLevel() # ubidi_getParaLevel
direction = layout.getDirection() # ubidi_getDirection

charLevels = []
for x in range(layout.getProcessedLength()): # ubidi_getProcessedLength
    charLevels.append( layout.getLevelAt(x) ) # ubidi_getLevelAt
# Or charLevels = layout.getProcessedArray()

runs = []
for x in range(layout.countRuns()): # ubidi_countRuns
    runs.append(layout.getVisualRun()) # ubidi_getVisualRun

This is the core functionality.

ovalhub commented 4 years ago

Great, thank you, this is useful !

On Jul 25, 2020, at 22:11, Safa Alfulaij notifications@github.com wrote:

 Sorry for the late reply.

I believe we can use something similar to what node-icu-bidi does.

layout = BidiLayout() # ubidi_open and init layout.setPara("input text goes here") # ubidi_setPara, options at a later stage

level = layout.getParaLevel() # ubidi_getParaLevel direction = layout.getDirection() # ubidi_getDirection

charLevels = [] for x in range(layout.getProcessedLength()): # ubidi_getProcessedLength charLevels.append( layout.getLevelAt(x) ) # ubidi_getLevelAt

Or charLevels = layout.getProcessedArray()

runs = [] for x in range(layout.countRuns()): # ubidi_countRuns runs.append(layout.getVisualRun()) # ubidi_getVisualRun This is the core functionality.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or unsubscribe.

ovalhub commented 4 years ago

I just checked in a first pass at this that covers the entire API except for the subclassing support. Here are the differences between what you proposed and what I ended up implementing:

layout = BidiLayout() # ubidi_open and init layout.setPara("input text goes here") # ubidi_setPara, options at a later stage

layout = Bidi() layout.setPara(UnicodeString("input text goes here")) // I made using UnicodeString a requirement here because the Bidi object // needs it for its entire lifespan and I didn't want to make repeated // copies of the para text. The layout Bidi instance maintains a reference // to it. You can access also it via layout.text. // Note that line = layout.setLine(start, limit) also has a text property // that shares the characters of layout's text. A line Bidi instance keeps // track of its parent and maintains a reference to it. You can access it // via line.parent.

level = layout.getParaLevel() # ubidi_getParaLevel direction = layout.getDirection() # ubidi_getDirection

// Unchanged.

charLevels = [] for x in range(layout.getProcessedLength()): # ubidi_getProcessedLength charLevels.append( layout.getLevelAt(x) ) # ubidi_getLevelAt

Or charLevels = layout.getProcessedArray()

levels = layout.getLevels() // levels is a tuple of 'processed length' level values. // You can also run the for loop but it's going to be vastly slower.

runs = [] for x in range(layout.countRuns()): # ubidi_countRuns runs.append(layout.getVisualRun()) # ubidi_getVisualRun

logicalStart, length, direction = layout.getVisualRun(x) // runs are returns as triples // You should be able to use layout.text[logicalStart:logicalStart+length] // to get the implied text.

It would be nice if you could propose some unit tests for this new code and the functionality covered by the API. There are a bunch of interesting functions in there. Please, take a look at the new bidi.cpp source file for all the API possibilities.

Thanks !

Andi..

SafaAlfulaij commented 4 years ago

You are awsome man! You are the C++-tar!

I will provide you with some unit tests once I take a good look on (and check) the API and the whole class.

Thanks a lot! You are truly awsome :)

ovalhub commented 4 years ago

On Tue, 28 Jul 2020, Safa Alfulaij wrote:

You are awsome man! You are the C++-tar!

I will provide you with some unit tests once I take a good look on (and check) the API and the whole class.

Thanks a lot! You are truly awsome :)

You're welcome ! I'm hoping you can propose some tests that use some "real" text that has several kinds of characters in them, not just plain placeholder ascii text. Thanks !

Andi..

-- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/ovalhub/pyicu/issues/134#issuecomment-665255716

SafaAlfulaij commented 4 years ago

I believe this is a good start (for now).

# ====================================================================
# Copyright (c) 2020 Open Source Applications Foundation.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions: 
#
# The above copyright notice and this permission notice shall be included
# in all copies or substantial portions of the Software. 
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
# OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
# ====================================================================
#

import sys, os

from unittest import TestCase, main
from icu import *

input_text = "\u0643\u0627\u0646 \u0057\u0069\u006e\u0064\u006f\u0077\u0073 \u0031\u0030 \u0623\u0648\u0644 \u0646\u0638\u0627\u0645 \u062a\u0634\u063a\u064a\u0644 \u0628\u062a\u0642\u0646\u064a\u0629 \u0034\u004b\u002e"

input_text_2 = "\u0643\u0627\u0646\u062a \u004d\u0069\u0063\u0072\u006f\u0073\u006f\u0066\u0074 \u0057\u0069\u006e\u0064\u006f\u0077\u0073 \u0627\u0644\u0645\u0627\u0644\u0643 \u0627\u0644\u0623\u0648\u062d\u062f \u0644\u0639\u0644\u0627\u0645\u0629 \u0058\u0050\u002e"

class TestBidi(TestCase):

    def testDefault(self):

        Locale.setDefault(Locale.getUS())
        layout = Bidi()
        layout.setPara(UnicodeString(input_text))

        self.assertTrue( layout.getParaLevel() == 1 )
        self.assertTrue( layout.getDirection() == UBiDiDirection.MIXED )
        self.assertTrue( layout.getLevels() == (1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1) )
        self.assertTrue( layout.getVisualRun(0) == (39, 1, 1) )
        self.assertTrue( layout.getVisualRun(3) == (4, 10, 0) )

    def testForcedDirection(self):

        Locale.setDefault(Locale.getUS())
        layout = Bidi()
        layout.setPara(UnicodeString(input_text), 0)

        self.assertTrue( layout.getParaLevel() == 0 )
        self.assertTrue( layout.getDirection() == UBiDiDirection.MIXED )
        self.assertTrue( layout.getLevels() == (1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 0, 0) )
        self.assertTrue( layout.getVisualRun(0) == (0, 3, 1) )
        self.assertTrue( layout.getVisualRun(3) == (15, 22, 1) )

    def testLines(self):

        Locale.setDefault(Locale.getUS())
        layout = Bidi()
        layout.setPara(UnicodeString(input_text_2))

        line_layout = layout.setLine(0, 15)

        self.assertTrue( line_layout.getLevels() == (1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1) )
        self.assertTrue( line_layout.getVisualRun(0) == (14, 1, 1) )
        self.assertTrue( line_layout.getVisualRun(2) == (0, 5, 1) )

        line_layout_2 = layout.setLine(15, 47)

        self.assertTrue( line_layout_2.getLevels() == (2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1) )
        self.assertTrue( line_layout_2.getVisualRun(0) == (31, 1, 1) )
        self.assertTrue( line_layout_2.getVisualRun(2) == (7, 22, 1) )

if __name__ == "__main__":
    main()

And I tried using writeReverse but couldn't. Checking the code I need to pass a str argument that will be discarded?

I tried this after setPara at testLines:

        print(layout.writeReverse('', Bidi.DO_MIRRORING))

with no result...

ovalhub commented 4 years ago

On Thu, 30 Jul 2020, Safa Alfulaij wrote:

I believe this is a good start (for now).

... snip ...

Thank you, I integrated it into PyICU (not yet checked in).

And I tried using writeReverse but couldn't. Checking the code I need to pass a str argument that will be discarded?

writeReverse is a type method on the Bidi type, you can call it like that:

from icu import * Bidi.writeReverse("this is text") <UnicodeString: 'txet si siht'>

As a class method it doesn't need an instance or a para. It takes any kind of string, UnicodeString, python string, whatever that can be converted into a UnicodeString (like most PyICU wrappers taking a string)

I tried this after setPara at testLines:

       print(layout.writeReverse('', Bidi.DO_MIRRORING))

with no result...

The inverse of an empty string might be just that.

Andi..

-- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/ovalhub/pyicu/issues/134#issuecomment-666642757

ovalhub commented 4 years ago

I now added and checked in your unit tests for the new Bidi wrappers. Thank you for the contribution !

On Fri, 31 Jul 2020, Safa Alfulaij wrote:

Closed #134.

-- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/ovalhub/pyicu/issues/134#event-3607614869

ovalhub commented 4 years ago

On Fri, 31 Jul 2020, Safa Alfulaij wrote:

Closed #134.

I also now added wrappers for the UBiDiTransform API.

-- You are receiving this because you commented. Reply to this email directly or view it on GitHub: https://github.com/ovalhub/pyicu/issues/134#event-3607614869