usbong / UsbongKit

A framework for reading and displaying contents of Usbong Trees
www.usbong.ph
Apache License 2.0
2 stars 1 forks source link

When a word is composed of two words joined by a dash, it does not become a word hint #21

Closed masarapmabuhay closed 7 years ago

masarapmabuhay commented 7 years ago

Salut! I noticed this seemingly minor issue, where a word composed of two words joined by a dash does not become a word hint.

In this case, it is the "pang-display" word.

screen shot 2016-11-28 at 5 19 24 pm

Merci beaucoup comme toujours!

chrisamanse commented 7 years ago

Thanks for reporting this issue. Can you confirm that there is a hint for the word "display" in the Filipino hints XML file (Filipino.xml)?

If yes, then PR #19 should fix this.

masarapmabuhay commented 7 years ago

Salut! Merci beaucoup pour ta réponse! The word "display" does not exist in our Filipino dictionary, i.e. Filipino.xml. The word "pang-display", however, does. This bug may actually be quite relevant in Filipino, since there are a number of words that use "-", e.g. "mag-aral". Merci toujours!

chrisamanse commented 7 years ago

Is it okay to get the utree file for your example?

masarapmabuhay commented 7 years ago

Salut! Oui, bien sûr! I emailed you the .utree privately. Merci beaucoup!

p.s. I thought about this bug again to see if we may actually be able to just make do with using your PR #19 fix, in that we have both "mag" and "aral" in our dictionary.

However, there are words such as "magpasyal-pasyal", which would require us to add "magpasyal" and "pasyal". Moreover, a word like "magbubukid" would require us to add the entire word.

Therefore, it would be more consistent to add entire words, e.g. "mag-aral", "magpasyal-pasyal", "magbubukid", in our dictionary.

Please let me know if you disagree. Merci bien!

chrisamanse commented 7 years ago

I found the bug here: https://github.com/usbong/UsbongKit/blob/master/UsbongKit/Views/HintsTextView.swift#L121

The condition for the parser is if the word already has a hint, then it will skip adding the hint. Thus, since "pang" already has a hint, then the hint of "pang-display" is skipped.

We could change this behavior such that it always overwrites the hint even when the word already contains a "hint". But this would lead to overwriting the latest value, such that if "pang" comes after "pang-display" in the hints XML, the hints in the text pang-display will now have two hints: "pang" hint for the pang text, and "pang-display" hint for the display text (<pang>-<display> vs <pang-display>, where <> denotes a hint).

However, the problem lies in my parser. Instead of going through each word in a text, it goes through every hint in the hint XML, and search if the word can be found in the text. The implementation should be for each word in the text, find the hint in the hint XML. With this implementation, the above cases will work appropriately, such that the hint in the text pang-display, will have the hint for "pang-display" only.

chrisamanse commented 7 years ago

Issue #22 provides a clearer explanation of my implementation and the intended implementation, written in pseudocode.

chrisamanse commented 7 years ago

@masarapmabuhay The new implementation can now be found in UsbongKit/Views/HintsTextView.swift#L90-L93

This means that, even when our hints has "mag", "aral", and "mag-aral", the hint for the text "mag-aral" will be the hint for "mag-aral", since it makes more sense to treat the definition of a word as a whole instead of by its pieces.

Feel free to merge this implementation by merging PR #23.

masarapmabuhay commented 7 years ago

Salut! Oui, je suis d'accord.

The implementation should be for each word in the text, find the hint in the hint XML.

I've merged your implementation, i.e. UsbongKit/Views/HintsTextView.swift#L90-L93

However, I found that your extension NSAttributeString {...} is significantly different from mine.

extension NSAttributedString {
    public func attributedStringWithHints(hintsDictionary: [String: String], withColor color: UIColor? = nil) -> NSAttributedString {
        let mutableAttributedText = NSMutableAttributedString(attributedString: self)
        let string = self.string as NSString
        let stringLength = string.length

        for (key, value) in hintsDictionary {
            var range = NSRange(location: 0, length: mutableAttributedText.length)
            let searchString = key
            let searchStringLength = searchString.characters.count

            while range.location != NSNotFound {
                range = string.rangeOfString(searchString, options: .CaseInsensitiveSearch, range: range)

                if range.location != NSNotFound {
                    // Allow words or phrases only - check for previous and next characters
                    var foundStringIsWord = true

                    // Previous/next character of word should be whitespace, newline or punctuation
                    let validCharacterSet = NSMutableCharacterSet.whitespaceAndNewlineCharacterSet()
                    validCharacterSet.formUnionWithCharacterSet(.punctuationCharacterSet())

                    if range.location > 0 {
                        let previousCharacterString = string.substringWithRange(NSRange(location: range.location - 1, length: 1))
                        if previousCharacterString.rangeOfCharacterFromSet(validCharacterSet) == nil {
                            foundStringIsWord = false
                        }
                    }

                    // Next character should be whitespace, newline or punctuation
                    let nextCharacterLocation = range.location + range.length
                    if foundStringIsWord && nextCharacterLocation < string.length {
                        let nextCharacterString = string.substringWithRange(NSRange(location: nextCharacterLocation, length: 1))
                        if nextCharacterString.rangeOfCharacterFromSet(validCharacterSet) == nil {
                            foundStringIsWord = false
                        }
                    }

                    if foundStringIsWord && mutableAttributedText.attribute(ContainsHintKey, atIndex: range.location, longestEffectiveRange: nil, inRange: range) == nil {
                        let targetRange = NSRange(location: range.location, length: searchStringLength)
                        mutableAttributedText.addAttributes([ContainsHintKey: value], range: targetRange)

                        if let targetColor = color {
                            mutableAttributedText.addAttribute(NSForegroundColorAttributeName, value: targetColor, range: targetRange)
                        }
                    }

                    let newLocation = range.location + range.length
                    range = NSRange(location: newLocation, length: stringLength - newLocation)
                }
            }
        }

        return mutableAttributedText
    }
}

Nevertheless, I replaced my version of the snippet with yours. This resulted in the error message:

Value of type 'NSString' has no member 'forEachWord'

masarapmabuhay commented 7 years ago

Pardon, as it turns out, I had to update the other files which you committed. I guess your new, more efficient implementation also cleaned up the code. Merci beaucoup!