scribe-org / Scribe-iOS

iOS app with keyboards for language learners
https://apps.apple.com/app/scribe-language-keyboards/id1596613886
GNU General Public License v3.0
126 stars 78 forks source link

Convert JSON data to SQLite #96

Closed andrewtavis closed 1 year ago

andrewtavis commented 2 years ago

Terms

Behavior

There’s currently a bug with v1.1.0 where the Russian keyboard will not translate words. The keyboard switches to English as it should, and words can be typed, but then on pressing return the keyboard crashes without anything being inserted.

Device type

iPhone 7

iOS version

iOS 15.1

andrewtavis commented 1 year ago

Very simple question as well, @wkyoshida and also @SaurabhJamadagni: does my idea of versioning for all this make sense? Scribe-iOS is going to v2.2.0 as what the user is experiencing is just some new features like the emoji suggestions, etc β€” i.e. nothing "breaking" per say. The functionality of Scribe-Data is being totally changed though to export something different with tons of breaking changes, hence that's going to v3.0.0.

andrewtavis commented 1 year ago

Oki doki 😊 Soooo, querying database values πŸ˜‰πŸ˜‰πŸ˜‰

Here are the functions I'm making and a simple example:

import GRDB

// get_iso_code is new and returns the two letter code for the language (ex: Deutsch -> de)

/// Makes a connection to the language database given the value for controllerLanguage.
func openDBQueue() -> DatabaseQueue {
  let dbName = "\(String(describing: get_iso_code(keyboardLanguage: controllerLanguage).uppercased()))LanguageData"
  let dbPath = Bundle.main.path(forResource: dbName, ofType: "sqlite")! // ex: dbName = DELanguageData
  let db = try! DatabaseQueue(
    path: dbPath
  )

  return db
}

/// Returns a value from the language database given a query and arguemtns.
///
/// - Parameters
///   - query: the query to run against the language database.
///   - args: arguments to pass to the query.
///   - outputCols: the columns from which the value should come.
func queryDB(query: String, args: [String], outputCols: [String]) -> [String] {
  var outputValues = [String]()
  do {
    try languageDB.read { db in
      if let row = try Row.fetchOne(db, sql: query, arguments: StatementArguments(args)) {
        for col in outputCols {
          outputValues.append(row[col])
        }
      }
    }
  } catch {}

  return outputValues
}

var languageDB = try! DatabaseQueue() // instantiated for assignment within CommandVariables.swift
languageDB = openDBQueue() // on firstKeyboardLoad == true within loadKeys()

// Example in another file:
let query = "SELECT * FROM emoji_keywords WHERE word = ?"
let args = ["gesicht"]
let outputCols = ["emoji_1"]

spaceBar = queryDB(query: query, args: args, outputCols: outputCols)[0]

Which then gives us πŸ₯πŸ₯πŸ₯πŸ₯:

Will do a PR soonish with the above a bit more organized 😊

andrewtavis commented 1 year ago

I think that this is a good way of going about this, but feedback would be welcome :) Specifically we'll need to keep track of the index of certain columns passed within outputCols to queryDB to assign strings from the query results. I'd say this is an ok trade off for being able to very simply get individual values and whole rows for things like autosuggestions as well as check for if no results were returned.

andrewtavis commented 1 year ago

Progress so far for me locally: all commands have been switched over 😊 So Translate, Conjugate and Plural are all directly from the SQLite databases, and I've actually removed the translations and verbs variables all together πŸš€ Suggestions based on pronouns has also been switched over as that was referencing verbs, and the emoji autosuggestions/completions are also SQLite based 😱😊😊

I'm going to work on switching over autosuggestions right now. A note is that the way we do autocompletions is the thing that will need to make the biggest shift, and will likely need a custom query (totally fine that it would πŸ™ƒ). We need to get rid of the autocompleteLexicon variable completely in my opinion. As discussed in #286, the creation of this variable in the firstKeyboardLoad step of laodKeys() was what was slowing down the initial load step, which is now solved as this logic has been moved out of the view controller. With that being said, there still is a slowdown of the UI for Russian in particular as the user types because we have this massive array on which the autocomplete checks are being made. Switching this to SQLite will doubtless speed this all up :) :)

SaurabhJamadagni commented 1 year ago

does my idea of versioning for all this make sense? Scribe-iOS is going to v2.2.0 as what the user is experiencing is just some new features like the emoji suggestions, etc

Hey @andrewtavis! Yeah that makes sense. With all the internal changes going on, Scribe-iOS version also seems like a big jump to me but I can see how the numbering would make sense from a user's perspective. Sounds great 😊

andrewtavis commented 1 year ago

Remaining issues after the commit I'm about to do 😊

andrewtavis commented 1 year ago

Alrighty then! 3be5a59 is massive 😱😊 It also includes two bug fixes that can be seen in the commit logs for the CHAGELOG :) There are also minor fixes throughout as this required me to go through a lot of the codebase.

What we've all been waiting for πŸ™ƒ:

I've been doing lots of testing and it all seems to be working great so far aside from one issue: for some reason I can't get it so that lower case versions of German nouns (that are capitalized) are not in the autocomplete lexicon... This was a problem a long time ago when we first added autocomplete/autosuggest. What I means by this is that there are two versions of most German nouns: Buch and buch, Auto and auto, etc. This is because we get lower case from the autosuggestion keys and capitalized from Wikidata. At this point in createAutocompleteLexicon I'm trying to check if the capitalized or upper case version of a word is also in the nouns, and if so to take the noun instance. I'm likely just doing a simple SQL mistake and can't see it right now πŸ€·β€β™‚οΈ We need to get it fixed on the SQL side, but it can also be implemented another way if someone has an idea.

Most important thing: I really think that this has worked 😊😊😊 We'll see, but before doing all this there was some serious lag still while using the Russian keyboard. In the initial tests since the changes it seems to be very responsive πŸ”₯πŸš€πŸ”₯βž‘οΈπŸŒ”πŸ˜…

If someone has suggestions for the nouns I'd be grateful! Glad this is coming to a close 😊

andrewtavis commented 1 year ago

https://github.com/scribe-org/Scribe-iOS/blob/3be5a592510763ce3f61ee21b3447579ca5e00b4/Keyboards/KeyboardsBase/LoadData.swift#L129

The above is the function for the autocomplete creation. Just to list it, this is the logic that I'm trying to put into the autocomplete lexicon:

We could also edit the drop duplicates function that's called after the UILexicon words (names from Contacts that are unordered) are added:

https://github.com/scribe-org/Scribe-iOS/blob/5a51b392ca8331e2d8c882747faad69be3154bab/Keyboards/KeyboardsBase/KeyboardViewController.swift#L1454

andrewtavis commented 1 year ago

An idea after a night’s sleep πŸ˜΄πŸ’‘: a self join to remove the lower case versions of capitalized/upper case words might make sense.

andrewtavis commented 1 year ago
  • Filter out words that are less than three characters, numbers and hyphenated words.
    • This step also needs to be looked at as the numbers aren't being removed πŸ€”

This has been fixed at this point :) Hyphenated words have been removed, and I'm actually keeping numbers as I think they help match the emoji suggestions for 10 on a clock, etc. We're selecting words that are longer than two characters for everything except emojis for this purpose. We can change it later based on live testing πŸ™ƒ

andrewtavis commented 1 year ago

af553b4 and d9e3cb0 fixed the last glaring issues in all this 😊 I'm going to test a bit after lunch, then will close and move on to #284 πŸš€ We can do minor patches to fix any issues that come up, but now everything seems to be working fine :)

andrewtavis commented 1 year ago

d9e3cb0 solution wasn't what we wanted, but 91b4791 fixes it :) As expected it was something pretty standard. Now what's happening in the selection step of the autocomplete lexicon query is we're joining on the nouns table as before, but now we're doing it twice β€” once when the lexicon word is the same as the noun when it's upper case and once when it's the same as the noun when it's capitalized. From there we do the same CASE WHEN checks we were doing before on the corresponding columns from nouns to only select upper case and capitalized versions if they exist 😊

andrewtavis commented 1 year ago

With 6add031 we removed the JSON data 😊 I've been playing around with it all and it seems to be ok. Sent along some commits today as well that fixed some minor bugs with the emoji implementations, but nothing major πŸ™ƒ

I think we're good to go! Closing this and switching over to #284!

andrewtavis commented 1 year ago

Thanks all for your support and ideas on all this! Major milestone for Scribe finished 😊 Here's hoping it'll work once we get it onto devices 🀞

SaurabhJamadagni commented 1 year ago

Hey @andrewtavis, sorry coming to this a bit late. I wish there was a way to star an issue because I am pretty sure I am going to be coming back here a lot. Loving the process documentation. I hate I was no help but thank you so much for all the changes! This is going to be awesome!! πŸš€

andrewtavis commented 1 year ago

There’s a little bit left for this to be pushed later on today, @SaurabhJamadagni, but I’ll send it along soon. There’s so much new data (as mentioned on Matrix πŸ˜‰) that I actually needed to do further optimizations to what we had before 😊 Will send it along with #284 later :)

andrewtavis commented 1 year ago

A final note on this issue is that the Russian keyboard's translation feature β€” the original issue that lead to all this β€” is now functional in v2.2.0 on device 😊😊