Closed andrewtavis closed 1 year ago
Is not reproducible on an iPhone 7 in Simulator, but then the version is 15.2.
Other translation features work fine, so shouldn't be an issue with accessing the data as the size is relatively similar (although Russian has the largest translation file by 0.6MB).
I may take a look at this issue
This would be absolutely amazing! I really haven't had the bandwidth to dedicate time to this, but it's the main bug right now that dramatically affects the functionality. Let me know if you have ideas on this!
Yep, assign me to this please and I'll let you know about my progress with this issue
Assigned! Thank you!
@japanese-goblinn, reporting that apparently v2.0 made this even worse, as basically the Russian keyboard crashes from the start at this point 🤦‍♂️ This is not happening on the emulator, and sadly we don't have device-level testing happening at this point. The crashes that other keyboards were experiencing have actually gotten better.
I'm wondering if this is maybe an issue with the amount of data that Russian's ingesting? All the other keyboards have similar amounts of data, but Russian has those almost 200k nouns. Could it be crashing because it can't handle it?
@andrewtavis I've looked into this crash and will soon describe here what I've found
Thanks so much, @japanese-goblinn! Really looking forward to what you've figured out :)
@japanese-goblinn, note that I just checked and the new "lack of" behavior is reproducible sometimes on the Simulator, I just didn't check Russian after all these changes. Am looking into this as well now :)
Well, I managed to get this error while debugging, but not really sure how to reproduce it (screenshots below). I want to take a deeper look into this bug but unfortunately I don't have time for now.
This is what I found so far:
You said that the file is about 0.6 MB, when it's actually 15.1 MB. Here's the code for how I measured it:
```swift
// func loadJSONToDict
let data = try Data(contentsOf: url)
let f = ByteCountFormatter()
f.allowedUnits = [.useMB]
f.countStyle = .memory
print("[DEBUG]: file \(fileName) size \(f.string(fromByteCount: Int64(data.count)))")
```
It's not much, but it's a measurement of the raw data, so it's less than what the Dictionary will occupy (not sure how bad it is; you'll need to figure that out yourself).
I also tried to use the profiler, but that was not really helpful; it only showed that a really slow disk read is happening. You can see it yourself though: the keyboard takes 2-3 seconds to start up (at least in dev mode).
In conclusion, I'm not sure if the slow startup is what sometimes causes the crash, because you can't be absolutely sure whether the system will kill your application due to a slow start. What I'm absolutely sure about is that you need to rework your data structures and the whole approach to working with data (read only batches, not the whole file; use a DB; etc.), and I can assure you that the application will become even better and hopefully you'll manage to fix this bug!
I wish you luck with this hard task, and thank you for the opportunity to contribute!
@japanese-goblinn, wanted to thank you for your contributions! #231 is an initial step towards working on this in my opinion, as we really should be leveraging a library for JSON loading. Hopefully this will make the read only batches you suggested or another solution for the data process easier going forward
All the best, and your contributions are welcome whenever you have time! :)
We'll likely need to be using Core Data for this in the future, but a WIP step to help the situation would be to remove the indentation from all JSON files to make them smaller in size.
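As a rough sketch of what that minification amounts to, assuming the formatting is done with Python's json module (the data here is made up):

```python
import json

# Made-up stand-in for part of a language's noun data.
nouns = {"Buch": {"plural": "Bücher", "form": "NE"}}

# Indented export vs. one with all optional whitespace removed.
pretty = json.dumps(nouns, indent=2, ensure_ascii=False)
compact = json.dumps(nouns, separators=(",", ":"), ensure_ascii=False)

print(len(pretty), len(compact))  # the compact dump is noticeably smaller
```

The content is identical either way once parsed, so nothing app-side needs to change.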
@SaurabhJamadagni, FYI for this issue, I have the JSON files formatted to remove spaces, which really does help as far as app size is concerned. Will push those with the QWERTY French keyboard for #229 :)
It sadly doesn't seem to help too much with load times, as it's likely the crazy number of lookups being done that's problematic. I'm going to a Berlin Swift coding group on Thursday and will discuss with people how best to implement Core Data or another solution that will help us not directly load all these dictionaries into memory all at once
It sadly doesn't seem to help too much with load times, as it's likely the crazy number of lookups being done that's problematic. I'm going to a Berlin Swift coding group on Thursday and will discuss with people how best to implement Core Data or another solution that will help us not directly load all these dictionaries into memory all at once
That sounds good @andrewtavis! This should be a huge performance boost in that case. I'll try to see if I can find any answers as well!
I've already been watching some videos on all this after you suggested it, @SaurabhJamadagni :) Doesn't seem like it'd be too much of a jump as it's so well integrated into normal Swift workflows. Let's be in touch!
A recap of what came from the recent iOS meetup I attended in relation to this issue, @SaurabhJamadagni:
The suggestion from everyone I talked to was actually not to go the route of Core Data, but rather to add an SQLite instance to the app. Generally the sentiment was that Core Data is an easy Apple backed solution, but will cause problems later on. Looking at it, it seems that we could make use of the following options:
Here are some tutorials as well:
The suggestion from everyone I talked to was actually not to go the route of Core Data, but rather to add an SQLite instance to the app.
That works too @andrewtavis. I was also interested in the alternatives as I used CoreData once and although powerful, it's showing a little age in my opinion. I could be wrong as there's so much more I didn't use the one time I gave it a shot. I'll go through the resources you have shared asap. Thanks for sharing these! The meetup seems to have had a very productive output, I assume
Also I am back, so should get to the French PR soon :)
Looking forward to the PR and learning about all this, @SaurabhJamadagni! The meetups are quite fun, yes :) Really nice people go to the Berlin ones
Further feedback from a coder from the iOS meetup here in Berlin is that the best option would be to go with SQLite.swift. I still haven't had the time to go through the tutorials for that, but will start looking into them when we're done with #208 and #248 and v2.1.0 is released
@SaurabhJamadagni, I added a roadmap to the readme contributors section - just FYI. Would be great to get your feedback on that! I've been having an extended discussion about it in https://github.com/scribe-org/Scribe-Data/issues/14 (summary of that: using the Python emoji package doesn't make sense, we're going directly from Unicode sources for the words we want emoji suggestions for, and the goal is to get JSONs where we have word-Emoji pairs using the Unicode annotation JSON files).
Again, your feedback on the roadmap would be very very welcome!
I added a roadmap to the readme contributors section
The roadmap looks great @andrewtavis. I was wondering if adding the English keyboard should be done sooner, but it is best to keep it for a major version update. Although, is there a different place where we can move the bullet points for Scribe-Android and Scribe-Data instead of inside the Scribe-iOS repo? Just nitpicking, else the points look golden!
I've been having an extended discussion
I tried to read through it. Can't say I understood everything, but I am assuming there are interoperability issues in going from Unicode to JSON for emojis? Maybe we could discuss this during our call.
Although, is there a different place where we can move the bullet points for Scribe-Android and Scribe-Data instead of inside the Scribe-iOS repo?
The general idea now is to start using the new version of GitHub projects and just link to them that way :) Makes sense to me that it's the best way of organizing everything going forward, and that way we'd just link to the projects in order of their importance.
As far as a call, what day of the week would you be available? Wednesday or Thursday evening? I'm thinking I'm gonna get some designs for the app and #16 in tonight, so we can go through those a bit as well
As far as a call, what day of the week would you be available? Wednesday or Thursday evening?
Thursday works @andrewtavis. What time should we have the call? Would 17:00 UTC do?
17 UTC works, @SaurabhJamadagni! Sending along an invite :)
@austinate, @SaurabhJamadagni and I have been discussing how to implement SQLite.swift, and had a few questions. Where we're at is:
Changed the name of this issue so that it's more reflective of the current scope
As stated in #286, a lot of what was problematic in here was coming from the way the autosuggestions lexicon was being made :) All keyboards are loading much faster at this point except Russian (the original problematic one because of its 194k noun dataset), which is still loading slowly. On an emulator it is loading though, which before #286 was not the case.
This issue and #284 are now the last issues before we can finally do the v2.2.0 release with so many new additions!
This issue and https://github.com/scribe-org/Scribe-iOS/issues/284 are now the last issues before we can finally do the v2.2.0
AWESOME!
What's the plan on this one @andrewtavis? How are we going to proceed?
I'm gonna start looking at the documentation for ~~SQLite.swift~~ GRDB.swift and then try to wrap my head around it all, @SaurabhJamadagni. Really would be good to get this done as there's so much that could go out at this point
I got into a bit of a groove over the last day and a half!
@SaurabhJamadagni, if you also want to look into ~~SQLite.swift~~ GRDB.swift that would be really great! Then each of us can put any ideas that we get into this issue and we can then make a plan to switch it all over. There are tons of places where data is being referenced, so I think the big thing here is to maybe make the tables and load the data into them first, and then switch over the app table by table while checking that everything works (so first nouns, then verbs, etc).
Let me know how this sounds :) :)
@SaurabhJamadagni, I edited the above links and changed them to GRDB.swift as I think I'd like to go with that instead of SQLite.swift. Reasons for this are that the original maintainer is still working on GRDB.swift, these two Reddit pages (link, link), it's been benchmarked against other solutions and won (maybe biased, but at least they did benchmark it), the issues are really well maintained (like I thought oh it's not used so much cuz there are no issues and then I was like 😮), and I generally really like the care that they put into all the docs that are right in the readme (I've already edited our readme a bit to borrow some of the stuff they do).
Hope this sounds good to you! :)
Notes on this (edited):
- For GRDB there aren't many videos as Core Data seems to get the most love, but this one (ignore the pod installation) and the corresponding article were helpful
- Convert the `.json` file exports within Scribe-Data into an `.sqlite` file that we can then reference via GRDB instead of a `.json` with SwiftyJSON
@wkyoshida, can I ask you to weigh in on this from the Scribe-Data side/your experience? Also given our discussion in https://github.com/scribe-org/Scribe-Data/issues/26, it'd be great if you'd share your opinions :)
A quick rundown for you:
Questions I'm thinking about now:
- Based on https://github.com/scribe-org/Scribe-Data/issues/26 we should be making one `.sqlite` file per language so that we can eventually download just the ones we need, correct?
- The conversion would be based on the `.json` files that we have here in Scribe-Data (`update_scribe_apps=false`). After that we can write a Python file like `data_to_sqlite.py` that reads through the finalized `.json` files in a directory and puts their data into tables in an `.sqlite` file that's then moved over to Scribe-iOS as the `.json` files have been till now?

Your feedback and any general ideas would be very appreciated!
Update on this :) `sqlite3` and Python's `json` seem to be doing the trick! I'm slowly but steadily creating a file in Scribe-Data that parses through JSONs and creates a unified language `.sqlite` database. Note that SQLite Viewer is really useful for checking the contents of the databases from within VS Code :)
Last thing is figuring out how to auto-generate the column names for verb tables based on what's in the JSON data, which should be pretty standard. Maybe will need minor changes in iOS if the naming conventions aren't consistent
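A minimal sketch of that auto-generated-columns idea, assuming a flat `{word: {column: value}}` JSON layout (the actual Scribe-Data schema and table names may differ):

```python
import json
import sqlite3

# Hypothetical stand-in for a language's formatted verb data.
verbs = json.loads('{"gehen": {"praesens_ich": "gehe", "praesens_du": "gehst"}}')

# Auto-generate the column names from whatever keys appear in the data.
columns = sorted({key for forms in verbs.values() for key in forms})

conn = sqlite3.connect(":memory:")  # would be e.g. a language's .sqlite file
conn.execute(
    "CREATE TABLE verbs (verb TEXT PRIMARY KEY, "
    + ", ".join(f"{c} TEXT" for c in columns)
    + ")"
)
for verb, forms in verbs.items():
    row = [verb] + [forms.get(c, "") for c in columns]  # "" for missing forms
    conn.execute(
        f"INSERT INTO verbs VALUES ({', '.join('?' for _ in row)})", row
    )
conn.commit()
```

Deriving the columns from the union of keys means new verb forms in the JSON would flow into the table without code changes.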
Things to do:
- Remove the `update_scribe_apps` conditions and change the process to put all data into a single `formatted_data` directory for each language within scribe_data/extract_transform
- Update `update_data.py` and other Scribe-Data files
- Keep the `Data` directories for now
- Switch the `Data` directories over to using the new databases via GRDB.swift
- Remove the `Data` directories from keyboard extension directories within Scribe-iOS

Edit: rest of issues moved to a lower comment :)
077e346 adds the SQLite databases! This SQLite stuff is great :) Wish I'd done this from the start! I'll work on Scribe-Data a bit more today and commit my changes later after I'm done switching over the process to not update the `Data` directories in Scribe-iOS, but rather local `formatted_data` directories in Scribe-Data.
We can now start experimenting with GRDB.swift and making references to the SQLite databases, which hopefully will go well!
Note that for the databases, if there is an entry that doesn't exist the `NAN` value is an empty string (`""`). This means that some Scribe-iOS features like autosuggestions and emojis will need to be changed, as I think as of now they check to see if a value exists or not rather than if it's `""`.
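In sqlite3 terms (the Swift side would do the equivalent via GRDB), the check would then distinguish a missing row from a present-but-empty value; a small sketch with made-up data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nouns (noun TEXT PRIMARY KEY, plural TEXT)")
conn.execute("INSERT INTO nouns VALUES ('Buch', 'Bücher')")
conn.execute("INSERT INTO nouns VALUES ('Regen', '')")  # no plural: "" not NULL

def plural_of(word):
    row = conn.execute(
        "SELECT plural FROM nouns WHERE noun = ?", (word,)
    ).fetchone()
    if row is None:  # the word isn't in the table at all
        return None
    return row[0] or None  # treat the "" placeholder as no value
```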
@SaurabhJamadagni, if you also want to look into ~~SQLite.swift~~ GRDB.swift that would be really great
Hey sorry @andrewtavis. Was a bit occupied with my college semester project. I'll absolutely read up too. I am sorry for the delay. You got so much done though, that's insane! Have you pushed it to a remote branch that I can take a look at too?
Hey @SaurabhJamadagni! No stress or delay at all :) As I said I just got into a bit of a flow on all this and went at it. The databases are already in the main branch in each of the Keyboards/LanguageKeyboards directories, and the Scribe-Data stuff will be in there later :) Feel free to read up a bit and start testing out accessing the databases
Taking a look at the sqlite files for the languages @andrewtavis. The data isn't referenced anywhere yet, right? Looks so much more organized compared to the JSONs haha. So, you wrote the function that converts JSONs to sqlite? Is it in Scribe-Data?
So if I am understanding the transition correctly, instead of loading the arrays from JSON files we will query the database instead. But the arrays will still be formed? Or will we be performing smaller but specific queries wherever necessary and skip on creating the larger arrays completely?
For example, will data be loaded in the command variables like nouns or are we directly querying them inside the code where required?
@SaurabhJamadagni, the data isn't referenced anywhere yet :) We can each try to see how using it goes before deciding on how exactly we want to use it. And yes the functions to convert the JSONs over are in my local Scribe-Data. Will push them later.
Or will we be performing smaller but specific queries wherever necessary and skip on creating the larger arrays completely?
It's a good question! I think if we can do specific queries that would be best, but let's decide on this later
Will push them later.
No rush @andrewtavis, was just curious. Sounds good :)
and remove SwiftyJSON from dependencies
Noting that this should not be done, as odds are we'll be using JSON still for the app texts. We'll be adopting a localization platform in #268, which likely will be translating the texts housed in JSONs :) I'm not sure if there's any easy way to do this with another file type, so for now it's fine to keep the JSONs for this purpose
A reaction to this comment makes me feel even better about our choice of SQLite dependency!
https://github.com/scribe-org/Scribe-Data/commit/d830ae527640a3b23983db187c8509efc65e31a4 committed the local work I had been doing to create the SQLite databases for the language keyboards. There is a lot of code in there as there was so much refactoring... I updated the Scribe-Data changelog with the things that I did, but specifically the files of interest are:
- `data_to_sqlite.py`, which reads the `formatted_data` directories within each language in scribe_data/extract_transform and converts them into a single SQLite database
There are still a few more things to be done and I'll move some stuff around now that some files are more used for extract-transform purposes rather than for loading. The final tests/fixes of it all will be when I redo the data process for #284
e1069fb updates the data based on the recently closed https://github.com/scribe-org/Scribe-Data/issues/30 :) I was realizing that when I typed a plural noun there sometimes wasn't an emoji suggestion when for the singular version there was. Ex: I type "Krokodil" and I see the crocodile emoji, but typing "Krokodile" showed nothing, which I found weird. https://github.com/scribe-org/Scribe-Data/issues/30 looked for cases where a plural noun was not a key in the emoji keyword data, and when it wasn't, the emojis from its singular form were used.
As the comment with the to do list shows, I'm mostly done with the Scribe-Data work except for a final test during #284
We're ready to start switching over Scribe-iOS to the new databases! My thoughts on this would be trying to do direct queries where we can. We'd have strings like `prefix` for autosuggestions/completions or `wordToTranslate`, and we can then try to do a direct query of the database for the information we want. If the data doesn't exist then we handle it by passing in case of auto actions or displaying an error in case of commands :)
I'd say it makes sense to start with prepositions as they're only for German and Russian. As soon as that's working we can start with some of the others, with translations being the next easiest as the data just has two columns. Translations from languages that aren't English are gonna be so much easier now that we can just make a database for it all
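A sketch of what such a direct query could look like, shown with Python's sqlite3 for brevity (the table and column names here are assumptions, not the actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE autocomplete_lexicon (word TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO autocomplete_lexicon VALUES (?)",
    [("Haus",), ("Hausaufgabe",), ("Hund",)],
)

prefix = "Hau"  # what the user has typed so far
# Query the database directly instead of filtering an in-memory array.
completions = [
    row[0]
    for row in conn.execute(
        "SELECT word FROM autocomplete_lexicon WHERE word LIKE ? "
        "ORDER BY word LIMIT 3",
        (prefix + "%",),
    )
]
```

An empty result would then map to passing for auto actions or displaying an error for commands.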
Hey all! I'm catching up here on all the new work.. @andrewtavis was definitely on a roll :laughing:
Some thoughts:
On translations and potentially being able to do them from and to any language eventually, I'm wondering what's a good way to organize them as a `.sqlite`. They are currently stored as `word`-`translation` (i.e. english-to-target language pairs) within each language's `.sqlite`. Instead, could putting all the languages together make sense? i.e. A separate `.sqlite` with a table with rows `english`, `german`, `french`, and so on? At some point perhaps, once the apps are able to selectively download specific languages, this table could dynamically get populated with only the user-selected languages.
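That idea sketched as a schema (the language columns are just examples, and whether this would be the real layout is still open):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a shared translations .sqlite
conn.execute(
    "CREATE TABLE translations "
    "(english TEXT PRIMARY KEY, german TEXT, french TEXT)"
)
conn.execute("INSERT INTO translations VALUES ('house', 'Haus', 'maison')")

# Any-to-any translation becomes a select on the source language's column.
row = conn.execute(
    "SELECT french FROM translations WHERE german = ?", ("Haus",)
).fetchone()
```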
On emojis - I need to browse around to remember where we landed on how to handle emoji variations (e.g. different skin tones). However, an idea could be to save only the base version of the emoji as a suggestion, so that the desired variation is then used dynamically. Now, how to determine which emojis are base emojis? One idea could be to have a separate `.sqlite` with a table with emoji metadata. Some rows could be `id`, `emoji`, `is_base`. Then, in the `emoji_keywords` tables of each language, instead of having the actual emojis, the tables could have the `id`. With the `id`, the app can then retrieve the corresponding emoji and whether it should treat it as a base emoji or not.
emoji_keywords table in DELanguageData.sqlite:

| word | emoji_1 | emoji_2 | emoji_3 |
|---|---|---|---|
| gesicht | 34 | 66 | 71 |

emojis table in emojis.sqlite:

| id | emoji | is_base |
|---|---|---|
| 34 | 😂 | false |
| 66 | 🤣 | false |
| 71 | 😭 | false |
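With that layout, resolving a keyword to its emoji and base flag is a single join; a sketch with one of the example rows (SQLite stores the booleans as integers here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emoji_keywords (word TEXT, emoji_1 INTEGER)")
conn.execute(
    "CREATE TABLE emojis (id INTEGER PRIMARY KEY, emoji TEXT, is_base INTEGER)"
)
conn.execute("INSERT INTO emoji_keywords VALUES ('gesicht', 34)")
conn.execute("INSERT INTO emojis VALUES (34, '😂', 0)")  # 0 = false

row = conn.execute(
    """
    SELECT e.emoji, e.is_base
    FROM emoji_keywords AS k
    JOIN emojis AS e ON e.id = k.emoji_1
    WHERE k.word = ?
    """,
    ("gesicht",),
).fetchone()
```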
On the future feature to download the data in-app, I am understanding that send_dbs_to_scribe.py is more for convenience today and that eventually it gets replaced by the future download feature. Something to consider is what file format to use for the download feature. Would the apps just be downloading the `.sqlite` files? Or as `.json`s or something else? Based on the question below from @andrewtavis, I would guess `.sqlite`? Wanted to still ask though, in case there could be a reason to use something else. To answer the question as well, I think a `.sqlite` file per language works too (with perhaps the exception on how to handle translations as broached above).
Based on https://github.com/scribe-org/Scribe-Data/issues/26 we should be making one .sqlite file per language so that we can eventually download just the ones we need, correct?
On Scribe-Data as a package - so data_to_sqlite.py is doing the JSON-to-SQLite conversion today. One thing that I'm wondering - do we want to make the `.sqlite` generation a feature/option that is built in to Scribe-Data? i.e. In thinking of Scribe-Data as a general-use package, would it make sense to have that as an option (i.e. the user can choose to either generate as `.json` or `.sqlite`)? If maybe not, then where should the JSON-to-SQLite conversion happen? On the app side? Or, referencing the discussions on https://github.com/scribe-org/Scribe-Data/issues/26, on the Scribe-Server side?
Final thoughts - ideally, I think a different approach on how to structure this data process would've been better, as touched on in the last two points. However, due to limitations such as not having the data download feature or Scribe-Server in place, I agree with the approach with send_dbs_to_scribe.py and data_to_sqlite.py. It perhaps makes the most sense with what we can today and with how pressing moving to SQLite is. Really awesome you got it done so quick! :raised_hands:
Hey @wkyoshida! Thanks for the feedback and the thoughts!
They are currently stored as word-translation (i.e. english-to-target language pairs) within each language's .sqlite. Instead, could putting all the languages together make sense? i.e. A separate .sqlite with a table with rows, english, german, french, and so on?
I'd say this would be once Wikidata gets to a point where we can get translations from there, right? That's when we can actually just get a single table. As of now we'll just be downloading as many words as we can and machine translating them, so I think source language based `.sqlite` files make sense where all the words we can get are primary keys, and all the translations from Hugging Face are the various columns via https://github.com/scribe-org/Scribe-Data/issues/23 as you mentioned.
One idea could be to have a separate .sqlite with a table with emoji metadata. Some rows could be id, emoji, is_base. Then, in the emoji_keywords tables of each language, instead of having the actual emojis, the tables could have the id.
This makes sense and seems to be where we'd go for #271 :)
Would the apps just be downloading the .sqlite files? Or as .jsons or something else? Based on the question below from @andrewtavis, I would guess .sqlite?
I'd also say `.sqlite` for now, but we can discuss this a bit later as you and I had also been discussing `last_updated` as a part of this so that the user's data update process could be much faster. Selective updating would also likely be easiest done via `.sqlite`, or would you suggest something else?
i.e. In thinking of Scribe-Data as a general-use package, would it make sense to have that as an option (i.e. the user can choose to either generate as .json or .sqlite)? If maybe not, then where should the JSON-to-SQLite conversion happen? On the app-side?
I'd say definitely as we extrapolate out we can keep the option for JSONs and `.sqlite` file exports 😊 Why not, if it's not too hard to maintain? By the looks of it we're doing `.sqlite` for the app, and Wikidata exports are JSONs or other formats that we wouldn't necessarily use in app, so data prep can happen while we're still in JSONs and eventually that or other file types can be exported. Scribe-Server side for sure though :) Apps should either be just getting a whole new updated `.sqlite` file from this point on, or update certain rows of it based on `last_updated`, but let me know if this sentence makes sense 🙏
The data_to_sqlite.py works as an interim solution, but eventually, we can move that logic to Scribe-Server.
🙌🙌🙌🙌
Really awesome you got it done so quick! 🙌
Thank you!! Now to get this here app updated and shipped!
Hey! :wave:
I'd say this would be once Wikidata gets to a point where we can get translations from there, right?
Yeah, I'm thinking this could wait until we're able to do cross-translations - in whatever way this is accomplished in https://github.com/scribe-org/Scribe-Data/issues/23. I was more so pointing out perhaps that the current `translations` tables with the `word`-`translation` rows do limit us to only the English-to-target language translation, which is fine for now of course, since that is the only translation data that we have.
Selective updating would also likely be easiest done via `.sqlite`, or would you suggest something else?
This is what I want to make sure to think through. I think there will be three steps that are more time-heavy or resource-heavy that happen when a data download is done:
For these steps, the file format should facilitate an efficient download workflow. FWIW I am currently thinking `.sqlite` though, because, for the aforementioned three steps:
- Checking `last_updated` I believe will be faster if Scribe-Server is already storing the data in a DB
- Sending the files: with `.json`, this would mean having to send multiple `.json` per language. On the other hand, there is one `.sqlite` per language.
- Writing to the on-device `.sqlite`: with `.json`, the client would have to retain the ability to read the data as `.json` to then write it to `.sqlite`
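A sketch of how a `last_updated`-based selective update could work on the server side (the schema, column, and dates are hypothetical):

```python
import sqlite3

server = sqlite3.connect(":memory:")  # stand-in for Scribe-Server's store
server.execute(
    "CREATE TABLE nouns (noun TEXT PRIMARY KEY, plural TEXT, last_updated TEXT)"
)
server.executemany(
    "INSERT INTO nouns VALUES (?, ?, ?)",
    [("Buch", "Bücher", "2023-01-01"), ("Haus", "Häuser", "2023-03-01")],
)

client_last_sync = "2023-02-01"  # sent by the app with its request
# Only rows changed since the client's last sync need to be sent down.
delta = server.execute(
    "SELECT noun, plural FROM nouns WHERE last_updated > ?",
    (client_last_sync,),
).fetchall()
```

ISO-formatted date strings compare correctly as text, which keeps the check a plain SQL comparison.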
I'd say definitely as we extrapolate out we can keep the option for JSONs and `.sqlite` file exports 😊 Why not, if it's not too hard to maintain
I guess the reservation that I'm having is more so coming from putting myself in the perspective of a user of Scribe-Data (as a general-use package). Like, would I care enough about the `.sqlite` option? Would I use it? If I don't, would I be bothered with the extra bloat of SQLite functionality that comes packaged in?
Maybe it's fine actually. I'm not definitely opposed. I'm perhaps more trying to think from the mindset that, as a user of packages, I would like my packages to do what I need them to, be small and compact, and have a clear use/purpose - if that makes sense.
Apps should either be just getting a whole new updated `.sqlite` file from this point on, or update certain rows of it based on `last_updated`, but let me know if this sentence makes sense :pray:
I'm with you! :+1:
I was more so pointing out perhaps that the current translations tables with the word-translation rows do limit us to only the English-to-target language translation, which is fine for now of course, since that is the only translation data that we have.
I'm definitely of the opinion that as soon as more options are available for translation, instead of `translation` as a column we'd have a language name which is what the word would be translated into. I think we're on the same page. What'll need to happen is that the relationship between translations will need to be inverted based on what we have right now. Right now we have within the German files a file that links English to German, but what will be there instead will be a file where `word` is German words and each of the columns is what we've gotten as translations. Right now we are saving translation files with their target, but in the future it'd be those words that are typed on the keyboard that's switched to during translation :)
Glad to hear that it sounds like `.sqlite` can be a solid foundation for the future! Let's keep discussing and readjusting, but for now we're sailing ahead ⛵➡️
Maybe it's fine actually. I'm not definitely opposed. I'm perhaps more trying to think from the mindset that, as a user of packages, I would like my packages to do what I need them to, be small and compact, and have a clear use/purpose - if that makes sense.
I think it also depends on who the people are that first want to use it for non-Scribe purposes, but definitely happy to remove the bloat and even have a lot of the multi-use stuff be in Server for a wider community where Scribe-Data is more just what we need :)
I'm with you! 👍
🙌🙌🙌🙌
Terms
Behavior
There's currently a bug with v1.1.0 where the Russian keyboard will not translate words. The keyboard switches to English as it should, and words can be typed, but then on pressing return the keyboard crashes without anything being inserted.
Device type
iPhone 7
iOS version
iOS 15.1