Convert JSON data to SQLite

andrewtavis commented 2 years ago

Terms

[X] I have searched all open bug reports
[X] I agree to follow Scribe-iOS' Code of Conduct

Behavior

There’s currently a bug with v1.1.0 where the Russian keyboard will not translate words. The keyboard switches to English as it should, and words can be typed, but then on pressing return the keyboard crashes without anything being inserted.

Device type

iPhone 7

iOS version

iOS 15.1

andrewtavis commented 2 years ago

Is not reproducible on an iPhone 7 in Simulator, but then the version is 15.2.

Other translation features work fine, so shouldn't be an issue with accessing the data as the size is relatively similar (although Russian has the largest translation file by 0.6MB).

japanese-goblinn commented 2 years ago

I can take a look at this issue

andrewtavis commented 2 years ago

This would be absolutely amazing! 🚀🚀 I really haven't had the bandwidth to dedicate time to this, but it's the main bug right now that dramatically affects the functionality. Let me know if you have ideas on this!

japanese-goblinn commented 2 years ago

Yep, assign me to this please and I'll let you know about my progress on the issue

andrewtavis commented 2 years ago

Assigned! Thank you 😊

andrewtavis commented 2 years ago

@japanese-goblinn, reporting that apparently v2.0 made this even worse, as the Russian keyboard now basically crashes from the start 🤦‍♂️ This is not happening on the emulator, and sadly we don’t have device-level testing at this point. The crashes that other keyboards were experiencing have actually gotten better.

I’m wondering if this is maybe an issue with the amount of data that Russian’s ingesting? All the other keyboards have similar amounts of data, but Russian has almost 200k nouns. Could it be crashing because it can’t handle that much?

japanese-goblinn commented 2 years ago

@andrewtavis I've looked into this crash and will soon describe here what I've found

andrewtavis commented 2 years ago

Thanks so much, @japanese-goblinn! Really looking forward to what you’ve figured out :)

andrewtavis commented 2 years ago

@japanese-goblinn, note that I just checked and the new "lack of" behavior is reproducible sometimes on the Simulator, I just didn't check Russian after all these changes. Am looking into this as well now :)

japanese-goblinn commented 2 years ago

Well, I managed to get this error while debugging, but not really sure how to reproduce it (screenshots below). I want to take a deeper look into this bug but unfortunately I don't have time for now.

This is what I found so far:

You said the file is about 0.6 MB, when it's actually 15.1 MB. Here's the code I used to measure it:

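// Inside func loadJSONToDict (needs `import Foundation`); `url` and `fileName` come from that function.
let data = try Data(contentsOf: url)
// Format the raw byte count of the file in megabytes.
let formatter = ByteCountFormatter()
formatter.allowedUnits = [.useMB]
formatter.countStyle = .memory
print("[DEBUG]: file \(fileName) size \(formatter.string(fromByteCount: Int64(data.count)))")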

It's not much, but it's a measurement of the raw data, so it's less than the Dictionary will occupy in memory (not sure how bad it is, you'll need to figure that out yourself).

I also tried to use the profiler, but that was not really helpful. It only showed that a really slow disk read is happening, but you can see that yourself: the keyboard takes 2-3 seconds to start up (at least in dev mode).

In conclusion, I'm not sure whether the slow startup is what sometimes causes the crash, because you can't be absolutely sure whether the system will kill your application due to a slow start. What I am absolutely sure about is that you need to rework your data structures and the whole approach to working with data (read only batches rather than the whole file, use a DB, etc.). I can assure you that the application will become even better, and hopefully you'll manage to fix this bug!

I wish you luck with this hard task, and thank you for the opportunity to contribute!

[Screenshots: 2022-10-13 20:14:23 and 2022-10-13 20:14:26]

andrewtavis commented 2 years ago

@japanese-goblinn, wanted to thank you for your contributions! #231 is an initial step towards working on this in my opinion, as we really should be leveraging a library for JSON loading. Hopefully this will make the read only batches you suggested or another solution for the data process easier going forward 😊

All the best, and your contributions are welcome whenever you have time! :)

andrewtavis commented 2 years ago

We'll likely need to be using Core Data for this in the future, but a WIP step to help the situation would be to remove the indentation from all JSON files to make them smaller in size.
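A minimal sketch of the indentation removal, assuming the data files are plain UTF-8 JSON. The function name `minify_json_file` and the demo content are made up for illustration and are not taken from the Scribe repos.

```python
import json
import tempfile
from pathlib import Path


def minify_json_file(path):
    """Rewrite a JSON file without indentation and return the bytes saved."""
    original_size = path.stat().st_size
    data = json.loads(path.read_text(encoding="utf-8"))
    # separators=(",", ":") drops the spaces json.dumps adds by default;
    # ensure_ascii=False keeps non-ASCII words (e.g. Cyrillic) unescaped.
    compact = json.dumps(data, ensure_ascii=False, separators=(",", ":"))
    path.write_text(compact, encoding="utf-8")
    return original_size - path.stat().st_size


# Demo on a throwaway file with made-up content.
demo_path = Path(tempfile.mkdtemp()) / "nouns.json"
demo_data = {"Haus": {"plural": "Häuser", "form": "N"}}
demo_path.write_text(
    json.dumps(demo_data, ensure_ascii=False, indent=4), encoding="utf-8"
)
bytes_saved = minify_json_file(demo_path)
print(f"Saved {bytes_saved} bytes")
```

This only shrinks the file on disk. The parsed dictionary is the same size in memory, so it would not be expected to change load behavior much.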

andrewtavis commented 2 years ago

@SaurabhJamadagni, FYI for this issue, I have the JSON files formatted to remove spaces, which really does help as far as app size is concerned. Will push those with the QWERTY French keyboard for #229 :)

It sadly doesn’t seem to help too much with load times, as it’s likely the crazy number of lookups being done that’s problematic. I’m going to a Berlin Swift coding group on Thursday and will discuss with people how best to implement Core Data or another solution that will help us not directly load all these dictionaries into memory all at once 😊

SaurabhJamadagni commented 2 years ago

It sadly doesn’t seem to help too much with load times, as it’s likely the crazy number of lookups being done that’s problematic. I’m going to a Berlin Swift coding group on Thursday and will discuss with people how best to implement Core Data or another solution that will help us not directly load all these dictionaries into memory all at once

That sounds good @andrewtavis! This should be a huge performance boost in that case. I'll try to see if I can find any answers as well 😊

andrewtavis commented 2 years ago

I’ve already been watching some videos on all this after you suggested it, @SaurabhJamadagni :) Doesn’t seem like it’d be too much of a jump as it’s so well integrated into normal Swift workflows. Let’s be in touch!

andrewtavis commented 1 year ago

A recap of what came from the recent iOS meetup I attended in relation to this issue, @SaurabhJamadagni:

The suggestion from everyone I talked to was actually not to go the route of Core Data, but rather to add an SQLite instance to the app. Generally the sentiment was that Core Data is an easy Apple backed solution, but will cause problems later on. Looking at it, it seems that we could make use of the following options:

Here are some tutorials as well:

SaurabhJamadagni commented 1 year ago

The suggestion from everyone I talked to was actually not to go the route of Core Data, but rather to add an SQLite instance to the app.

That works too @andrewtavis. I was also interested in the alternatives as I used CoreData once and although powerful, it's showing a little age in my opinion. I could be wrong as there's so much more I didn't use the one time I gave it a shot. I'll go through the resources you have shared asap. Thanks for sharing these! The meetup seems to have had a very productive output I assume 😊

Also I am back, so should get to the French PR soon :)

andrewtavis commented 1 year ago

Looking forward to the PR and learning about all this, @SaurabhJamadagni! The meetups are quite fun, yes :) Really nice people go to the Berlin ones 😊

andrewtavis commented 1 year ago

Further feedback from a coder from the iOS meetup here in Berlin is that the best option would be to go with SQLite.swift. I still haven't had the time to go through the tutorials for that, but will start looking into them when we're done with #208 and #248 and v2.1.0 is released 😊

andrewtavis commented 1 year ago

@SaurabhJamadagni, I added a roadmap to the readme contributors section - just FYI. Would be great to get your feedback on that! I've been having an extended discussion about it in https://github.com/scribe-org/Scribe-Data/issues/14 (summary of that: using the Python emoji package doesn't make sense, we're going directly from Unicode sources for the words we want emoji suggestions for, and the goal is to get JSONs where we have word-Emoji pairs using the Unicode annotation JSON files).

Again, your feedback on the roadmap would be very very welcome 😊

SaurabhJamadagni commented 1 year ago

I added a roadmap to the readme contributors section

The roadmap looks great @andrewtavis. I was wondering if adding the English keyboard should be done sooner, but it is best to keep it for a major version update. Although, is there a different place where we can move the bullet points for Scribe-Android and Scribe-Data instead of inside the Scribe-iOS repo? Just nitpicking, else the points look golden! 🚀

I've been having an extended discussion

I tried to read through it. Can't say I understood everything 😅 but I am assuming there are interoperability issues between going from Unicode to json for emojis? Maybe we could discuss this during our call.

andrewtavis commented 1 year ago

Although, is there a different place where we can move the bullet points for Scribe-Android and Scribe-Data instead of inside the Scribe-iOS repo?

The general idea now is to start using the new version of GitHub projects and just link to them that way :) Makes sense to me that it's the best way of organizing everything going forward, and that way we'd just link to the projects in order of their importance.

As far as a call, what day of the week would you be available? Wednesday or Thursday evening? I'm thinking I'm gonna get some designs for the app and #16 in tonight, so we can go through those a bit as well 😊

SaurabhJamadagni commented 1 year ago

As far as a call, what day of the week would you be available? Wednesday or Thursday evening?

Thursday works @andrewtavis. What time should we have the call? Would 17:00 UTC do?

andrewtavis commented 1 year ago

17 UTC works, @SaurabhJamadagni 😊 Sending along an invite :)

andrewtavis commented 1 year ago

@austinate, @SaurabhJamadagni and I have been discussing how to implement SQLite.swift, and had a few questions 🙂 Where we're at is:

We're going to write functions to create tables for each type of word (nouns, verbs, etc)
- The function would delete the current version of the table and repopulate it from the JSON so that the tables are automatically overwritten if the user switches to a different keyboard
- We'd also create a temporary connection such that the table is dropped when not needed as seen in the documentation
We'll save scheme patterns for the tables such that for any given table the columns will be generated based on language and word type
- What I mean on this is saving the tenses that we have for the language and then creating ~six columns per tense given the forms for each (first person singular, etc)
- This is necessary as some languages have different verb forms, so we don't want to create a generic scheme for this table in particular
- See verb files for French and German (too big to open on the web, but so you can see the file paths 🙃)
One thing we're wondering is whether we'll need to load the JSON into a Swift dictionary before it goes into the SQLite DB, or if we can directly load the data into the DB from the JSON
- We just want to check whether we'll always need to load it in with SwiftyJSON, but it makes sense that we will
Another question is whether we could change how we're saving the data such that the JSON keys would just have arrays that are the columns we want, and then load the columns directly into the DB
- As of now we'd be going through and adding entries row by row, as the current JSON keys are say verb infinitives, but we could just have them in an array, the first person singular conjugations in one array, etc and then add these column by column
- It sounds way less memory intensive to add each column all at once than to loop through 3000 verbs row by row
Any other suggestions from you on all of this?
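To make the plan above concrete, here is a rough sketch written with Python's standard `sqlite3` module rather than SQLite.swift, so it only illustrates the SQL side. The scheme in `VERB_TENSES`, the person labels in `PERSONS`, and column names like `presFPS` are assumptions for illustration. SQLite stores data by row, so inserts are row-wise either way; handing all rows to one `executemany` call is the usual way to keep that fast.

```python
import sqlite3

# Hypothetical per-language scheme: which tenses exist, six person forms each.
PERSONS = ["FPS", "SPS", "TPS", "FPP", "SPP", "TPP"]
VERB_TENSES = {"German": ["pres", "past"], "French": ["pres", "imp", "fut"]}


def build_verb_columns(language):
    """Return the infinitive column plus one column per tense and person."""
    columns = ["infinitive"]
    for tense in VERB_TENSES[language]:
        columns += [f"{tense}{person}" for person in PERSONS]
    return columns


def rebuild_verbs_table(conn, language, verbs):
    """Drop the verbs table and repopulate it from a dict keyed by infinitive."""
    columns = build_verb_columns(language)
    column_defs = ", ".join(
        f'"{c}" TEXT PRIMARY KEY' if c == "infinitive" else f'"{c}" TEXT'
        for c in columns
    )
    placeholders = ", ".join("?" for _ in columns)
    # Forms that are missing in the JSON become empty strings.
    rows = [
        [infinitive] + [forms.get(c, "") for c in columns[1:]]
        for infinitive, forms in verbs.items()
    ]
    with conn:  # commits on success, rolls back on an exception
        conn.execute("DROP TABLE IF EXISTS verbs")
        conn.execute(f"CREATE TABLE verbs ({column_defs})")
        conn.executemany(f"INSERT INTO verbs VALUES ({placeholders})", rows)
    return columns


conn = sqlite3.connect(":memory:")
demo_verbs = {"gehen": {"presFPS": "gehe", "presTPS": "geht"}}
verb_columns = rebuild_verbs_table(conn, "German", demo_verbs)
verb_row = conn.execute(
    "SELECT infinitive, presFPS, pastFPS FROM verbs"
).fetchone()
```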

andrewtavis commented 1 year ago

Changed the name of this issue so that it's more reflective of the current scope 🙃

andrewtavis commented 1 year ago

As stated in #286, a lot of what was problematic in here was coming from the way the autosuggestions lexicon was being made :) All keyboards are loading much faster at this point except Russian — the original problematic one because of its 194k noun dataset — which is still loading slowly. On an emulator it is loading though, which before #286 was not the case.

This issue and #284 are now the last issues before we can finally do the v2.2.0 release with so many new additions 😊🚀

SaurabhJamadagni commented 1 year ago

This issue and https://github.com/scribe-org/Scribe-iOS/issues/284 are now the last issues before we can finally do the v2.2.0

AWESOME! 🚀
What's the plan on this one @andrewtavis? How are we going to proceed?

andrewtavis commented 1 year ago

I'm gonna start looking at the documentation for ~~SQLite.swift~~ GRDB.swift and then try to wrap my head around it all, @SaurabhJamadagni. Really would be good to get this done as there's so much that could go out at this point 😊

andrewtavis commented 1 year ago

I got into a bit of a groove over the last day and a half 😇😄

andrewtavis commented 1 year ago

@SaurabhJamadagni, if you also want to look into ~~SQLite.swift~~ GRDB.swift that would be really great 🙏😊 Then each of us can put any ideas that we get into this issue and we can then make a plan to switch it all over. There are tons of places where data is being referenced, so I think the big thing here is to maybe make the tables and load the data into them first, and then switch over the app table by table while checking that everything works (so first nouns, then verbs, etc).

Let me know how this sounds :) :)

andrewtavis commented 1 year ago

@SaurabhJamadagni, I edited the above links and changed them to GRDB.swift as I think I'd like to go with that instead of SQLite.swift. Reasons for this are that the original maintainer is still working on GRDB.swift, these two Reddit pages (link, link), it's been benchmarked against other solutions and won (maybe biased, but at least they did benchmark it), the issues are really well maintained (like I thought oh it's not used so much cuz there are no issues and then I was like 😮), and I generally really like the care that they put into all the docs that are right in the readme (I've already edited our readme a bit to borrow some of the stuff they do 😊).

Hope this sounds good to you! :)

andrewtavis commented 1 year ago

Notes on this (edited):

The GRDB readme docs and the Swift Package Index docs seem to be what we'll need
For GRDB there aren't many videos as Core Data seems to get the most love, but this one (ignore the pod installation) and the corresponding article were helpful
The big thing is that we'll need to switch over the .json file exports within Scribe-Data
- Beauty of this is that it's just us adding in an option to the data processing to export a .sqlite file that we can then reference via GRDB instead of a .json with SwiftyJSON
- There are tons of tools for this: we could use sqlite3 (Python standard library), SQLAlchemy or even pandas
- We likely still need to deal with JSON at some point as that's the Wikidata output we're getting via sparqlwrapper and we're not rewriting all of that 😅
- This article is on JSON to SQLite with sqlite3

andrewtavis commented 1 year ago

@wkyoshida, can I ask you to weigh in on this from the Scribe-Data side/your experience? Also given our discussion in https://github.com/scribe-org/Scribe-Data/issues/26, it'd be great if you'd share your opinions :)

A quick rundown for you:

We were having some bad lag on keyboard load that's mostly been fixed in #286
The Russian keyboard is still slow because of its 194k nouns, with the goal being that this issue will fix that and improve the overall data infrastructure of Scribe-iOS going forward
The suggestion was to switch over our JSON references to SQLite, and I decided on GRDB.swift via the reasons here

Questions I'm thinking about now:

Based on the future plans discussion we should be making one .sqlite file per language so that we can eventually download just the ones we need, correct?
How do we structure the process? I feel like the easiest way to get to a workable solution would be to have the same Python processes run and create the .json files that we have here in Scribe-Data (update_scribe_apps=false). After that we can write a Python file like "data_to_sqlite.py" that reads through the finalized .json files in a directory and puts their data into tables in an .sqlite file that's then moved over to Scribe-iOS as the .json files have been till now?
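As a sketch of what a "data_to_sqlite.py"-style step could look like, assuming each `.json` file in a directory is a flat word-to-string mapping (as a translations file might be): the function name, the `word`/`value` column names, and the demo file names are hypothetical, and nested files such as verbs would need their own column handling.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def json_dir_to_sqlite(json_dir, db_path):
    """Load every <table>.json in json_dir into one table each in db_path."""
    conn = sqlite3.connect(db_path)
    tables = []
    for json_file in sorted(Path(json_dir).glob("*.json")):
        table = json_file.stem  # translations.json -> "translations"
        data = json.loads(json_file.read_text(encoding="utf-8"))
        with conn:  # commit each table's rows together
            conn.execute(f'DROP TABLE IF EXISTS "{table}"')
            conn.execute(
                f'CREATE TABLE "{table}" (word TEXT PRIMARY KEY, value TEXT)'
            )
            conn.executemany(
                f'INSERT INTO "{table}" VALUES (?, ?)', data.items()
            )
        tables.append(table)
    conn.close()
    return tables


# Demo with made-up data in a temporary directory.
work_dir = Path(tempfile.mkdtemp())
(work_dir / "translations.json").write_text(
    json.dumps({"house": "Haus", "book": "Buch"}), encoding="utf-8"
)
db_file = work_dir / "DELanguageData.sqlite"
created_tables = json_dir_to_sqlite(work_dir, db_file)

check = sqlite3.connect(db_file)
translation = check.execute(
    "SELECT value FROM translations WHERE word = ?", ("house",)
).fetchone()[0]
check.close()
```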

Your feedback and any general ideas would be very appreciated! 😊

andrewtavis commented 1 year ago

Update on this :) sqlite3 and Python's json seem to be doing the trick 😊 I'm slowly but steadily creating a file in Scribe-Data that parses through JSONs and creates a unified language .sqlite database. Note that SQLite Viewer is really useful for checking the contents of the databases from within VS Code :)

Last thing is figuring out how to auto-generate the column names for verb tables based on what's in the JSON data, which should be pretty standard. Maybe will need minor changes in iOS if the naming conventions aren't consistent 😇
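One way the column names could be auto-generated is to take the union of the form keys found in the JSON entries. This is only a sketch under that assumption, not necessarily how the Scribe-Data import ended up doing it; `infer_columns`, `create_table_sql`, and the demo verb forms are made up.

```python
import sqlite3


def infer_columns(entries):
    """Collect every form key that appears in any entry, in first-seen order."""
    columns = []
    for forms in entries.values():
        for key in forms:
            if key not in columns:
                columns.append(key)
    return columns


def create_table_sql(table, key_column, columns):
    """Build a CREATE TABLE statement with one TEXT column per inferred key."""
    defs = [f'"{key_column}" TEXT PRIMARY KEY'] + [f'"{c}" TEXT' for c in columns]
    return f'CREATE TABLE "{table}" ({", ".join(defs)})'


demo_entries = {
    "aller": {"presFPS": "vais", "futFPS": "irai"},
    "être": {"presFPS": "suis", "impFPS": "étais"},
}
inferred = infer_columns(demo_entries)

conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql("verbs", "infinitive", inferred))
# PRAGMA table_info returns one row per column; index 1 is the column name.
table_columns = [row[1] for row in conn.execute('PRAGMA table_info("verbs")')]
```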

Things to do:

[x] Add verb JSON parsing to the SQLite database generation process
[x] ~~Check that verb tenses are being used consistently in iOS and won't have conflicts with the databases (maybe not a problem 🤔)~~
- Wrote the db import in a way that this won't be a problem :)
[x] Migrate copies of all formatted data from iOS back to Scribe-Data
[x] Change update_scribe_apps conditions and change process to put all data into a single formatted_data directory for each language within scribe_data/extract_transform
[x] Cleanup of update_data.py and other Scribe-Data files
[x] Test SQLite database generation process
[x] Add export to apps — another Python file that just moves the databases as the simplest form of what will eventually be happening on a network :)
[x] Commit databases for each language to iOS while maintaining the original Data directories for now
[x] Convert over the usage of JSONs in Data directories to using the new databases via GRDB.swift
- One word type after another 🐢⚠️
[x] Remove Data directories from keyboard extension directories within Scribe-iOS ~~and remove SwiftyJSON from dependencies~~

Edit: rest of issues moved to a lower comment :)

andrewtavis commented 1 year ago

077e346 adds the SQLite databases 😊😊😊😊😊 This SQLite stuff is great :) Wish I'd done this from the start! I'll work on Scribe-Data a bit more today and commit my changes later after I'm done switching over the process to not update the Data directories in Scribe-iOS, but rather local formatted_data directories in Scribe-Data.

We can now start experimenting with GRDB.swift and making references to the SQLite databases, which hopefully will go well 🤞

andrewtavis commented 1 year ago

Note that for the databases, if there is an entry that doesn't exist the NAN value is an empty string (""). This means that some Scribe-iOS features like autosuggestions and emojis will need to be changed, as I think as of now they check to see if a value exists or not rather than if it's "".
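A small sketch of the check this implies, shown with Python's `sqlite3` just to illustrate the logic (the app itself would do the equivalent through GRDB.swift). The `nouns` table with `noun` and `plural` columns is an assumed layout for the demo.

```python
import sqlite3


def lookup_plural(conn, noun):
    """Return the plural of a noun, or None when it is absent or stored as ''."""
    row = conn.execute(
        "SELECT plural FROM nouns WHERE noun = ?", (noun,)
    ).fetchone()
    # A missing row and an empty-string value both mean "no data".
    if row is None or row[0] == "":
        return None
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nouns (noun TEXT PRIMARY KEY, plural TEXT)")
conn.executemany(
    "INSERT INTO nouns VALUES (?, ?)", [("Haus", "Häuser"), ("Milch", "")]
)

has_plural = lookup_plural(conn, "Haus")
empty_plural = lookup_plural(conn, "Milch")
unknown_noun = lookup_plural(conn, "Auto")
```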

SaurabhJamadagni commented 1 year ago

@SaurabhJamadagni, if you also want to look into ~~SQLite.swift~~ GRDB.swift that would be really great

Hey sorry @andrewtavis. Was a bit occupied with my college semester project. I'll read up too absolutely. I am sorry for the delay. You got so much done though that's insane! Have you pushed it to a remote branch that I can take a look at too?

andrewtavis commented 1 year ago

Hey @SaurabhJamadagni! No stress or delay at all :) As I said I just got into a bit of a flow on all this and went at it 😊 The databases are already in the main branch in each of the Keyboards/LanguageKeyboards directories, and the Scribe-Data stuff will be in there later :) Feel free to read up a bit and start testing out accessing the databases 😊

SaurabhJamadagni commented 1 year ago

Taking a look at the sqlite files for the languages @andrewtavis. The data isn't referenced anywhere yet right? Looks so much more organized compared to the JSONs haha. So, you wrote the function that converts JSONs to sqlite? Is it in Scribe-Data?

So if I am understanding the transition correctly, instead of loading the arrays from JSON files we will query the database instead. But the arrays will still be formed? Or will we be performing smaller but specific queries wherever necessary and skip on creating the larger arrays completely?

For example, will data be loaded in the command variables like nouns or are we directly querying them inside the code where required?

andrewtavis commented 1 year ago

@SaurabhJamadagni, the data isn’t referenced anywhere yet :) We can each try to see how using it goes before deciding on how exactly we want to use it. And yes the functions to convert the JSONs over are in my local Scribe-Data 😊 Will push them later.

Or will we be performing smaller but specific queries wherever necessary and skip on creating the larger arrays completely?

It’s a good question! I think if we can do specific queries that would be best, but let’s decide on this later 😊

SaurabhJamadagni commented 1 year ago

Will push them later.

No rush @andrewtavis, was just curious. Sounds good :)

andrewtavis commented 1 year ago

and remove SwiftyJSON from dependencies

Noting that this should not be done as odds are we'll be using JSON still for the app texts 🤔 We'll be adopting a localization platform in #268, which likely will be translating the texts housed in JSONs :) I'm not sure if there's any easy way to do this with another file type, so for now it's fine to keep the JSONs for this purpose 🙃

andrewtavis commented 1 year ago

A reaction to this comment makes me feel even better about our choice of SQLite dependency 😊😊

https://github.com/scribe-org/Scribe-Data/commit/d830ae527640a3b23983db187c8509efc65e31a4 committed the local work I had been doing to create the SQLite databases for the language keyboards. There is a lot of code in there as there was so much refactoring... I updated the Scribe-Data changelog with the things that I did, but specifically the files of interest are:

data_to_sqlite.py: reads in the old Scribe-iOS data that's now in formatted_data directories within each language in scribe_data/extract_transform and converts them into a single SQLite database
- The results are saved to scribe_data/load/databases
send_dbs_to_scribe.py: checks the databases directory and copies any databases found there to their respective language keyboard directories here
- This is mimicking what we hope one day will be an import step once the data is downloaded in app :)
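For readers who haven't opened the commit, the copy step could look roughly like the following. This is not the actual `send_dbs_to_scribe.py`; the `LANGUAGE_DIRS` mapping, the two-letter file prefix convention, and the directory layout are all assumptions made for the sketch.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical mapping from database file prefix to keyboard directory name.
LANGUAGE_DIRS = {"DE": "German", "RU": "Russian"}


def send_dbs(databases_dir, keyboards_dir):
    """Copy each <XX>LanguageData.sqlite file to its language keyboard directory."""
    copied = []
    for db in sorted(Path(databases_dir).glob("*LanguageData.sqlite")):
        language = LANGUAGE_DIRS.get(db.name[:2])
        if language is None:
            continue  # no keyboard directory known for this database
        target_dir = Path(keyboards_dir) / language
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(db, target_dir / db.name)
        copied.append(f"{language}/{db.name}")
    return copied


# Demo with empty placeholder files in temporary directories.
dbs_dir = Path(tempfile.mkdtemp())
apps_dir = Path(tempfile.mkdtemp())
(dbs_dir / "DELanguageData.sqlite").touch()
(dbs_dir / "XXLanguageData.sqlite").touch()
copied_dbs = send_dbs(dbs_dir, apps_dir)
```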

There are still a few more things to be done and I'll move some stuff around now that some files are more used for extract-transform purposes rather than for loading. The final tests/fixes of it all will be when I redo the data process for #284 🚀

andrewtavis commented 1 year ago

e1069fb updates the data based on the recently closed https://github.com/scribe-org/Scribe-Data/issues/30 :) I was realizing that when I typed a plural noun there sometimes wasn't an emoji suggestion when for the singular version there was. Ex: I type "Krokodil" and I see the crocodile emoji, but typing "Krokodile" showed nothing, which I found weird. https://github.com/scribe-org/Scribe-Data/issues/30 looked for cases where a plural noun was not a key in the emoji keyword data, and when not the emojis from its singular form were used.
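The fallback described here can be sketched as follows, assuming nouns are stored as a dict with a `plural` field and emoji keywords as a dict from word to emoji list. Both shapes and the function name are assumptions for illustration rather than the actual Scribe-Data code.

```python
def add_plural_emoji_keywords(nouns, emoji_keywords):
    """Give plural nouns their singular's emojis when they have none of their own."""
    updated = dict(emoji_keywords)  # leave the input untouched
    for singular, forms in nouns.items():
        plural = forms.get("plural", "")
        if plural and plural not in updated and singular in updated:
            updated[plural] = updated[singular]
    return updated


demo_nouns = {"Krokodil": {"plural": "Krokodile"}, "Milch": {"plural": ""}}
demo_keywords = {"Krokodil": ["🐊"]}
with_plurals = add_plural_emoji_keywords(demo_nouns, demo_keywords)
```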

andrewtavis commented 1 year ago

As the comment with the to do list shows, I'm mostly done with the Scribe-Data work except for a final test during #284 🔥

We're ready to start switching over Scribe-iOS to the new databases 😊😊 My thoughts on this would be trying to do direct queries where we can. We'd have strings like prefix for autosuggestions/completions or wordToTranslate and we can then try to do a direct query of the database for the information we want. If the data doesn't exist then we handle it by passing in case of auto actions or displaying an error in case of commands :)

I'd say it makes sense to start with prepositions as they're only for German and Russian. As soon as that's working we can start with some of the others, with translations being the next easiest as the data just has two columns. Translations from languages that aren't English are gonna be so much easier now that we can just make a database for it all 🙌
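To illustrate what "direct queries" could mean in SQL terms, here is a sketch using Python's `sqlite3` (in the app these would be GRDB.swift calls). The table names `autocomplete_lexicon` and `translations`, their columns, and the function names are assumptions for the demo.

```python
import sqlite3


def autocomplete(conn, prefix, limit=3):
    """Return up to `limit` words starting with prefix, or [] when none exist."""
    # Escape LIKE wildcards so a typed % or _ is matched literally.
    escaped = (
        prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT word FROM autocomplete_lexicon "
        "WHERE word LIKE ? ESCAPE '\\' ORDER BY word LIMIT ?",
        (escaped + "%", limit),
    ).fetchall()
    return [row[0] for row in rows]


def translate(conn, word_to_translate):
    """Return the stored translation, or None so the caller can show an error."""
    row = conn.execute(
        "SELECT translation FROM translations WHERE word = ?",
        (word_to_translate,),
    ).fetchone()
    return row[0] if row and row[0] else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE autocomplete_lexicon (word TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE translations (word TEXT PRIMARY KEY, translation TEXT)")
conn.executemany(
    "INSERT INTO autocomplete_lexicon VALUES (?)",
    [("buch",), ("buchen",), ("bus",), ("haus",)],
)
conn.execute("INSERT INTO translations VALUES (?, ?)", ("house", "Haus"))

suggestions = autocomplete(conn, "bu")
no_suggestions = autocomplete(conn, "zz")
found = translate(conn, "house")
missing = translate(conn, "xyz")
```

An empty list or `None` maps onto the two cases above: pass silently for auto actions, show an error for commands.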

wkyoshida commented 1 year ago

Hey all! I'm catching up here on all the new work.. @andrewtavis was definitely on a roll 😆

Some thoughts:

On translations and potentially being able to do them from and to any language eventually, I'm wondering what's a good way to organize them as a .sqlite. They are currently stored as word-translation (i.e. english-to-target language pairs) within each language's .sqlite. Instead, could putting all the languages together make sense? i.e. A separate .sqlite with a table with columns english, german, french, and so on? At some point perhaps, once the apps are able to selectively download specific languages, this table could dynamically get populated with only the user-selected languages.
- Thinking through this could also wait until https://github.com/scribe-org/Scribe-Data/issues/23 is done though.
On emojis - I need to browse around to remember where we landed on how to handle emoji variations (e.g. different skin tones). However, an idea could be to save only the base version of the emoji as a suggestion, so that the desired variation is then used dynamically. Now, how to determine which emojis are base emojis? One idea could be to have a separate .sqlite with a table with emoji metadata. Some columns could be id, emoji, is_base. Then, in the emoji_keywords tables of each language, instead of having the actual emojis, the tables could have the id. With the id, the app can then retrieve the corresponding emoji and whether it should treat it as a base emoji or not.
- Example:

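`emoji_keywords` table in `DELanguageData.sqlite`:

| `word` | `emoji_1` | `emoji_2` | `emoji_3` |
| --- | --- | --- | --- |
| gesicht | 34 | 66 | 71 |

`emojis` table in `emojis.sqlite`:

| `id` | `emoji` | `is_base` |
| --- | --- | --- |
| 34 | 😂 | false |
| 66 | 🤣 | false |
| 71 | 😭 | false |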

On the future feature to download the data in-app, I am understanding that send_dbs_to_scribe.py is more for convenience today and that eventually it gets replaced by the future download feature. Something to consider is what file format to use for the download feature. Would the apps just be downloading the .sqlite files? Or as .jsons or something else? Based on the question below from @andrewtavis, I would guess .sqlite? Wanted to still ask though, in case there could be a reason to use something else. To answer the question as well, I think a .sqlite file per language works too (with perhaps the exception on how to handle translations as broached above).
- Based on https://github.com/scribe-org/Scribe-Data/issues/26 we should be making one .sqlite file per language so that we can eventually download just the ones we need, correct?
On Scribe-Data as a package - so data_to_sqlite.py is doing the JSON-to-SQLite conversion today. One thing that I'm wondering - do we want to make the .sqlite generation a feature/option that is built-in to Scribe-Data? i.e. In thinking of Scribe-Data as a general-use package, would it make sense to have that as an option (i.e. the user can choose to either generate as .json or .sqlite)? If maybe not, then where should the JSON-to-SQLite conversion happen? On the app-side? Or, referencing the discussions on https://github.com/scribe-org/Scribe-Data/issues/26, on the Scribe-Server side?
- FWIW, I'm thinking that perhaps we don't make it a Scribe-Data option. The data_to_sqlite.py works as an interim solution, but eventually, we can move that logic to Scribe-Server.
Final thoughts - ideally, I think a different approach on how to structure this data process would've been better, as touched on in the last two points. However, due to limitations such as not having the data download feature or Scribe-Server in place, I agree with the approach with send_dbs_to_scribe.py and data_to_sqlite.py. It perhaps makes the most sense with what we can do today and with how pressing moving to SQLite is. Really awesome you got it done so quick! 🙌
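The emoji metadata idea from the points above could be sketched like this, assuming the two tables from the example. For brevity both tables live in one in-memory database here, whereas the proposal keeps `emojis` in its own `.sqlite` file; the function name is made up.

```python
import sqlite3


def emojis_for_word(conn, word):
    """Resolve a word's emoji ids into (emoji, is_base) pairs."""
    row = conn.execute(
        "SELECT emoji_1, emoji_2, emoji_3 FROM emoji_keywords WHERE word = ?",
        (word,),
    ).fetchone()
    if row is None:
        return []
    results = []
    for emoji_id in row:
        if emoji_id is None:
            continue  # fewer than three emojis stored for this word
        match = conn.execute(
            "SELECT emoji, is_base FROM emojis WHERE id = ?", (emoji_id,)
        ).fetchone()
        if match is not None:
            results.append((match[0], bool(match[1])))
    return results


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emoji_keywords "
    "(word TEXT PRIMARY KEY, emoji_1 INTEGER, emoji_2 INTEGER, emoji_3 INTEGER)"
)
conn.execute(
    "CREATE TABLE emojis (id INTEGER PRIMARY KEY, emoji TEXT, is_base INTEGER)"
)
conn.execute("INSERT INTO emoji_keywords VALUES ('gesicht', 34, 66, 71)")
conn.executemany(
    "INSERT INTO emojis VALUES (?, ?, ?)",
    [(34, "😂", 0), (66, "🤣", 0), (71, "😭", 0)],
)

gesicht_emojis = emojis_for_word(conn, "gesicht")
unknown_emojis = emojis_for_word(conn, "unbekannt")
```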

andrewtavis commented 1 year ago

Hey @wkyoshida 👋 Thanks for the feedback and the thoughts! 😊

They are currently stored as word-translation (i.e. english-to-target language pairs) within each language's .sqlite. Instead, could putting all the languages together make sense? i.e. A separate .sqlite with a table with columns english, german, french, and so on?

I'd say this would be once Wikidata gets to a point where we can get translations from there, right? That's when we can actually just get a single table. As of now we'll just be downloading as many words as we can and machine translating them, so I think source language based .sqlite files make sense where all the words we can get are primary keys, and all the translations from Hugging Face are the various columns via https://github.com/scribe-org/Scribe-Data/issues/23 as you mentioned.

One idea could be to have a separate .sqlite with a table with emoji metadata. Some columns could be id, emoji, is_base. Then, in the emoji_keywords tables of each language, instead of having the actual emojis, the tables could have the id.

This makes sense and seems to be where we'd go for #271 :)

Would the apps just be downloading the .sqlite files? Or as .jsons or something else? Based on the question below from @andrewtavis, I would guess .sqlite?

I'd also say .sqlite for now, but we can discuss this a bit later as you and I had also been discussing last_updated as a part of this so that the user's data update process could be much faster. Selective updating would also likely be easiest done via .sqlite, or would you suggest something else?

i.e. In thinking of Scribe-Data as a general-use package, would it make sense to have that as an option (i.e. the user can choose to either generate as .json or .sqlite)? If maybe not, then where should the JSON-to-SQLite conversion happen? On the app-side?

I'd say definitely as we extrapolate out we can keep the option for JSONs and .sqlite file exports 😊 Why not, if it's not too hard to maintain, as by the looks of it we're doing .sqlite for the app and Wikidata exports are JSONs or other formats that we wouldn't necessarily use in app, so data prep can happen while we're still in JSONs and eventually that or other file types can be exported? Scribe-Server side for sure though :) Apps should either just be getting a whole new updated .sqlite file from this point on, or updating certain rows of it based on last_updated, but let me know if this sentence makes sense 🙏

The data_to_sqlite.py works as an interim solution, but eventually, we can move that logic to Scribe-Server.

👍👍👍🙌

Really awesome you got it done so quick! 🙌

Thank you!! 😊😊🙏 Now to get this here app updated and shipped 🚀😊

wkyoshida commented 1 year ago

Hey! 👋

I'd say this would be once Wikidata gets to a point where we can get translations from there, right?

Yeah, I'm thinking this could wait until we're able to do cross-translations - in whatever way this is accomplished in https://github.com/scribe-org/Scribe-Data/issues/23. I was more so pointing out perhaps that the current translations tables with the word-translation rows do limit us to only the English-to-target language translation, which is fine for now of course, since that is the only translation data that we have.

Selective updating would also likely be easiest done via .sqlite, or would you suggest something else?

This is what I want to make sure to think through. I think there will be three steps that are more time-heavy or resource-heavy that happen when a data download is done:

Scribe-Server generates the data pack
Scribe-Server sends the pack to the client that requested the pack
The client unpacks/processes the pack into the client

For these steps, the file format should facilitate an efficient download workflow. FWIW I am currently thinking .sqlite though, because, for the aforementioned three steps:

Getting the data needed based off of last_updated I believe will be faster if Scribe-Server is already storing the data in a DB
If using .json, this would mean having to send multiple .json per language. On the other hand, there is one .sqlite per language.
- However, if multiple languages are getting downloaded, this would still mean multiple .sqlite
If using .json, the client would have to retain the ability to read the data as .json to then write it to .sqlite
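As a sketch of the `last_updated` idea in the points above: this assumes every table carries a `last_updated` column holding ISO 8601 date strings, which is hypothetical and not an existing Scribe schema. Two in-memory databases stand in for Scribe-Server and the client, and the function names are made up.

```python
import sqlite3


def rows_updated_since(conn, table, since):
    """Return rows whose last_updated is later than `since` (ISO 8601 text)."""
    return conn.execute(
        f'SELECT word, value, last_updated FROM "{table}" '
        "WHERE last_updated > ? ORDER BY word",
        (since,),
    ).fetchall()


def apply_updates(conn, table, rows):
    """Insert new rows and overwrite existing ones that share a primary key."""
    with conn:
        conn.executemany(
            f'INSERT OR REPLACE INTO "{table}" (word, value, last_updated) '
            "VALUES (?, ?, ?)",
            rows,
        )


server = sqlite3.connect(":memory:")
client = sqlite3.connect(":memory:")
for db in (server, client):
    db.execute(
        "CREATE TABLE translations "
        "(word TEXT PRIMARY KEY, value TEXT, last_updated TEXT)"
    )
server.executemany(
    "INSERT INTO translations VALUES (?, ?, ?)",
    [("book", "Buch", "2023-01-01"), ("house", "Haus", "2023-03-01")],
)
client.executemany(
    "INSERT INTO translations VALUES (?, ?, ?)",
    [("book", "Buch", "2023-01-01"), ("house", "Hause", "2023-01-01")],
)

# Only rows changed after the client's last sync need to be sent.
changed_rows = rows_updated_since(server, "translations", "2023-02-01")
apply_updates(client, "translations", changed_rows)
client_house = client.execute(
    "SELECT value FROM translations WHERE word = 'house'"
).fetchone()[0]
```

ISO 8601 strings sort in date order, which is why a plain text comparison is enough for the `WHERE` clause here.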

I'd say definitely as we extrapolate out we can keep the option for JSONs and .sqlite file exports 😊 Why not, if it's not too hard to maintain

I guess the reservation that I'm having is more so coming from putting myself in the perspective of a user of Scribe-Data (as a general-use package). Like, would I care enough about the .sqlite option? Would I use it? If I don't, would I be bothered with the extra bloat of SQLite functionality that comes packaged in?

Maybe it's fine actually. I'm not definitely opposed. I'm perhaps more trying to think from the mindset that, as a user of packages, I would like my packages to do what I need them to, be small and compact, and have a clear use/purpose - if that makes sense.

Apps should either just be getting a whole new updated .sqlite file from this point on, or updating certain rows of it based on last_updated, but let me know if this sentence makes sense 🙏

I'm with you! 👍

andrewtavis commented 1 year ago

I was more so pointing out perhaps that the current translations tables with the word-translation rows do limit us to only the English-to-target language translation, which is fine for now of course, since that is the only translation data that we have.

I'm definitely of the opinion that, as soon as more options are available for translation, instead of translation as a column we'd have columns named for each language that the word would be translated into. I think we're on the same page 😊 What'll need to happen is that the relationship between translations will need to be inverted relative to what we have right now. Right now we have within the German files a file that links English to German, but what will be there instead will be a file where word is German words and each of the other columns is what we've gotten as translations. Right now we are saving translation files with their target language, but in the future they'd be keyed by the words that are typed on the keyboard that's switched to during translation :)

Glad to hear that it sounds like .sqlite can be a solid foundation for the future! Let's keep discussing and readjusting, but for now we're sailing ahead ⛵➡️

Maybe it's fine actually. I'm not definitely opposed. I'm perhaps more trying to think from the mindset that, as a user of packages, I would like my packages to do what I need them to, be small and compact, and have a clear use/purpose - if that makes sense.

I think it also depends on who the people are that first want to use it for non-Scribe purposes, but definitely happy to remove the bloat and even have a lot of the multi-use stuff be in Server for a wider community where Scribe-Data is more just what we need :)

I'm with you! 👍

🚀🚀🚀😊
