Open andrewtavis opened 1 week ago
CC @DeleMike, @catreedle, @KesharwaniArpita and @VNW22 for the initial discussion 😊
Sounds good to me> The emoji keyword functionality is essentially the same for all languages, so centralizing it makes sense. @andrewtavis Can I be assigned?
This is a great issue! @andrewtavis. I imagine we want to update something related to emojis and we would have to go through all the directories!
As you suggested, having a single point of call for the emojis is important. I went through the files, I believe gen_emoji_lexicon
in src/scribe_data/unicode/process_unicode.py
will be an important function.
This is just a shallow thought for now. I would love to contribute in any way to resolve this issue.
Let's leave this issue for a bit and continue the conversation on it. We all have a lot being worked on right now, so let's close some current issues and then we can plan the work from there :)
Thanks for your interest in helping!
I agree with centralizing the functionality, it’ll certainly save a lot of repetitive work! I'm curious, will the emoji generation remain uniform across languages, or is there a plan to account for different linguistic forms, such as gender variations?
Gendered emojis should be coming out in the current setup as it's Unicode's words that are associated with a given emoji ordered by their usage and then the top X - usually 3 - are selected :) We can keep this in mind for later though 😊
I see. Then I think there shouldn't be any issue with centralizing it :)
Hi @andrewtavis @catreedle @KesharwaniArpita @VNW22 @DeleMike I'm interested in joining the team
Hi @andrewtavis @catreedle @KesharwaniArpita @VNW22 @DeleMike I'm interested in joining the team
Welcome! I think we're still very much in the discussion phase. Looking forward to collaborating with you! :)
Hi @andrewtavis @catreedle @KesharwaniArpita @VNW22 @DeleMike I'm interested in joining the team
Welcome! I think we're still very much in the discussion phase. Looking forward to collaborating with you! :)
Thank you so much. Has the discussion started? Is the discussion on the element app?
I was going to suggest this to you, @Ekikereabasi-Nk :) We're discussing it in the issue right now. Can you look at the emoji keyword functionality and make a suggestion on how to centralize this functionality into the src/scribe_data/unicode
directory? :)
To achieve a centralize functionality I suggest the steps:
generate_emoji_keyword.py
to src/scribe_data/unicode
use all the code, and remove variable definition LANGUAGE = "English"
, emojis_per_keyword = 3
and DATA_TYPE = "emoji-keywords
. Instead we will have a centralized function will take these language-specific parameters ( LANGUAGE and emojis_per_keyword) as arguments.process_unicode.py
So, how do you all see this suggestion? @andrewtavis
Next, we will need to modify the emoji language file for each language to import the centralized function from step 1 and create a simplify code
Hi @andrewtavis, @DeleMike , @catreedle. @Ekikereabasi-Nk, Do you think we should modify the __init__.py
files to import and call centralized function, passing in the appropriate variables as arguments (e.g., language and emoji-specific variables)? It will be able to cater the grouped languages (SA Hindustani and Norweign etc) too and any other specific required customization.
I'm generally thinking that we follow @Ekikereabasi-Nk's suggestions here and maybe keep the empty __init__.py
files as a means of keeping the functionality from the project structure, but more the full process is done in the unicode directory. I'm actually not sure what languages Unicode has support for, so maybe that's something that we could explore a bit - i.e. what languages are included in the CLDR dataset. There's no better source of this information, and with this we'd know to just put an __init__.py
file in the directories for those languages that we find have emoji support. What's more, another check could be written to find which languages do have support and make sure that each of them and only them have an __init__.py
file :)
A basic thing is that the __init__.py
files should remain empty as this is Python packaging convention. They should make it easier to load something with a different name or do nothing, as I understand it.
Thanks for the feedback! I agree with the idea of keeping the init.py files as a Python packaging convention, especially to maintain the project's structure and potentially assist with language-specific functionality loading.
Regarding the suggestion of using the CLDR dataset to check which languages have emoji support, that sounds like a great idea. It will ensure we're only including relevant languages in the directories.
heyy, I'm kinda late but i'd like to join in the discussion :)
By all means, @VNW22! Let's try to get to this soon :) @Ekikereabasi-Nk, do you want to open a PR for this and the others can review?
I fully support the plan to centralize the emoji-keyword functionality by moving the shared logic to src/scribe_data/unicode—this will streamline the process and reduce redundancy. It seems like a solid solution has emerged from the discussion so far, but I’d be happy to assist with any part of the refactoring or the exploration of the CLDR dataset to ensure we cover all relevant languages.
Do you want to look into the script to check that we have emoji support for all languages that we can and don't for those that we shouldn't, @VNW22? You'd need to do the setup for CLDR, which is difficult to do on Windows (if that's your operating system, then you'll likely need WSL to run the emoji programs on a Linux machine).
Let us know!
By all means, @VNW22! Let's try to get to this soon :) @Ekikereabasi-Nk, do you want to open a PR for this and the others can review?
Alright @andrewtavis
@Ekikereabasi-Nk and @andrewtavis , I wanted to rewrite the code for the language emoji files. I think we can start collaborating on the code. While @Ekikereabasi-Nk is working on the centralized script, is it alright that I start working on the function call for the languages? We can make the minor changes later too?
@Ekikereabasi-Nk and @andrewtavis , I wanted to rewrite the code for the language emoji files. I think we can start collaborating on the code. While @Ekikereabasi-Nk is working on the centralized script, is it alright that I start working on the function call for the languages? We can make the minor changes later too?
Sure @KesharwaniArpita I'm also through with the centralize function
Feel free to send along PRs and we'll see on both ends :)
Do you want to look into the script to check that we have emoji support for all languages that we can and don't for those that we shouldn't, @VNW22? You'd need to do the setup for CLDR, which is difficult to do on Windows (if that's your operating system, then you'll likely need WSL to run the emoji programs on a Linux machine).
Let us know!
is it possible on mac?
Do you want to look into the script to check that we have emoji support for all languages that we can and don't for those that we shouldn't, @VNW22? You'd need to do the setup for CLDR, which is difficult to do on Windows (if that's your operating system, then you'll likely need WSL to run the emoji programs on a Linux machine).
Let us know!
okay, I'll be working on it
Sorry I was planning on sending along an explanation here, @VNW22, but got caught up with things :)
Is much easier on Mac and Linux. Specifically we to have a guide for this here. Let me know if anything is confusing and we can update the guide!
Thanks for looking into this 😊
Thanks @KesharwaniArpita for the work here https://github.com/scribe-org/Scribe-Data/pull/397
Sorry I was planning on sending along an explanation here, @VNW22, but got caught up with things :)
Is much easier on Mac and Linux. Specifically we to have a guide for this here. Let me know if anything is confusing and we can update the guide!
Thanks for looking into this 😊
no worries :) looking into it
Terms
Description
Something that should be changed about the project is the way that the emoji keyword functionality works. Basically all of the files in question are the same except for a few variables, and there are already CLI arguments being passed to these files. We do want the structure of the package to determine the functionality of the project, but then this is a case where there's really no benefit of repetition as there is for the queries where they serve has a record for how to get data from Wikidata via SPARQL.
Some ideas:
__init__.py
files in the emoji keyword directories that would serve to still include the this functionality for those languages that it works forsrc/scribe_data/unicode
directory, and this one file would be called from the CLIContribution
Thoughts on this would be very appreciated! Happy tor review and work with people on this 😊