mcfrank closed this issue 5 months ago.
Processed files: [ArabicSaudi_WG].csv [ArabicSaudi_WS].csv ArabicSaudiWG_Alroqi_data.csv ArabicSaudiWG_Alroqi_fields.csv ArabicSaudiWG_Alroqi_values.csv ArabicSaudiWS_Alroqi_data.csv ArabicSaudiWS_Alroqi_fields.csv ArabicSaudiWS_Alroqi_values.csv ArabicSaudiWS_JISH_data.csv ArabicSaudiWS_JISH_fields.csv ArabicSaudiWS_JISH_values.csv ArabicSaudi_notes.md
@mcfrank Just checking, this language should be labelled "Arabic (Saudi)"? And also will need contributors / citations for these data (:
Thanks! This is Arabic (Saudi), and the citation for the JISH data is the manual listed on the CDI website. For the other dataset, I just forwarded all the info I have.
Mike
(Email reply to Alvin Tan's comment of Sun, Jul 30, 2023, 10:39 AM.)
JISH:
Contributor: Jeddah Institute for Speech and Hearing
Citation: Dashash, N., & Safi, S. (2014). JISH Arabic Communicative Development Inventory: Saudi population JACDI: User’s guide and technical manual. Jeddah: Jeddah Institute for Speech and Hearing.

Alroqi:
Contributors: Haifa Alroqi, King Abdulaziz University; Alaa Almohammadi, King Abdulaziz University; Khadeejah Alaslani, Purdue University
Citation: TBD
@alvinwmtan I've started on Arabic (Saudi).
A couple of problems. First, WS is too big to fit in a single database row. There are 1,079 items, the program creates a 15-character text field for each, and the result is too big a row for MySQL, which is the database we're using. I'm trying to find a solution but have made no progress yet (and I'm not confident).
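One way to see how 1,079 short text columns can overflow a row is to apply MySQL's documented 65,535-byte maximum row size, which is shared by all columns of a table. The sketch below is a back-of-the-envelope estimate only: it assumes each item is stored as a VARCHAR(15) column in the utf8mb4 character set (up to 4 bytes per character, plus a 1-byte length prefix) and ignores the table's non-item columns, NULL flags, and storage-engine limits, none of which the thread specifies.

```python
# Rough row-size estimate for a table with one VARCHAR column per CDI item.
# Assumptions (not taken from the Wordbank schema): VARCHAR columns, utf8mb4
# encoding, and only the item columns are counted.

MYSQL_MAX_ROW_BYTES = 65_535  # documented MySQL limit shared by all columns


def varchar_max_bytes(n_chars: int, bytes_per_char: int = 4) -> int:
    """Worst-case storage for VARCHAR(n_chars): data bytes plus length prefix."""
    data_bytes = n_chars * bytes_per_char
    length_prefix = 1 if data_bytes <= 255 else 2
    return data_bytes + length_prefix


def estimated_row_bytes(n_items: int, n_chars: int = 15) -> int:
    """Worst-case bytes for n_items VARCHAR(n_chars) columns."""
    return n_items * varchar_max_bytes(n_chars)


if __name__ == "__main__":
    for n_items in (800, 1079):
        size = estimated_row_bytes(n_items)
        print(n_items, size, size <= MYSQL_MAX_ROW_BYTES)
```

Under those assumptions 1,079 items need 65,819 bytes, just over the limit, while 800 items need 48,800; a different character set or column type would change the numbers.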
Second, WG has a new category (negation_words), which I need to add to the categories.csv file with a lexical_category and a lexical_class. For the time being I have used function_words for both, as this value seems to be used quite a lot.
Finally, some of the cells contain "Understands ONLY, Understands & Says" when they should contain one or the other. No cells have the two reversed, so I think this combined string is the actual recorded value. I can map these cells so that they result in produces, BUT I will need to amend the file so that they use a semicolon instead of a comma, because the comma delimits a different field.
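The clean-up described above can be sketched as a small pass over the data file: any cell listing several response levels is reduced to the highest one, so "Understands ONLY, Understands & Says" becomes "understands & says". This is a hypothetical helper, not Wordbank's importer; the level labels are copied from the thread, and the "highest level wins" rule is an assumption that matches mapping these cells to produces. It also shows that a CSV reader which honours quoting keeps a quoted cell containing a comma as a single field; whether the Wordbank loader does so is not stated here.

```python
import csv
import io

# Response levels in increasing order. Labels are taken from the thread;
# the empty string stands for "no response".
LEVELS = ["", "understands only", "understands & says"]


def collapse_response(cell: str) -> str:
    """Reduce a cell listing several levels (comma- or semicolon-separated)
    to the highest level, returned as a lowercase label.

    Cells that contain no recognised level (e.g. IDs) are returned unchanged.
    """
    parts = [p.strip().lower() for p in cell.replace(";", ",").split(",")]
    known = [p for p in parts if p in LEVELS]
    if not known:
        return cell
    return max(known, key=LEVELS.index)


def clean_csv(text: str) -> list[list[str]]:
    """Parse CSV text (quoted fields may contain commas) and collapse cells."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    return [header] + [[collapse_response(c) for c in row] for row in body]


if __name__ == "__main__":
    sample = 'child_id,item_1\n1,"Understands ONLY, Understands & Says"\n'
    print(clean_csv(sample))
```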
@HenryMehta
@alvinwmtan
Arabic (Saudi) WG is now available to test.
I cannot load WS until we decide whether we can use u instead of understands and p instead of produces. This would need to apply across all datasets and would affect the Shiny apps, as previously mentioned.
(fixing by switching to "u" and "p", as in #298)
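The switch amounts to a recoding of value labels. A minimal sketch, assuming the only affected values are "understands" and "produces" and that any other value (including an empty cell) passes through unchanged; the helper names are hypothetical. Decoding both forms back to the long labels is one way a reader could accept old and new instruments side by side.

```python
# Hypothetical mapping between long value labels and one-letter codes.
SHORT_CODES = {"understands": "u", "produces": "p"}
LONG_LABELS = {code: label for label, code in SHORT_CODES.items()}


def encode(value: str) -> str:
    """Replace a long label with its one-letter code; other values pass through."""
    return SHORT_CODES.get(value, value)


def decode(value: str) -> str:
    """Map either form to the long label, so old and new data read the same."""
    return LONG_LABELS.get(value, value)


if __name__ == "__main__":
    row = ["understands", "produces", ""]
    encoded = [encode(v) for v in row]
    print(encoded, [decode(v) for v in encoded])
```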
I endorse this suggestion since it may come up again and will generally save space. But we do need to update the Shiny apps as noted; @mikabr may need to update them. Will we need to change all instruments, or are "understands" and "u" now both options?
(Email reply to Alvin Tan's comment of Mon, Dec 4, 2023, 1:27 PM.)
@alvinwmtan We still have an issue here. I am now getting a "Too many columns" error message. I've done some reading about this and I cannot increase the parameters to allow more fields. I therefore propose we split the Arabic (Saudi) WS into 2 files and hence 2 tables.
@mcfrank @alvinwmtan For now I've applied it to French (French) WS plus all future instruments added
@HenryMehta Hm okay. Do you know what the column limit is?
@alvinwmtan It's not actually that simple, because it also depends on the column names. I could probably work it out, but it would take some time. I think we should aim to keep the maximum at 750 columns.
@HenryMehta Given that the size of the column names also matters, do you think it might be possible to retain the full table if we converted all the column names to just numbers? That would reduce the size. If not, I'll think about how to split the dataset up.
@alvinwmtan We could try, but I don't know how many columns that would give us, and the names would actually need changing for every study because of the way the application works. We would need to change the code as well, because column names are currently called 'item_xx', where xx is the column number. We could shorten the name to 'ixx', because column names must start with a letter.
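The saving from the shorter prefix is easy to estimate: each name drops four characters ('item_' becomes 'i'), so 1,079 columns would save 4,316 characters in total. The hypothetical helper below just sums name lengths, assuming columns are numbered from 1; it says nothing about how many extra columns MySQL would then accept, which, as noted above, is not straightforward to work out.

```python
def total_name_length(n_columns: int, prefix: str) -> int:
    """Total characters in column names prefix1 .. prefixN (hypothetical scheme)."""
    return sum(len(f"{prefix}{i}") for i in range(1, n_columns + 1))


if __name__ == "__main__":
    n = 1079
    long_names = total_name_length(n, "item_")
    short_names = total_name_length(n, "i")
    print(long_names, short_names, long_names - short_names)
```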
@HenryMehta Here is one attempt: I've separated the words (WS) and all other item types (WSOther); WS still has >800 items but hopefully it will be okay. The WS from Alroqi is unchanged. Let me know if this split is still too large and I will find a different solution.
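The split can be sketched as a partition of the instrument's item list by item type, assuming each item record has a `type` field that is `"word"` for vocabulary items (the field name, its values, and the helper names are assumptions, not Wordbank's actual format). The second helper flags any resulting table whose item count exceeds a cap; 750 is the working target suggested earlier in the thread, not a documented MySQL limit.

```python
from collections import defaultdict

MAX_ITEM_COLUMNS = 750  # working target suggested in this thread


def split_instrument(items: list[dict]) -> dict[str, list[dict]]:
    """Put word items in "WS" and every other item type in "WSOther"."""
    tables: dict[str, list[dict]] = defaultdict(list)
    for item in items:
        table = "WS" if item["type"] == "word" else "WSOther"
        tables[table].append(item)
    return dict(tables)


def oversized(tables: dict[str, list[dict]], cap: int = MAX_ITEM_COLUMNS) -> list[str]:
    """Names of tables whose item count exceeds the cap."""
    return [name for name, rows in tables.items() if len(rows) > cap]


if __name__ == "__main__":
    items = [{"type": "word"}, {"type": "word"}, {"type": "gestures"}]
    tables = split_instrument(items)
    print({name: len(rows) for name, rows in tables.items()}, oversized(tables))
```

A WS table with more than 800 words would still be flagged against the 750 target; since that cap is a rule of thumb, the load itself is the real check.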
[ArabicSaudi_WS].csv [ArabicSaudi_WSOther].csv ArabicSaudiWS_JISH_data.csv ArabicSaudiWS_JISH_fields.csv ArabicSaudiWS_JISH_values.csv ArabicSaudiWSOther_JISH_data.csv ArabicSaudiWSOther_JISH_fields.csv ArabicSaudiWSOther_JISH_values.csv
@alvinwmtan You've split the JISH files but not the Alroqi
@HenryMehta I believe the Alroqi files are all still within "WS" (only the JISH had items that now fall in "WSOther")
OK
@alvinwmtan Deploying to dev now - will need about 40 minutes to load
I've implemented allowing "u" and "p" values in wordbankr, but none of the Saudi Arabic tables seem to have those values, and the WSOther table seems to have zero rows (I'm connecting to wordbank2-dev-3).
@HenryMehta WS looks good, but I don't see any WSOther data.
@alvinwmtan try now
@HenryMehta WS and WSOther look good. I realised I also failed to disambiguate some of the items in the WG; these should be de-conflicted now:
ArabicSaudiWG_Alroqi_data.csv ArabicSaudiWG_Alroqi_fields.csv
@alvinwmtan You've re-introduced the cells with "understands only, understands & says" instead of just one value. I had previously changed these to "understands & says" and have now reapplied this change.
@HenryMehta thanks for catching that; looks good to me now!
import arabic data from https://github.com/langcog/ArabicCAT