lexibank / lairgyalrong

Rgyalrong phylogenetic analysis
Creative Commons Attribution 4.0 International

More data to be added #27

Closed laiyunfan closed 9 months ago

laiyunfan commented 11 months ago

Hi Mattis,

I just found some new wordlists and would like to include them in the database. These include:

Jackson Sun (1996) is a source that had never been made public (although the document says it has been free to use since 2000). A friend of mine from Taiwan found it among dusty documents at Academia Sinica.

I will keep you updated when these are ready. It shouldn't take long.

Best,

LinguList commented 11 months ago

Nice, so we will make an update with these new files plus the ones you sent before. I think I can do this in November. I prefer to update only once. So if you tell me that you're ready, I'll then schedule when to combine those, okay?

laiyunfan commented 11 months ago

Yes, sure!

laiyunfan commented 11 months ago

Hi Mattis, I have uploaded five more wordlists (the descriptions are in the update information). I think that I am now ready for cognacy review once we have included the new wordlists. I will tell you more about my paper plan later on.

LinguList commented 10 months ago

Hi Yunfan. The way in which I would proceed now is:

  1. try to write one script to combine all data, using the data that is currently in EDICTOR
  2. I can modify parts of the data (language names) upon request for the data in EDICTOR as well
  3. I assign 0 cognates for all morphemes in the new data, assuming this is the best way to proceed?
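
As a rough illustration of steps 1 and 3, here is a minimal sketch of how new rows could be appended to existing data with placeholder cognate IDs. It is not the actual update script. The column names (`ID`, `DOCULECT`, `CONCEPT`, `TOKENS`, `COGIDS`), the use of `+` as a morpheme boundary, and the toy segments are assumptions made for the example.

```python
def zero_cogids(tokens):
    """Return one placeholder cognate ID ("0") per morpheme in a segmented form."""
    # Assumption: morphemes are separated by "+" inside the TOKENS string.
    morphemes = [m for m in tokens.split("+") if m.strip()]
    return " ".join("0" for _ in morphemes)


def combine(existing_rows, new_rows):
    """Append new rows to existing ones; old judgements are kept, new ones get 0."""
    combined = [dict(row) for row in existing_rows]
    next_id = max((row["ID"] for row in combined), default=0) + 1
    for row in new_rows:
        merged = dict(row)
        merged["ID"] = next_id
        merged["COGIDS"] = zero_cogids(row["TOKENS"])
        combined.append(merged)
        next_id += 1
    return combined


# Toy rows with made-up segments, only to show the shape of the data.
existing = [
    {"ID": 1, "DOCULECT": "Japhug", "CONCEPT": "hand",
     "TOKENS": "a b + c d", "COGIDS": "12 7"},
]
new = [
    {"DOCULECT": "Cogtse Situ", "CONCEPT": "hand", "TOKENS": "e f + g"},
]
wordlist = combine(existing, new)
```

Existing cognate judgements stay untouched, so only the rows from the new wordlists would need to be reviewed afterwards.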

laiyunfan commented 10 months ago

Hi Mattis. I agree with the three points you listed. I believe that it is the most efficient way. Let's do it this way. Thank you very much!

LinguList commented 10 months ago

Can you please list for me all wordlists that are new, and also the names of the languages, and glottocodes?

laiyunfan commented 10 months ago

Sure.

  1. CogtseSitu.tsv, Cogtse Situ, glottocode: maer1238
  2. NjorogsKhroskyabs.tsv, Njorogs Khroskyabs, glottocode: yelo1242
  3. mBrongrdzongKhroskyabs.tsv, mBrongrdzong Khroskyabs, glottocode: muer1238
  4. BawangHorpa.tsv, Bawang Horpa, glottocode: horp1240
  5. BragstengSitu.tsv, Bragsteng Situ, glottocode: situ1238
  6. JRYYSitu.tsv, Situ in Jiarongyiyu, glottocode: situ1238
  7. ShimuliuJaphug.tsv, Shimuliu Japhug, glottocode: japh1234
  8. YaojiSitu.tsv, Yaoji Situ, glottocode: situ1238
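
A list like this can be sanity-checked before it goes into a script. The sketch below only tests that each glottocode has the expected shape (four lowercase alphanumeric characters followed by four digits); it does not check that the code exists in Glottolog or that it is the right one for the variety. The dictionary layout and the function name are illustrative choices for this example.

```python
import re

# Shape of a glottocode: four alphanumeric characters plus four digits.
GLOTTOCODE = re.compile(r"^[a-z0-9]{4}[0-9]{4}$")

# File name -> (language name, glottocode), copied from the list above.
new_wordlists = {
    "CogtseSitu.tsv": ("Cogtse Situ", "maer1238"),
    "NjorogsKhroskyabs.tsv": ("Njorogs Khroskyabs", "yelo1242"),
    "mBrongrdzongKhroskyabs.tsv": ("mBrongrdzong Khroskyabs", "muer1238"),
    "BawangHorpa.tsv": ("Bawang Horpa", "horp1240"),
    "BragstengSitu.tsv": ("Bragsteng Situ", "situ1238"),
    "JRYYSitu.tsv": ("Situ in Jiarongyiyu", "situ1238"),
    "ShimuliuJaphug.tsv": ("Shimuliu Japhug", "japh1234"),
    "YaojiSitu.tsv": ("Yaoji Situ", "situ1238"),
}


def invalid_glottocodes(wordlists):
    """Return {file name: glottocode} for every code with an unexpected shape."""
    return {
        filename: code
        for filename, (_name, code) in wordlists.items()
        if not GLOTTOCODE.match(code)
    }


problems = invalid_glottocodes(new_wordlists)
```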

LinguList commented 10 months ago

updates/2023-12-13/update.py:46: UserWarning: mBrongrzongKhroskyabs.tsv does not exist

LinguList commented 10 months ago

Here's a first overview:

| Doculect | Items | Coverage |
|---|---|---|
| Bantawa | 253 | 0.790625 |
| Bawang Horpa | 282 | 0.88125 |
| Bragbar Situ | 290 | 0.90625 |
| Cogtse Situ | 268 | 0.8375 |
| Geshiza | 267 | 0.834375 |
| Guanyinqiao Khroskyabs | 290 | 0.90625 |
| Japhug | 293 | 0.915625 |
| Kangding Minyag | 236 | 0.7375 |
| Kyomkyo Situ | 236 | 0.7375 |
| Mazur Stau | 297 | 0.928125 |
| Mbarkhams Situ | 241 | 0.753125 |
| Ngyaltsu Zbu | 291 | 0.909375 |
| Njorogs Khroskyabs | 294 | 0.91875 |
| Old Burmese | 228 | 0.7125 |
| Pengbuxi Minyag | 292 | 0.9125 |
| Pubarong Queyu | 276 | 0.8625 |
| Shimuliu Japhug | 261 | 0.815625 |
| Situ in Jarongyiyu | 136 | 0.425 |
| Siyuewu Khroskyabs | 289 | 0.903125 |
| Stau | 237 | 0.740625 |
| Tangut | 273 | 0.853125 |
| Tshobdun | 216 | 0.675 |
| Wobzi Khroskyabs | 290 | 0.90625 |
| Xinlong Queyu | 236 | 0.7375 |
| Yaoji Situ | 251 | 0.784375 |
| Zhaba | 239 | 0.746875 |
| Zlarong | 239 | 0.746875 |
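
The coverage values are consistent with the item count divided by a list of 320 concepts (for example, 253 / 320 = 0.790625). The total of 320 is inferred from the figures and is not stated in the thread. Under that assumption, the column can be recomputed and low-coverage doculects flagged as follows; the 0.5 threshold is an arbitrary choice for illustration.

```python
# Assumption: 320 concepts in total, inferred from the Items/Coverage ratios.
TOTAL_CONCEPTS = 320

# A few rows copied from the overview above.
items = {
    "Bantawa": 253,
    "Bawang Horpa": 282,
    "Mazur Stau": 297,
    "Situ in Jarongyiyu": 136,
}

# Coverage is the share of concepts for which a doculect has an entry.
coverage = {doculect: count / TOTAL_CONCEPTS for doculect, count in items.items()}

# Doculects below an (arbitrary) threshold may deserve a closer look.
THRESHOLD = 0.5
low_coverage = [d for d, value in coverage.items() if value < THRESHOLD]

for doculect, value in sorted(coverage.items(), key=lambda pair: pair[1]):
    print(f"{doculect}\t{items[doculect]}\t{value}")
```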

laiyunfan commented 10 months ago

Thanks! Sorry, it should be mBrongrdzongKhroskyabs.tsv. (the warning lacks a d before z)
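
A misspelling of this kind can be caught automatically by comparing the expected file names against the ones that are actually present and suggesting the closest match. The sketch below uses `difflib.get_close_matches` from the Python standard library; the function name `check_expected_files` is hypothetical, and this is not how the warning in `update.py` is produced.

```python
import difflib


def check_expected_files(expected, available):
    """Map each missing file name to its closest available name (if any)."""
    report = {}
    for name in expected:
        if name not in available:
            # n=1 keeps only the best candidate; the default cutoff is 0.6.
            report[name] = difflib.get_close_matches(name, available, n=1)
    return report


# File names as listed earlier in the thread.
available = [
    "CogtseSitu.tsv",
    "NjorogsKhroskyabs.tsv",
    "mBrongrdzongKhroskyabs.tsv",
    "BawangHorpa.tsv",
    "BragstengSitu.tsv",
    "JRYYSitu.tsv",
    "ShimuliuJaphug.tsv",
    "YaojiSitu.tsv",
]

# The second name is the misspelled one from the warning.
expected = ["CogtseSitu.tsv", "mBrongrzongKhroskyabs.tsv"]
report = check_expected_files(expected, available)
```

In a real script, `available` could be filled from a directory listing instead of a hard-coded list.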

LinguList commented 10 months ago

@laiyunfan, I have updated this now. You will find a folder called edictor/update/2023-12-13; in this folder, there is the new wordlist, new_data.tsv. Please check this in EDICTOR. If you think it is fine, I will update the database online.

LinguList commented 10 months ago

| Doculect | Items | Coverage |
|---|---|---|
| Bantawa | 253 | 0.790625 |
| Bawang Horpa | 282 | 0.88125 |
| Bragbar Situ | 290 | 0.90625 |
| Cogtse Situ | 268 | 0.8375 |
| Geshiza | 267 | 0.834375 |
| Guanyinqiao Khroskyabs | 290 | 0.90625 |
| Japhug | 293 | 0.915625 |
| Kangding Minyag | 236 | 0.7375 |
| Kyomkyo Situ | 236 | 0.7375 |
| Mazur Stau | 297 | 0.928125 |
| Mbarkhams Situ | 241 | 0.753125 |
| mBrongrdzon Khroskyabs | 285 | 0.890625 |
| Ngyaltsu Zbu | 291 | 0.909375 |
| Njorogs Khroskyabs | 294 | 0.91875 |
| Old Burmese | 228 | 0.7125 |
| Pengbuxi Minyag | 292 | 0.9125 |
| Pubarong Queyu | 276 | 0.8625 |
| Shimuliu Japhug | 261 | 0.815625 |
| Situ in Jarongyiyu | 136 | 0.425 |
| Siyuewu Khroskyabs | 289 | 0.903125 |
| Stau | 237 | 0.740625 |
| Tangut | 273 | 0.853125 |
| Tshobdun | 216 | 0.675 |
| Wobzi Khroskyabs | 290 | 0.90625 |
| Xinlong Queyu | 236 | 0.7375 |
| Yaoji Situ | 251 | 0.784375 |
| Zhaba | 239 | 0.746875 |
| Zlarong | 239 | 0.746875 |

laiyunfan commented 10 months ago

Thanks! I will check this as soon as possible.

LinguList commented 10 months ago

I have made a PR, so you can also see the code (which you should be able to run after installing these packages):

pip install lexibase
pip install pyedictor
pip install lingpy

I don't think any other packages are needed.
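
To confirm that the environment is ready before running the update code, a quick check like the following sketch may help. It assumes that the import names are the same as the names used with pip above (`lexibase`, `pyedictor`, `lingpy`); the helper name `missing_packages` is made up for this example.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: import names match the pip package names.
required = ["lexibase", "pyedictor", "lingpy"]
missing = missing_packages(required)

if missing:
    print("Please install: " + ", ".join(missing))
else:
    print("All required packages are available.")
```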

laiyunfan commented 10 months ago

sure, thank you!

LinguList commented 10 months ago

@laiyunfan, I updated the code, can you check the link? You can find it here: https://github.com/lexibank/lairgyalrong/blob/master/edictor/link.md

LinguList commented 10 months ago

Please copy and paste the link; do not click on the field, as that does not work.

laiyunfan commented 10 months ago

Hi, Mattis, the link works for me. Can I review the cognate judgements with this now? (I've been busy marking students' term papers this week, so I am a bit behind schedule, but I'll start quickly)

LinguList commented 10 months ago

Yes, you can!

laiyunfan commented 10 months ago

Thanks!