lexibank / uralex

UraLex basic vocabulary dataset
Creative Commons Attribution 4.0 International

Updated UraLex (2.0) files #8

Closed MervideHeer closed 3 years ago

MervideHeer commented 3 years ago

Dear UraLex and Lexibank contributors,

@kasyrj and I have now updated all our files for the upcoming release of UraLex 2.0. This version expands and updates the loanword material drafted in the previous version. We would like to merge the updates from my fork to the main repository. We will contact the Lexibank hosts soon about updating the CLDF files and creating a new Zenodo release of UraLex 2.0.

Here is a summary of the changes we made to the files and folders.

-We agreed to replace the contents of the “raw” folder with the UraLex 2.0 materials so that the parts look the same between the first and second version (no special UraLex 2.0 folder and several data files with different names)

Within “raw”: -We replaced the original Data.tsv with the updated data. The changes are as follows: the columns regarding the age of the borrowings are removed; Borr_source, Borr_quality and Ref_borr are updated; some inconsistent dates in Ref_borr are fixed in this pull.

-Since my previous pull, we added the BibTeX codes to Borrowing_references.tsv and double-checked some of the dates of the references. Accidental duplicate entries were removed and one entry was added. Tweaks in Borrowing_references.bib: spaces and unnecessary, inconsistent information were removed. We checked and fixed some parsing problems.

-The ODS-file is removed because it is redundant. We wish to encourage using the tsv-files instead.

-Both Python scripts bib2tsv.py & tsv2bib.py are updated to handle the borrowing references.

-No changes were required for Citations.bib, Citation_codes.tsv, Language_compilers.tsv, Languages.tsv, Meaning_list_descriptions.tsv, Meaning_lists.tsv and Meanings.tsv since the first version.

In main: -We replaced the documentation with the updated UraLex2.0_Documentation.md and solved UTF-8 problems

xrotwang commented 3 years ago

@MervideHeer so is this the status after the review period - i.e. ready to be published?

MervideHeer commented 3 years ago

> @MervideHeer so is this the status after the review period - i.e. ready to be published?

Hello! We are finished with the files. This is more of an "internal review" here. The goal is to get the data out and, in conjunction, I'll submit a paper (ready to go) to a linguistic journal where we analyze and demonstrate the data. I just sent an email about moving towards the final Zenodo phase. For these steps, I want to keep the people who were involved in the first release informed so that I don’t miss anything. Unfortunately, they do not seem to be participating in this repository and the discussion here.

xrotwang commented 3 years ago

@lmaurits and @kasyrj are collaborators on this repo - so should/could get notifications about discussions here.

@MervideHeer anyone else we should add?

MervideHeer commented 3 years ago

> @lmaurits and @kasyrj are collaborators on this repo - so should/could get notifications about discussions here.
>
> @MervideHeer anyone else we should add?

I'd like to have Michael Dunn @evoling in the loop as well as the leader of our Bedlan group, Outi Vesakoski (I suspect she might not use GitHub).

xrotwang commented 3 years ago

@MervideHeer some references seem to be missing in the borrowings bibfile:

xrotwang commented 3 years ago

@MervideHeer I had to make a couple of small edits to the borrowings bibfile already. So to fix https://github.com/lexibank/uralex/pull/8#issuecomment-842910116 it may be best, if you could just provide me with the missing references, and I can merge them into the bibfile.
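A check for missing references like the one described here could be sketched roughly as follows. This is only an illustration, not the script actually used for this pull request: it assumes that the `Ref_borr` column of Data.tsv holds BibTeX keys and that several keys in one cell are separated by commas (the separator is an assumption). The function names are hypothetical.

```python
import csv
import io
import re

# Matches the citation key in entries such as "@book{NES, ..."
BIB_KEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def bib_keys(bib_text):
    """Return the set of citation keys found in BibTeX source text."""
    return set(BIB_KEY.findall(bib_text))


def cited_keys(tsv_text, column="Ref_borr", sep=","):
    """Collect reference keys from one column of tab-separated data.

    ASSUMPTION: multiple keys in a single cell are separated by `sep`.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    keys = set()
    for row in reader:
        cell = row.get(column) or ""
        keys.update(k.strip() for k in cell.split(sep) if k.strip())
    return keys


def missing_references(tsv_text, bib_text):
    """Return the keys cited in the data but absent from the bibfile, sorted."""
    return sorted(cited_keys(tsv_text) - bib_keys(bib_text))
```

Reading both files as text and calling `missing_references(data_tsv, borrowings_bib)` would then list every key that still needs an entry in the bibfile.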

xrotwang commented 3 years ago

@MervideHeer So the reference ETY should be replaced with NES, correct? Alternatively, we might use ETY as bibtex key in the bibfile instead of NES.

MervideHeer commented 3 years ago

I think the documentation should be moved to uralex_documentation.md, not UraLex_2.0_documentation.md. For the rest of the files, we basically accepted that versioning is done by git, i.e. there's no need for version tags to appear in file names. For consistency - and also because the "old" docs was at uralex_documentation.md - we should change this.

> @MervideHeer I had to make a couple of small edits to the borrowings bibfile already. So to fix #8 (comment) it may be best, if you could just provide me with the missing references, and I can merge them into the bibfile.

I made a .bib file of the references, which I dropped in my fork, and I have also put it here in a Zip (apparently I can't send bib files in a comment). My Zotero-based approach for creating the bibliography is not working the way I had hoped. Sorry for the mess!

Missing_UraLex2.0_refs.zip

xrotwang commented 3 years ago

@MervideHeer should be fine. I think I can take it from here. À propos Zotero-based approach: Right at this moment I'm looking for a test case for pulling in bibliographies for CLDF datasets from Zotero :) Zotero has a web API which I'd like to use for this purpose. Is your Zotero bibliography a group library? And is it hosted at Zotero? And may I get access to play around with it?
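To make the idea concrete, here is a minimal sketch of pulling a group library as BibTeX through the Zotero web API, using only the Python standard library. It is an illustration under assumptions, not the collabutils implementation: the function names are hypothetical, the group ID is a placeholder, and the paging via the `Total-Results` response header follows my reading of the API documentation.

```python
import urllib.parse
import urllib.request

API_ROOT = "https://api.zotero.org"


def bibtex_url(group_id, start=0, limit=100):
    """Build the items URL for a group library, asking for BibTeX export."""
    query = urllib.parse.urlencode(
        {"format": "bibtex", "start": start, "limit": limit}
    )
    return f"{API_ROOT}/groups/{group_id}/items?{query}"


def fetch_group_bibtex(group_id, api_key=None, page_size=100):
    """Download all items of a group library as one BibTeX string.

    ASSUMPTION: the server reports the item count in a Total-Results
    header, which is used here to decide when to stop paging.
    """
    chunks, start = [], 0
    while True:
        request = urllib.request.Request(bibtex_url(group_id, start, page_size))
        request.add_header("Zotero-API-Version", "3")
        if api_key:  # only needed for private libraries
            request.add_header("Zotero-API-Key", api_key)
        with urllib.request.urlopen(request) as response:
            chunks.append(response.read().decode("utf-8"))
            total = int(response.headers.get("Total-Results", 0))
        start += page_size
        if start >= total:
            break
    return "\n".join(chunks)
```

With access to the group, something like `fetch_group_bibtex(GROUP_ID, api_key=KEY)` (both values hypothetical here) would return text that could be written straight to a `.bib` file.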

xrotwang commented 3 years ago

Oh, and what about Aikio 2012 ?

MervideHeer commented 3 years ago

> Oh, and what about Aikio 2012 ?

It's there now Missing_UraLex2.0_refs.zip ! I got somewhat confused about all the papers titled "Essay" in my Zotero but I'm back on track now.

MervideHeer commented 3 years ago

> @MervideHeer should be fine. I think I can take it from here. À propos Zotero-based approach: Right at this moment I'm looking for a test case for pulling in bibliographies for CLDF datasets from Zotero :) Zotero has a web API which I'd like to use for this purpose. Is your Zotero bibliography a group library? And is it hosted at Zotero? And may I get access to play around with it?

Yes, we have a Bedlan Zotero group library and I have dropped my reference folder there. @evoling is hosting our group and he could add you. We need to ask what our library owners think. Personally, I think that the UraLex references could be an interesting test case because there are many challenges: various languages and special characters everywhere, all kinds of books and papers spanning more than a century, and even website references. :D

My biggest challenges with Zotero are the language and character issues, and the fact that it sometimes pulls wrong metadata, which breaks the reference and requires too many manual fixes. In Finno-Ugric linguistics, all this automatic and digital data handling is still very new, so accessible tools would make life a lot easier!

MervideHeer commented 3 years ago

> @MervideHeer So the reference ETY should be replaced with NES, correct? Alternatively, we might use ETY as bibtex key in the bibfile instead of NES.

I have (hopefully) gotten rid of ETY altogether.

evoling commented 3 years ago

Send me your user name, @xrotwang, and I'll add you to our Zotero group.

xrotwang commented 3 years ago

@evoling https://www.zotero.org/xrotwang

xrotwang commented 3 years ago

@evoling I already started experimenting a bit, and here's the kind of functionality I'd like to implement: https://github.com/dlce-eva/collabutils/blob/main/src/collabutils/zotero.py#L69

It's basically about