LinguList closed this issue 8 years ago.
The history as-is is as it was represented on Wikipedia ages ago, complete with now-deleted references... I no longer have the access needed to get hold of locked-in papers.
I have checked IDS/WOLD against Buck; I own a hardcopy of the book. Many eyes make all bugs shallow, though: I now see that I must merge b4.48 and i5.57, "egg". For just the concepts, you could start by downloading the full CSV of "maximum buck" from CALS and checking that, which I would think is much less painful than the OCRed version, especially as I have already added links to the concept sets in the dev version. Do you have a user account on CALS, for the badge? =)
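Spotting merge candidates like the "egg" pair can be partly automated by grouping concept IDs under a normalized gloss. A minimal sketch, assuming the list is available as (ID, gloss) pairs; the function name is hypothetical and the sample rows are just the IDs quoted in this thread, not a CALS export:

```python
from collections import defaultdict


def find_duplicate_glosses(entries):
    """Group concept IDs by normalized gloss; return glosses found under more than one ID."""
    by_gloss = defaultdict(list)
    for concept_id, gloss in entries:
        # Normalization here is deliberately crude: trim and lowercase only.
        by_gloss[gloss.strip().lower()].append(concept_id)
    return {gloss: ids for gloss, ids in by_gloss.items() if len(ids) > 1}


# Sample rows taken from this thread (hypothetical input format).
sample = [("b4.48", "egg"), ("i5.57", "Egg "), ("b9.98", "try,test")]
print(find_duplicate_glosses(sample))  # {'egg': ['b4.48', 'i5.57']}
```

This only surfaces candidates; whether two IDs really denote the same concept still needs a human check against the sources.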
Would you happen to know:
1) Are the existing concept set IDs stable? I can't add them if they are subject to change.
2) Why is "tree" in IDS twice? 1.42 == 8.60. IDS moved many other things from Buck, so why not that? WOLD only has 8.60.
3) What happened to the original source of WOLD? The copy I have has lots of entries numbered (x)x.999x(x), containing among other things "capybara". They are gone from the current WOLD overview, but you can still link to them directly: http://wold.clld.org/meaning/3-9991
4) Where does the 207-word version of the Swadesh list on Wikipedia and Wiktionary come from?
But is your Buck version literal? I mean, you write:
b9.98 try,test
but Buck 1949 page 652 says:
try (= Make Trial of, Test)
This is why I started OCRing. I actually never OCRed the main part, only the index, which, as I have now seen, also differs from the main part (the index says "able, be", but the number 9.95 refers to "can, may").
So the whole point of the Concepticon is to keep the sources we link in their original form: if we say we link to Buck 1949, we have the literal gloss as it appears in the book.
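To make the "literal gloss" requirement concrete, one option is to store each source entry exactly as printed, with its page, and compare any derived gloss against it. A minimal sketch; the class and field names are hypothetical and not the Concepticon's actual data format, and the sample is the Buck 1949 "try" entry quoted above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceEntry:
    """One concept as printed in a published list (hypothetical field names)."""
    source: str
    number: str
    literal_gloss: str
    page: int


def is_literal(gloss: str, entry: SourceEntry) -> bool:
    """True only if the gloss reproduces the printed form exactly, ignoring outer whitespace."""
    return gloss.strip() == entry.literal_gloss.strip()


buck_try = SourceEntry("Buck 1949", "9.98", "try (= Make Trial of, Test)", 652)
print(is_literal("try,test", buck_try))                     # False: a paraphrase
print(is_literal("try (= Make Trial of, Test)", buck_try))  # True
```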
I was considering linking to CALS (we currently try to link every list we can get), especially because of your resource on Buck (1949), which interests us since we only have IDS and WOLD there. But when I saw that I couldn't tell which parts are actually literal, I shelved this and went on with the OCR of the index.
We discussed CALS here.
Regarding your questions:
Maybe it's best to establish a CALS concept list independent of the predecessors. Once you link this list to the Concepticon, you will have automatic access to all resources, like Swadesh 1952, Swadesh 1955, Wiktionary, IDS, WOLD, and many more interesting concept lists for specific language families. We would gladly put it into our collection and then link back from the Concepticon to CALS.
Forgot to add this:
Here's what we note in the Concepticon resource regarding the Wiktionary list. Note, though, that it was posted online before I came across the Comrie list, so the note on the list on the right of the page may be refined in the future.
IIRC, what I did was merge WOLD and IDS first, then hand-check with Buck.
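The "merge first, then hand-check" workflow can be sketched as a union of two ID-to-gloss mappings plus a filter that flags entries for manual review. This assumes both lists share one numbering scheme, which is an assumption; the function names are hypothetical, the "tree" rows come from the question above, and `buck_ids` is a made-up placeholder subset, not Buck's actual numbering:

```python
def merge_lists(ids, wold):
    """Union two {concept_id: gloss} mappings, recording which source(s) contain each ID."""
    merged = {}
    for source, concepts in (("IDS", ids), ("WOLD", wold)):
        for concept_id, gloss in concepts.items():
            entry = merged.setdefault(concept_id, {"glosses": {}, "sources": []})
            entry["glosses"][source] = gloss
            entry["sources"].append(source)
    return merged


def needs_hand_check(merged, buck_ids):
    """Return IDs whose glosses disagree between sources or that are absent from Buck."""
    flagged = []
    for concept_id, entry in sorted(merged.items()):  # sorted for stable output
        disagree = len(set(entry["glosses"].values())) > 1
        if disagree or concept_id not in buck_ids:
            flagged.append(concept_id)
    return flagged


ids = {"1.42": "tree", "8.60": "tree"}
wold = {"8.60": "tree"}
buck_ids = {"8.60", "9.95", "9.98"}  # hypothetical placeholder subset

merged = merge_lists(ids, wold)
print(needs_hand_check(merged, buck_ids))  # ['1.42']
```

Flagging rather than auto-resolving keeps the final decision with whoever has the book at hand.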
Are y'all aware of the ULD2? http://www.uld3.org/uld2/uld2.html I haven't deduplicated it or merged it in yet; AFAIK its purpose is to provide a useful set of words for conversation in the world as it exists today. Another list from the conlanging world is dublex, https://web.archive.org/web/20051122060219/http://www.langmaker.com/db/rsc_dublexcompounds.htm, which aims for a balance between maximum compositionality and minimum length of the resulting compounds.
Thanks for those links, didn't know of them before!
What I'm interested in for CALS is more the concepts themselves, and which lists have which concepts, than the literal representation of the concepts in the lists themselves. Time to refactor, I guess.
Well, I understand your point, but this interest in the concepts themselves is the reason why we now have this whole mess: people claim that they USE a certain concept list, but in fact, due to misspellings or concept labels not faithfully recorded, they confuse what is actually meant. There are dozens if not hundreds of examples of this mess, including wrong translations across languages, sloppiness, misunderstandings, etc. This is actually why we launched the Concepticon project: as a first attempt to clean up this mess. And we came to the conclusion that reflecting the sources as accurately as possible is the only way to acquire a solid basis. You have my contact details now, so whenever you plan on mapping things or want to use our data and have trouble finding the right information, don't hesitate to contact us.
The current form used for the concept lists does not correctly reflect how they were created in the history of linguistics. We're currently building a resource in which different concept lists are linked to a meta-list of concepts, with the application at http://concepticon.clld.org and the data (which by far exceed the current application) at https://github.com/clld/concepticon-data. We still haven't managed to check the whole list of Buck 1949, but an OCRed version with all concepts has been prepared from the book and will be added once we find time to check it. It's not online yet, but I'll gladly push it before I have managed to link it properly, if needed.
Anyway: you may find the Concepticon resource useful for wordlist management.