HedvigS closed this issue 2 years ago.
AFAICT Glottolog doesn't have anything by Sohn from 2015 in hh.bib. The problem here is that sourcelookup is happy as long as at least one reference can be matched per datapoint. So the unmatched "Sohn (2015:324) Grammaticalization" is simply read as some sort of comment.
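To make the failure mode concrete, here is a minimal sketch of the two behaviours: a lenient check that accepts a datapoint once any reference resolves, and a strict one that warns about every reference it could not resolve. The regex, the helper names and the `(author, year)` lookup set are hypothetical stand-ins. pygrambank's actual tokenizer (the `pygrambank.srctok` module seen in the logs) and its matching against hh.bib/gb.bib are not reproduced here.

```python
import logging
import re

# Hypothetical citation pattern: "Author (YEAR:PAGES)", pages optional.
# Not pygrambank's real tokenizer, just enough to illustrate the logic.
REF = re.compile(
    r"(?P<author>[A-Z][\w'-]*)\s*\((?P<year>\d{4}[a-z]?)(?::(?P<pages>[\d\-, ]+))?\)"
)

log = logging.getLogger("srctok-sketch")


def parse_refs(source):
    """Extract (author, year, pages) tuples; any other text is ignored."""
    return [(m["author"], m["year"], m["pages"]) for m in REF.finditer(source)]


def resolve(source, bibkeys):
    """Split the references of one datapoint into matched and unmatched lists."""
    matched, unmatched = [], []
    for author, year, pages in parse_refs(source):
        target = matched if (author, year) in bibkeys else unmatched
        target.append((author, year, pages))
    return matched, unmatched


def lenient_ok(source, bibkeys):
    """Behaviour as described above: fine if at least one reference resolves."""
    matched, _ = resolve(source, bibkeys)
    return bool(matched)


def strict_check(source, bibkeys):
    """Proposed behaviour: try every reference and warn about each failure."""
    _, unmatched = resolve(source, bibkeys)
    for ref in unmatched:
        log.warning("unmatched ref: %s", ref)
    return unmatched


if __name__ == "__main__":
    # Illustrative keys and a hypothetical datapoint, not real Grambank data.
    keys = {("Sohn", "1994"), ("Sohn", "1999")}
    src = "Sohn (1994:300); Sohn (2015:324) Grammaticalization"
    print(lenient_ok(src, keys))  # the 2015 reference is silently dropped
    strict_check(src, keys)       # logs a warning for ('Sohn', '2015', '324')
```

The point of the strict variant is that "could not resolve" becomes an explicit, per-reference signal instead of being absorbed by one successful match elsewhere in the same cell.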
Yes, I would want it to say that it tried matching Sohn 2015 to hh.bib or gb.bib and that it couldn't.
Right, okay. What can we do to change that? I know that sourcelookup already ignores things like "personal correspondence", but in this case we'd like it to try to resolve the reference (and fail).
So, something like this seems fairly easy to implement:
$ grambank sourcelookup original_sheets/MM_kore1280.tsv ~/projects/glottolog/glottolog
WARNING:pygrambank.srctok:unmatched ref: ('Robbeets', '2017', '611', None)
WARNING:pygrambank.srctok:unmatched ref: ('Sohn', '2015', '324', None)
WARNING:pygrambank.srctok:unmatched ref: ('Sohn', '2015', '324', None)
WARNING:pygrambank.srctok:unmatched ref: ('Sohn', '2015', '324', None)
WARNING:pygrambank.srctok:unmatched ref: ('Sohn', '2015', '324', None)
WARNING:pygrambank.srctok:unmatched ref: ('Sohn', '2015', '325', None)
Resolved sources:
155 g_Sohn_Korean Sohn, Ho-min. 1994. Korean. (Descriptive Grammars Series.) London: Routledge. xvii+584pp.
120 g_Sohn_Korean_1999 Sohn, Ho-Min. 1999. The Korean language. Cambrige: Cambridge University Press. 462pp.
20 g_LeeRamsey_Korean Iksop Lee and Ramsey, Robert S. 2000. The Korean Language. (SUNY Series in Korean Studies.) New York: State University of New York Press. xiii+374pp.
OK
That'd be great!
And maybe also some kind of warning for source strings that contain a space and further text after the YEAR:PAGES part, like the " Grammaticalization" in this case.
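A check for trailing text could be as simple as removing everything that looks like a well-formed citation and flagging whatever is left. This is only a sketch under a hypothetical `Author (YEAR:PAGES)` pattern; real Grambank source strings come in more shapes, and comments that sourcelookup deliberately ignores (such as "personal correspondence") would need an allow-list that is not shown here.

```python
import logging
import re

# Hypothetical shape of a well-formed citation, e.g. "Sohn_Grammaticalization (2015:324)".
CITATION = re.compile(r"[A-Z][\w'-]*\s*\(\d{4}[a-z]?(?::[\d\-, ]+)?\)")
# Characters that may legitimately separate several citations in one cell.
SEPARATORS = re.compile(r"[;,\s]+")

log = logging.getLogger("srctok-sketch")


def trailing_text(source):
    """Return any text left over once all citations and separators are removed."""
    leftover = CITATION.sub(" ", source)
    leftover = SEPARATORS.sub(" ", leftover).strip()
    return leftover or None


def warn_trailing(source):
    """Log a warning if the source string has text outside its citations."""
    extra = trailing_text(source)
    if extra:
        log.warning("text after citation: %r in %r", extra, source)
    return extra


if __name__ == "__main__":
    warn_trailing("Sohn (2015:324) Grammaticalization")  # flags "Grammaticalization"
    warn_trailing("Sohn_Grammaticalization (2015:324)")  # nothing left over, no warning
```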
fix pushed to master
Thank you
It used to be that the pygrambank cldf command ran sourcelookup on every sheet, right? That way the output could be used for our "check warnings" to-do list. Is this still the same with cldfbench?
yes
Okay, thanks. I'll try to re-install and run it so that I can update the to-do list for automatic warnings.
cldfbench is complaining that proto-languages such as ocea1241 don't have a macroarea. Does this mess anything up? Would you like me to submit a PR to glottolog/glottolog adding macroareas for family-level languoids?
Just ignore this warning.
(Sent by email in reply to Hedvig Skirgård's comment of Wed, 12 Jan 2022, 17:53: https://github.com/grambank/pygrambank/issues/51#issuecomment-1011251552)
Ok.
I just noticed a shortcoming of the new sourcelookup implementation. When I run cldfbench I get warnings like the ones above, but unlike the other feedback they don't tell me which sheet the problem occurs in. That makes it hard to evaluate all sheets in one go and update the to-do lists.
WARNING:pygrambank.srctok:unmatched ref: ('Mangulu', '2002', None, None)
WARNING:pygrambank.srctok:unmatched ref: ('Mangulu', '2002', None, None)
WARNING:pygrambank.srctok:unmatched ref: ('Mangulu', '2002', None, None)
WARNING:pygrambank.srctok:unmatched ref: ('Mangulu', '2002', None, None)
WARNING:pygrambank.srctok:unmatched ref: ('Mangulu', '2002', None, None)
I could read in all the sheets line by line and match them against this output, but it would be easier if the warning reported the sheet right away, please.
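One way to get the sheet into each message without touching every logging call is a `logging.LoggerAdapter` that prefixes the sheet name. `SheetAdapter` and `check_sheet` are hypothetical names, the `(author, year)` lookup is a simplification, and this is not necessarily how pygrambank implemented the fix.

```python
import logging


class SheetAdapter(logging.LoggerAdapter):
    """Prefix every message with the name of the sheet being checked."""

    def process(self, msg, kwargs):
        return "[%s] %s" % (self.extra["sheet"], msg), kwargs


def check_sheet(sheet_name, refs, bibkeys, logger=None):
    """Warn about every reference tuple whose (author, year) is not in bibkeys."""
    log = SheetAdapter(logger or logging.getLogger("srctok-sketch"), {"sheet": sheet_name})
    unmatched = [ref for ref in refs if (ref[0], ref[1]) not in bibkeys]
    for ref in unmatched:
        log.warning("unmatched ref: %s", ref)
    return unmatched


if __name__ == "__main__":
    logging.basicConfig(format="%(levelname)s:%(name)s:%(message)s")
    check_sheet("MM_kore1280", [("Sohn", "2015", "324", None)], {("Sohn", "1994")})
    # WARNING:srctok-sketch:[MM_kore1280] unmatched ref: ('Sohn', '2015', '324', None)
```

With the sheet name in every line, the warnings can be grouped per sheet directly instead of being matched back against the sheets afterwards.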
For feature GB126 in sheet MM_kore1280 in the Grambank original sheets, there is a source listed as "Sohn (2015:324) Grammaticalization". I'm guessing this is because there is more than one work by Sohn from that year, and that it should be "Sohn_Grammaticalization (2015:324)". However, pygrambank sourcelookup doesn't seem to evaluate this source. When I run it on this sheet, I get:
Shouldn't it also check "Sohn (2015:324) Grammaticalization" and throw some kind of warning?