kachergis opened 2 years ago
"finir recevoir" is not the only combined item in FrenchFrenchWG_Bergmann_fields.csv: see also item_747 "donner aller", item_757 "aller bien avec", and possibly more. Could this be a delimiter issue?
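One quick way to gauge how widespread this is would be to list every item whose label contains more than one word. The sketch below is a minimal illustration. It assumes the fields file has `item_id` and `definition` columns (an assumption, since the real header may differ), and it uses a few rows from this thread inline instead of reading the actual file. Hits are only candidates. Some genuine items (possibly "aller bien avec") are multi-word phrases, so each one still needs checking against the form.

```python
import csv
import io

# Illustrative excerpt only: rows taken from this thread, and the column
# names "item_id" and "definition" are assumptions about the real file.
SAMPLE = """item_id,definition
item_36,aïe
item_746,finir recevoir
item_747,donner aller
item_757,aller bien avec
"""

def multiword_items(text, column="definition"):
    """Return (item_id, label) pairs whose label contains more than one word.

    These are candidates for merged items, not confirmed errors: some real
    CDI items are legitimately multi-word phrases.
    """
    reader = csv.DictReader(io.StringIO(text))
    return [
        (row["item_id"], row[column])
        for row in reader
        if len(row[column].split()) > 1  # whitespace-separated words
    ]

for item_id, label in multiword_items(SAMPLE):
    print(item_id, label)
```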
After emailing with Cecile Crimon from Sho's lab, Christina Bergmann, and Katie Von Holzen, I think I have identified all the problems. Unfortunately, I can't fix all of them, at least not without Sho's lab going back through Christina's original contribution. Quick summary (more details to follow on the GH issue):
It looks like the French WG may have some duplicated items.
1: aïe (sounds), aïe (body part), and aïe bobo (body part) all appear on the WG (items 36, 147, and 581), whereas only aïe bobo (body part) appears on the WS. The VonHolzen WG data has items 36 and 147; the Bergmann WG data has items 36 and 581, but not 147. So it seems that items 147 and 581 (aïe (body part) and aïe bobo (body part)) should be combined.
2: dent (body part), brosse a dent (household), and dent (household) all appear on the WG (items 152, 229, and 234), whereas only brosse a dent (household) and dent (body part) appear on the WS (items 211 and 263). The VonHolzen WG data has all three 'dent' items (152, 229, and 234), while the Bergmann WG data has only 152 and 229. So either some versions of the form really have three 'dent' items, or the VonHolzen data contains a duplicate within the form. Does anybody have a physical copy? I'm still suspicious of item_234 dent (household). Also, the [French_French_WG].csv file has not only itemID and item columns but also an item_id column, which is unusual.
3: item_746 "finir recevoir" is likely meant to be two items, "finir" and "recevoir", both of which should be on the French WG (in another dataset they appear right next to each other).
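The per-dataset coverage described in points 1 and 2 can be checked with plain set arithmetic. This is a minimal sketch restricted to the 'aïe' and 'dent' item numbers mentioned above (the full item lists are not reproduced here). The variable and function names are hypothetical.

```python
# Item numbers per dataset, limited to the 'aïe' and 'dent' items
# discussed in this issue (not the full WG item lists).
VONHOLZEN_WG = {36, 147, 152, 229, 234}
BERGMANN_WG = {36, 152, 229, 581}

def compare_items(a, b):
    """Split two item-ID sets into items unique to each and items shared."""
    return {
        "only_a": sorted(a - b),  # present in a but missing from b
        "only_b": sorted(b - a),  # present in b but missing from a
        "both": sorted(a & b),    # present in both datasets
    }

print(compare_items(VONHOLZEN_WG, BERGMANN_WG))
```

Items that appear in only one dataset (here 147 and 234 for VonHolzen, 581 for Bergmann) are the ones worth checking against a physical copy of the form.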