lexibank / pylexibank

The python curation library for lexibank
Apache License 2.0
17 stars 7 forks source link

Show warning when unreferenced languages or parameters are dropped #276

Closed johenglisch closed 3 months ago

johenglisch commented 3 months ago

@mbuit82 and I just had a debugging session where we were confused about why the parameter table was completely empty. Turns out there was a bug with the foreign keys in the forms table, so the entire parameter table got silently deleted.

Since we were both surprised by this behaviour I thought it might be helpful to add a warning message to tell the user that this is happening (and more importantly, why).

(Side note: Personally I think the makecldf command shouldn't remove stuff in the background at all but I'm guessing there's probably data sets out there that rely on this behaviour and I wouldn't want to break their stuff.)

While I was at it, I updated the CI to test against currently supported Python versions.

xrotwang commented 3 months ago

If makecldf wouldn't remove stuff (by starting out with an empty cldf directory), vestigial remains of older runs might accumulate. This could happen when files get renamed for example. The whole cldfbench workflow really makes most sense with version-control. My typical workflow would be running makecldf, then git status and git diff.

xrotwang commented 3 months ago

Ah, reading your changes, I realize that by "remove stuff in the background" you refer to objects that have been added explicitly by the users in the actual makecldf command. And I have to admit that I agree - having been bitten by this myself a couple of times :)

I'd be willing to sacrifice backwards compat here. The worst that could happen would be datasets which have languages or parameters without associated data.

johenglisch commented 3 months ago

I realize that by "remove stuff in the background" you refer to objects that have been added explicitly by the users

Oh, yeah, sry about the phrasing. That the cldf gets cleared out every time makes sense of course – in fact, it's so normal to me I completely forgot about it when I wrote the PR.

I'd be willing to sacrifice backwards compat here.

Sounds good, I changed the patch accordingly.

The worst that could happen would be datasets which have languages or parameters without associated data.

Yeah, true. Also, being able to look at these unassociated languages/parameters in LibreOffice is definitely a nicer way to diagnose bugs than stepping through the code in the debugger and inspecting raw python dictionaries (which is what we ended up doing).

xrotwang commented 3 months ago

@johenglisch this might be the right time to also start a CHANGELOG.md for pylexibank. Would you want to add this with this PR?

johenglisch commented 3 months ago

Done – I updated the release instructions as well (although I'm not entirely happy with the wording).