Closed johenglisch closed 2 years ago
If we passed args.writer
in https://github.com/glottobank/grambank-cldf/blob/17cb9cc1b4563a2c9ffd4c867788859780c6a476/cldfbench_grambank.py#L32 and appended rows to args.writer.objects
, we could delegate the Dataset.write
call to cldfbench
.
Hm, that feels odd because I just noticed today that pygrambank
doesn't actually depend on cldfbench
… It's probably not a big deal because cldf.create
is only called from a bench but my gut still doesn't like it.
Come to think of it, we could also do the opposite: Have cldf.create
return the dictionary of CLDF tables and let the cldfbench do the write
call. That would feel mildly less coupled imo.
I think I was just too lazy to move the complete cldf.create
function into the cmd_makecldf
method (probably also because testing for code in a cldfbench isn't straightforward). But ideally, that's what we should do, i.e. the cldfbench would depend on pygrambank
to programmatically access the raw data - and not the other way round.
Hm, I guess I would kind of lean to the opposite there as well:
In a perfect world, I would want all I/O (a.k.a. the stuff that's a PITA to test anyways) to happen directly in the CLDFbench and only pass generic Python data structures to the processing code inside pygrambank.cldf
.
(This would mean splitting up the create
function into smaller bits that can be called from a for sheet in original_sheets/*.csv
loop.)
I guess that's what I meant:
https://github.com/grambank/pygrambank/blob/53ab400a98c2dc2cb6242217cf9d498d85befc39/src/pygrambank/cldf.py#L102-L349
should be part of (not called from) cldfbench_grambank.py
.
Then stuff like the add_provenance
calls would become obsolete, too. So cldf.create
as part of the cldfbench would be somewhat shorter, too.
Regarding testing: The idea here is that the only (or at least main) responsibility of a cldfbench.Dataset is creating the CLDF data. So most of the testing can be delegated to running cldf validate
on the resulting dataset - or adding checks to something like https://github.com/cldf-datasets/imtvault/blob/main/test.py
So, to formulate a plan:
I think we can just go and merge these PRs to achieve the immediate goal of getting the zipped value tables working.
I should go through pygrambank.create
and put everything into its proper place, so we can avoid having to fix issues with cross-package PRs like this in the future.
Yes, sounds good.
Companion PR to https://github.com/glottobank/grambank-cldf/pull/124 in a combined effort to fix https://github.com/glottobank/grambank-cldf/issues/123