xrotwang closed 3 years ago
@SimonGreenhill I'm in the process of switching the CLDF creation to a cldfbench-based process. Now, this process needs access to two other repositories: glottobank/Grambank and grambank/grambank.wiki. Right now, I have this working by prompting the user for the locations of clones of these repos. But it would seem like this is a perfect use case for git submodules. A .gitmodules file like this
[submodule "Grambank"]
path = raw/Grambank
url = https://github.com/glottobank/Grambank.git
[submodule "grambank.wiki"]
path = raw/grambank.wiki
url = https://github.com/grambank/grambank.wiki
might do the trick.
Should we have a go - or could this be too intimidating?
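For anyone unfamiliar with the format, here is a minimal sketch of what such a .gitmodules file declares. It reads the proposed content with Python's standard configparser, which is only an illustration (git reads this file itself), and the function name list_submodules is hypothetical.

```python
import configparser

# The .gitmodules content proposed above, reproduced verbatim.
GITMODULES = """\
[submodule "Grambank"]
path = raw/Grambank
url = https://github.com/glottobank/Grambank.git
[submodule "grambank.wiki"]
path = raw/grambank.wiki
url = https://github.com/grambank/grambank.wiki
"""


def list_submodules(text):
    """Return {submodule name: (path, url)} for a .gitmodules document.

    Hypothetical helper for illustration; not part of git or cldfbench.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    submodules = {}
    for section in parser.sections():
        # Section headers look like: submodule "Grambank"
        name = section.split('"')[1]
        submodules[name] = (parser[section]["path"], parser[section]["url"])
    return submodules


if __name__ == "__main__":
    for name, (path, url) in list_submodules(GITMODULES).items():
        print(f"{name}: {url} -> {path}")
```

Each entry only records where a repository lives and where it should be checked out. The exact commit is pinned separately by the parent repository.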
So far, I really like the submodules setup. It also seems to make sense semantically: The CLDF repos is our tool to make release versions of the Grambank data - and do this by fetching data from HEAD of glottobank/Grambank and grambank/grambank.wiki.
The only downside I could see is that people may want to clone the CLDF repos - but might not have access to glottobank/Grambank. But then, the CLDF repos is only there to put together releases for Zenodo - so it wouldn't have any interesting history or other items that would only be available through a clone - rather than an export or archived release on Zenodo.
@HedvigS You might want to look at this issue as well. I don't know how often you might want to / have to run the CLDF creation in the future, but if you do, you'll be impacted by this.
Reading up a bit more on git submodules, it seems as if most of the disadvantages people typically mention don't apply to us. In our case, the repos we pull in via submodules would be read-only clones. The only potential confusion I can see emerges from having two clones of glottobank/Grambank on disk. But it may also turn out to be beneficial to have one "working" copy of this repos - often checked out to a branch for a particular PR - and one implicit copy, as a submodule of the CLDF repos.
Hmm, I think this makes sense. A couple of questions:
1. Is it possible to work without the submodule if the user doesn't have access to one of the submodules? (Is that a concern? Anyone who is making the CLDF should have access to the wiki etc. as well, right?)
2. What if I wanted to change something in the wiki and see it come through in the CLDF? I wouldn't be able to make local changes to my wiki clone and run this, but would have to get the change accepted into grambank-wiki first? (I don't know if this would ever happen, but the 'read only' nature of submodules would mean it can't, right?)
Ad 1.: I'm not sure what "work" would mean here. The CLDF repository has one single job AFAICT: Shoving grambank releases to Zenodo. With the cldfbench setup, it will acquire a second job: Creating the CLDF dataset. The first job is not something anyone but us is going to do. The second one requires access to the submodules, i.e. doesn't make sense without such access.
Ad 2.: Oh, submodules are not read-only by nature. But being able to treat them as such makes some typical problems go away. So you can totally make changes in the wiki locally - and then either discard (via checkout) or commit and push if you have the permissions.
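The two options for local changes in a submodule can be sketched as plain git command lines. This is a hedged illustration, not part of pygrambank: the helper names and the branch argument are hypothetical, and the functions only build the commands without running them. One caveat: a submodule is usually checked out in detached-HEAD state, so a branch normally has to be checked out before a commit can be pushed.

```python
import shlex


def discard_local_changes(submodule_path):
    """Command that throws away uncommitted edits to tracked files."""
    return ["git", "-C", submodule_path, "checkout", "--", "."]


def commit_and_push(submodule_path, branch, message):
    """Commands that record local edits and publish them.

    Assumes push permissions on the submodule's remote, and that
    `branch` (a hypothetical argument) exists locally or on the remote.
    """
    return [
        # Leave the detached HEAD that submodules normally start in.
        ["git", "-C", submodule_path, "checkout", branch],
        ["git", "-C", submodule_path, "commit", "-a", "-m", message],
        ["git", "-C", submodule_path, "push", "origin", branch],
    ]


if __name__ == "__main__":
    wiki = "raw/grambank.wiki"  # path from the proposed .gitmodules
    print(shlex.join(discard_local_changes(wiki)))
    for cmd in commit_and_push(wiki, "master", "Fix typo in a feature description"):
        print(shlex.join(cmd))
```

Either way, the CLDF creation reads whatever is currently in the submodule's working tree, so local wiki edits would be picked up before they are accepted upstream.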
ok, sounds good :)
@HedvigS are you ok with this? If so, I'd push the changes to pygrambank and to this repos, and CLDF creation will then be done with a new (but simpler) command.
@HedvigS thoughts?
I'm sorry I didn't see this last week. I trust you to make good calls here.
If I'm honest, I don't really understand what interest anyone but us would have in using most/all commands of pygrambank. I don't fully understand what users we are expecting.
All I need is to be able to screen PRs for the behind the scenes repos (glottobank/Grambank) and make certain updates related to the ms and first release (wherever those things are kept). Right now, we also need to be able to push changes to the website (because of wiki updates) and this smaller change involved in this PR to parameters.csv without affecting the Values themselves. Ideally in future releases, that shouldn't be de-coupled at all but all done at once and all changes wait until the next version.
I understand how the proper way is to archive things with Zenodo, but for convenience I would appreciate if we also kept the cldf GitHub repos. I don't understand really what you're talking about with two glottobank/Grambank.
Should we just schedule a meeting to talk it through? I'm getting quite confused, and I don't fully get what purpose all of this structure is meant to serve.
Only the cldf command is affected by this change, and it is replaced by something equivalent.
Also, going back to the old behaviour should be simple, because you have installed pygrambank from a clone, so you could just check out the commit before my changes, ok?
Yes, I understood that only cldf was affected. I just didn't fully follow the basis of that, other than that sub-modules apparently are neat. I'm sure it's a great reason, I just don't follow it. I also still don't understand who the imagined users are, so that makes it tricky to follow along with changes.
I don't want to make use of the old behaviour if that's no longer the way things are. I'd rather know how to use the new behaviour, or for now kindly ask that someone else runs the necessary commands so that the changes in glottobank/Grambank #1193 and #36 are implemented in grambank/grambank-cldf parameters.csv.
For example, I don't understand "people may want to clone the CLDF repos - but might not have access to glottobank/Grambank". Why would they need access to glottobank/Grambank?
Why would anyone besides essentially us three want to cldf-render from glottobank/Grambank? Isn't glottobank/Grambank strictly going to be non-public, a "behind the scenes" repos?
I'm not saying this isn't right, I'm just saying I don't understand it so I can't really evaluate what's going on.
I don't want to use the old way of doing things if that is now superseded. Please either help me do what I need to do now, or point me to instructions for how to go about doing it the new way. If that isn't possible, I guess I will go with plan B and check out an earlier version of the reposes and do it the old way.
The essential part here is not the use of submodules, but the use of cldfbench to trigger the CLDF creation. This is useful, because the cldfbench makecldf command includes metadata in a standardized way in the CLDF dataset, thus making the data more similar to other datasets.
Since these advantages only apply to the "creating CLDF for a proper release" use case, it would seem totally justified for you to stick with the old behaviour.
Oh, and using submodules seems just the most "natural" way to provide the "raw" data where cldfbench expects it - see https://github.com/cldf/cldfbench/#workflow .
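As a rough sketch of what "where cldfbench expects it" means in practice: with the paths from the .gitmodules proposal earlier in this thread, both repositories would sit under raw/ in the dataset directory. The check below uses only the standard library and is not part of cldfbench or pygrambank. The function name missing_raw_data is hypothetical.

```python
from pathlib import Path

# Paths taken from the .gitmodules proposal earlier in this thread.
EXPECTED_RAW = ("raw/Grambank", "raw/grambank.wiki")


def missing_raw_data(dataset_dir, expected=EXPECTED_RAW):
    """Return the expected raw directories that are absent or empty.

    A submodule that has not been initialized shows up as an empty
    directory, so an empty directory is reported as missing as well.
    """
    root = Path(dataset_dir)
    missing = []
    for rel in expected:
        path = root / rel
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_raw_data("."):
        print(f"{rel} is missing or empty; try: git submodule update --init")
```

A check like this could replace prompting the user for clone locations, since the locations are then fixed by the repository layout.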
Okay, right.
So, for now the best thing for the ms analysis and the wiki updates to the clld website is to use an older version of the repos?
Or you let me do it.
Btw.: Getting wiki updates into the clld website can only be done by me, anyway.
Moving the CLDF to grambank/grambank should be done by turning it into a cldfbench-enabled repository. Thus, CLDF creation should be implemented as a makecldf method, using functionality from pygrambank.