grambank / grambank-analysed


git submodules pull right version #70

Closed HedvigS closed 1 year ago

HedvigS commented 2 years ago

I changed .gitignore recently so that some contents of output get pushed here. This was brought on by my having to pull information from PDF tables in someone else's work. I don't think we should push all of output, but a few select tables is a good idea.

This made us notice in a recent PR (#69) that we had diffs in a table we didn't expect. I strongly suspect this comes from different checked-out versions of the git submodules in my clone and Sam's.

I'm sorry @johenglisch to be having this discussion again. I just want to make sure we're doing it right so that it's all good.

We are only really concerned with pulling the right version of glottolog-cldf (v4.4) and grambank-cldf (v1). For wals and autotyp, what matters is that they are reproducible, so each should be one particular version, but that could be the most recent one right now, for example.

Currently, the readme for this repo tells people to run this after cloning:

git submodule update --init

but is that going to clone submodules of the same version as remote or the most recent version of those submodules elsewhere?

(I wanted to merge in #69 so I'm opening an issue here instead for a discussion we had there.)

johenglisch commented 2 years ago

When you run git submodule update --init, it will go into the submodules and check out the commit the main repo wants (regardless of whether there are newer ones available). If you have the grambank submodule set to tag v1.0, everybody downstream will also check out tag v1.0.
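A minimal sketch of that behaviour, assuming throwaway local repositories rather than the real ones: `data` stands in for grambank-cldf and `analysis` for grambank-analysed (both names are made up for the demo). The `protocol.file.allow=always` setting is only there because recent git versions refuse to clone submodules from local paths by default.

```shell
#!/bin/sh
# Demo only: 'data' and 'analysis' are hypothetical stand-in repositories.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org
tmp=$(mktemp -d)
cd "$tmp"

# Upstream data repo with one commit, tagged v1.0.
git init -q data
echo 1 > data/values.csv
git -C data add values.csv
git -C data commit -q -m "release"
git -C data tag v1.0
pinned=$(git -C data rev-parse HEAD)

# The analysis repo records the submodule at that commit.
git init -q analysis
git -C analysis -c protocol.file.allow=always submodule --quiet add "$tmp/data" grambank
git -C analysis commit -q -m "add grambank submodule at v1.0"

# Upstream moves on after the pin was recorded.
echo 2 > data/values.csv
git -C data commit -q -am "later work"

# A collaborator clones and follows the readme.
git clone -q analysis clone
git -C clone -c protocol.file.allow=always submodule --quiet update --init

# The commit recorded in the main repo and the one actually checked out.
recorded=$(git -C clone rev-parse HEAD:grambank)
checked_out=$(git -C clone/grambank rev-parse HEAD)
echo "recorded:    $recorded"
echo "checked out: $checked_out"
```

Both values should be the commit tagged v1.0, even though `data` has a newer commit by the time the clone happens.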

If you want to know if any of the submodules in your local working copy got out of sync, you can either run git status – in which case folders with an out-of-sync submodule will be marked as modified:

$ git status
On branch main
Your branch is up-to-date with 'origin/main'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   grambank (new commits)

no changes added to commit (use "git add" and/or "git commit -a")

Or you can run git submodule status and check if there's a + or - sign in front of any submodules:

$ git submodule status
 448115168dee078aa2b0c54cc7064f4cb8c06018 autotyp-data (v1.0.1)
 df1b79f38939f95c7288861a6743fd7a841d5779 glottolog-cldf (v4.5)
+ba44b4608a1176c0f23d3d478fdbfbc00bcb3c3f grambank (v1.0-42-gba44b46)
 bc8a5f961013162ee1fb628d37c5ba0a8decdd28 wals (v2020.1)

And if there's a mismatch, you can just run git submodule update to get everything back in order.
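The check could be scripted roughly like this. It is a sketch on throwaway local repositories (`data` and `analysis` are made-up stand-ins), and `out_of_sync` is a hypothetical helper name, not a git command.

```shell
#!/bin/sh
# Demo only: 'data', 'analysis' and out_of_sync are hypothetical names.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org
tmp=$(mktemp -d)
cd "$tmp"

git init -q data
echo 1 > data/values.csv
git -C data add values.csv
git -C data commit -q -m "v1"

git init -q analysis
git -C analysis -c protocol.file.allow=always submodule --quiet add "$tmp/data" grambank
git -C analysis commit -q -m "pin grambank"

# Print the status lines that start with '+' (a different commit is checked
# out than the one recorded) or '-' (submodule not initialised).
out_of_sync() {
    git -C "$1" submodule status | grep '^[+-]' || true
}

before=$(out_of_sync analysis)

# Simulate drift: an extra commit inside the submodule's working copy.
git -C analysis/grambank commit -q --allow-empty -m "local drift"
drifted=$(out_of_sync analysis)
echo "$drifted"

# 'git submodule update' goes back to the recorded commit.
git -C analysis submodule --quiet update
after=$(out_of_sync analysis)
```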

When it comes to the documentation, there might be two things we could add to the readme to make things clearer:

  1. Maybe remind people to occasionally re-run git submodule update after pulling (when the state of the submodule changed on the remote).

  2. Maybe tell people about git submodule update --recursive in repos where this matters, i.e. ones with nested submodules (I don't remember off the top of my head if this is one of them).
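What such a readme note is guarding against can be sketched as follows, again on throwaway local repositories (`data`, `analysis` and `clone` are made-up names, and `g` is a hypothetical wrapper that only exists so the demo can clone submodules from local paths).

```shell
#!/bin/sh
# Demo only: all repository names and the g() wrapper are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org
tmp=$(mktemp -d)
cd "$tmp"
g() { git -c protocol.file.allow=always "$@"; }

git init -q data
echo 1 > data/values.csv
git -C data add values.csv
git -C data commit -q -m "first release"

git init -q analysis
g -C analysis submodule --quiet add "$tmp/data" grambank
git -C analysis commit -q -m "pin grambank"

# Collaborator's initial setup.
git clone -q analysis clone
g -C clone submodule --quiet update --init --recursive

# Maintainer publishes new data and moves the pin in the main repo.
echo 2 > data/values.csv
git -C data commit -q -am "second release"
new=$(git -C data rev-parse HEAD)
g -C analysis/grambank fetch -q origin
git -C analysis/grambank checkout -q "$new"
git -C analysis add grambank
git -C analysis commit -q -m "bump grambank"

# Collaborator pulls: the main repo moves, the submodule checkout does not.
g -C clone pull -q
stale=$(git -C clone submodule status | cut -c1)

# Re-running the update brings the checkout back in line.
g -C clone submodule --quiet update --init --recursive
fresh=$(git -C clone submodule status | cut -c1)
echo "after pull: '$stale'  after update: '$fresh'"
```

The first character of the status line should be `+` right after the pull and a blank once the update has run.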

HedvigS commented 1 year ago

Thanks @johenglisch, I've got a follow-up question. I don't get a +, which I take to mean that the submodule in my clone is up to date with its remote. Is that right?


skirgard@lingn06 grambank-analysed % git submodule status  
 448115168dee078aa2b0c54cc7064f4cb8c06018 autotyp-data (v1.0.1)
 df1b79f38939f95c7288861a6743fd7a841d5779 glottolog-cldf (v4.5)
 b9633e1e3c92ffde0e53567c5c97e82d6be969af grambank (v1.0)
 bc8a5f961013162ee1fb628d37c5ba0a8decdd28 wals (v2020.1)

Now, what I need is for the grambank-cldf submodule in this repo to link to a more recent version of grambank-cldf. Currently it's linked to b9633e1, which is a bit too old.

johenglisch commented 1 year ago

Uuuuhm, iirc it went like this

  1. cd into the folder of the submodule
  2. git fetch the latest changes
  3. switch to the commit you want to be at
  4. cd back to project folder
  5. git add and git commit the submodule folder
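Spelled out as commands, the five steps might look like this. It is a sketch on throwaway local repositories (`data` and `analysis` are made-up stand-ins, and the tags are created by the demo itself), not a transcript from the real repos.

```shell
#!/bin/sh
# Demo only: 'data', 'analysis' and the g() wrapper are hypothetical.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org
tmp=$(mktemp -d)
cd "$tmp"
g() { git -c protocol.file.allow=always "$@"; }

# Setup: submodule pinned at v1.0, upstream then publishes v1.0.1.
git init -q data
echo 1 > data/values.csv
git -C data add values.csv
git -C data commit -q -m "release"
git -C data tag v1.0
git init -q analysis
g -C analysis submodule --quiet add "$tmp/data" grambank
git -C analysis commit -q -m "pin grambank at v1.0"
echo 2 > data/values.csv
git -C data commit -q -am "fix"
git -C data tag v1.0.1

# 1. cd into the folder of the submodule
cd analysis/grambank
# 2. fetch the latest changes (this does not touch the checkout)
g fetch -q --tags origin
# 3. switch to the commit you want to be at
git checkout -q v1.0.1
# 4. cd back to the project folder
cd ..
# 5. add and commit the submodule folder
git add grambank
git commit -q -m "Update grambank submodule to v1.0.1"

git submodule status
```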

If I don't remember it correctly (which is entirely possible), here's the relevant section of the git user manual:

https://git-scm.com/docs/user-manual.html#submodules

The answer is probably in there.

HedvigS commented 1 year ago

Okay! When I just got to step 2 this is what I'm seeing:


skirgard@lingn06w grambank % git fetch
remote: Enumerating objects: 26, done.
remote: Counting objects: 100% (26/26), done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 26 (delta 16), reused 18 (delta 12), pack-reused 0
Unpacking objects: 100% (26/26), done.
From https://github.com/glottobank/grambank-cldf
   ba44b46..5284cd9  master     -> origin/master
 * [new tag]         v1.0-rc8   -> v1.0-rc8
 * [new tag]         v1.0.1     -> v1.0.1
Fetching submodule raw/Grambank
From https://github.com/glottobank/Grambank
 * [new branch]        Jaylatarche-patch-1    -> origin/Jaylatarche-patch-1
 * [new branch]        Jaylatarche-patch-2    -> origin/Jaylatarche-patch-2
 * [new branch]        Jaylatarche-patch-3    -> origin/Jaylatarche-patch-3
 * [new branch]        Jaylatarche-patch-4    -> origin/Jaylatarche-patch-4
 * [new branch]        Jaylatarche-patch-6    -> origin/Jaylatarche-patch-6
 * [new branch]        Jaylatarche-patch-7    -> origin/Jaylatarche-patch-7
 * [new branch]        Jaylatarche-patch-8    -> origin/Jaylatarche-patch-8
 * [new branch]        add-chan-glottcodes-from-cldf -> origin/add-chan-glottcodes-from-cldf
 * [new branch]        add-hueblerstability   -> origin/add-hueblerstability
 * [new branch]        added-r-code-for-#2145 -> origin/added-r-code-for-#2145
 * [new branch]        guaz1234               -> origin/guaz1234
 * [new branch]        hojucha-gb291          -> origin/hojucha-gb291
 * [new branch]        jillsam-patch-2        -> origin/jillsam-patch-2
 * [new branch]        johnaell-patch-1       -> origin/johnaell-patch-1
   e482b85e..8b28f1cc  master                 -> origin/master
 * [new branch]        nataliia_wish_list     -> origin/nataliia_wish_list
 * [new branch]        nuuu1241---nngg1234    -> origin/nuuu1241---nngg1234
 * [new branch]        passive-english        -> origin/passive-english
 * [new branch]        revert-2406-hojucha-patch-2 -> origin/revert-2406-hojucha-patch-2
 * [new branch]        v1.0-maintenance       -> origin/v1.0-maintenance
 * [new tag]           v1.0.1                 -> v1.0.1

Which is a bit overwhelming because it's for the submodules downstream as well. Am I right in assuming that the commits at the top are the most recent? Also, these don't seem to be the most recent commits, right? The most recent commit on grambank-cldf's origin should be this one: https://github.com/glottobank/grambank-cldf/commit/5284cd940faec187d8adf270d3ed80fbbd0ce0f1

Sorry, I'm a bit confused still, trying to work through it.

xrotwang commented 1 year ago

First, git fetch doesn't change your checkout at all (that's what git pull would do). So the stuff you see here is just commits, branches and tags that exist in the remote repos and had not yet been fetched into your local clones.

The bit ba44b46..5284cd9 master says that for grambank-cldf it needed to fetch commits from ba44b46 to 5284cd9 on the master branch. I.e. 5284cd9 is the latest on master, just as you expected. So git checkout master and git pull will check out master at 5284cd9.
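A small sketch of the fetch/pull difference, assuming two throwaway local repositories (`upstream` and `work` are made-up names). It stops after the fetch to look at what is pending, then pulls.

```shell
#!/bin/sh
# Demo only: 'upstream' and 'work' are hypothetical repositories.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org
tmp=$(mktemp -d)
cd "$tmp"

git init -q upstream
echo 1 > upstream/values.csv
git -C upstream add values.csv
git -C upstream commit -q -m "first"
git clone -q upstream work

# Two new commits appear upstream after the clone.
echo 2 > upstream/values.csv
git -C upstream commit -q -am "second"
echo 3 > upstream/values.csv
git -C upstream commit -q -am "third"

cd work
before=$(git rev-parse HEAD)

# Fetch downloads the new commits but leaves the checkout where it was.
git fetch -q
after_fetch=$(git rev-parse HEAD)

# Inspect what was fetched but is not yet part of the current branch.
git log --oneline 'HEAD..@{u}'
pending=$(git rev-list --count 'HEAD..@{u}')

# Pull moves the branch (and the working tree) to the fetched tip.
git pull -q --ff-only
after_pull=$(git rev-parse HEAD)
```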

HedvigS commented 1 year ago

First, git fetch doesn't change your checkout at all (that's what git pull would do). So the stuff you see here is just commits, branches and tags that exist in the remote repos and had not yet been fetched into your local clones.

I know that. I was intentionally stopping at this state to evaluate what to do next.

The bit ba44b46..5284cd9 master says that for grambank-cldf it needed to fetch commits from ba44b46 to 5284cd9 on the master branch. I.e. 5284cd9 is the latest on master, just as you expected. So git checkout master and git pull will check out master at 5284cd9.

Thanks.

HedvigS commented 1 year ago

Sorry, now I see the git IDs more clearly. All good, it's clearer now than it was when viewing the git commit history via the web browser.

HedvigS commented 1 year ago

Things... still aren't really as I expect. My main specific problem is that the submodule in grambank-analysed currently refers to a state of the grambank-cldf repo where there is still a dir called R_grambank, and this has caused confusion among collaborators because they mistake the scripts there for the correct ones in grambank-analysed.

What I see now is that the dir R_grambank is somehow still there... but empty? I've deleted it now, going to see what I can do next.

HedvigS commented 1 year ago

I deleted the empty dir and git doesn't seem to have noticed the change.

skirgard@lingn06w grambank % git status
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    modified:   raw/Grambank (new commits)

no changes added to commit (use "git add" and/or "git commit -a")

I'm interpreting that to mean that git doesn't care at all about empty dirs.

HedvigS commented 1 year ago

I think I did what I wanted? https://github.com/grambank/grambank-analysed/commit/b7f67f9b960920ea316827ef5ee8c53c0eff5c4d

HedvigS commented 1 year ago

Github desktop is now telling me that I've got a change I can commit re the submodule for the grambank folder which is changing it to: Subproject commit 5284cd940faec187d8adf270d3ed80fbbd0ce0f1-dirty

HedvigS commented 1 year ago

Seems I've encountered this?

xrotwang commented 1 year ago

I'm interpreting that to mean that git doesn't care at all about empty dirs.

Yes, that's the case.
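A quick way to see this in a scratch repository (the `.gitkeep` file name is just a common placeholder convention, not something git itself knows about):

```shell
#!/bin/sh
# Demo only: runs in a throwaway repository.
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo

# An empty directory is invisible to git.
mkdir R_grambank
empty_status=$(git status --porcelain)
echo "with empty dir:   '$empty_status'"

# As soon as it contains a file, the directory shows up as untracked.
touch R_grambank/.gitkeep
file_status=$(git status --porcelain)
echo "with placeholder: '$file_status'"
```

The reverse holds as well: creating or deleting an empty directory never shows up in git status.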

xrotwang commented 1 year ago

I think I did what I wanted? b7f67f9

Yes, looks like it.

xrotwang commented 1 year ago

Github desktop is now telling me that I've got a change I can commit re the submodule for the grambank folder which is changing it to: Subproject commit 5284cd940faec187d8adf270d3ed80fbbd0ce0f1-dirty

You could just navigate into the submodule and run git diff to see what the uncommitted changes are.
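For illustration, here is one way a -dirty suffix can come about and be inspected, sketched on throwaway local repositories (`data` and `analysis` are made-up stand-ins). It assumes the cause is a modified tracked file inside the submodule, which may not be what happened in this case.

```shell
#!/bin/sh
# Demo only: 'data' and 'analysis' are hypothetical repositories.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org
tmp=$(mktemp -d)
cd "$tmp"

git init -q data
echo 1 > data/values.csv
git -C data add values.csv
git -C data commit -q -m "release"

git init -q analysis
git -C analysis -c protocol.file.allow=always submodule --quiet add "$tmp/data" grambank
git -C analysis commit -q -m "pin grambank"

# Touch a tracked file inside the submodule without committing it.
echo changed >> analysis/grambank/values.csv

# From the main repo, the submodule is reported with a -dirty suffix.
dirty_diff=$(git -C analysis diff --ignore-submodules=none)
echo "$dirty_diff"

# Inside the submodule, the usual commands show what actually changed.
git -C analysis/grambank status --short
git -C analysis/grambank diff

# Discarding the change inside the submodule clears the suffix again.
git -C analysis/grambank checkout -q -- .
clean_diff=$(git -C analysis diff --ignore-submodules=none)
```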

HedvigS commented 1 year ago

Thanks.

Yeah so the CLI version is:


skirgard@lingn06w grambank-analysed % ls
README.md       autotyp-data        grambank        wals
R_grambank      glottolog-cldf      grambank-analysed.zip
skirgard@lingn06w grambank-analysed % cd grambank
skirgard@lingn06w grambank % git diff
diff --git a/raw/Grambank b/raw/Grambank
index d2ff009..b32afb9 160000
--- a/raw/Grambank
+++ b/raw/Grambank
@@ -1 +1 @@
-Subproject commit d2ff009f761895d1304ab1b60999e12fe1e2a92c
+Subproject commit b32afb9393e1415fda564c757d29d7965b0b3f99

xrotwang commented 1 year ago

So the -dirty flag is gone?

HedvigS commented 1 year ago

In the CLI there's no dirty flag, but it's still there in the GUI. The GUI should just be a point-and-click version of CLI git, but now there seems to be a discrepancy... Can I just unstage that change in the GUI?

xrotwang commented 1 year ago

So the glottobank/Grambank submodule has changed, but you don't want these changes? If so, I'd say you can just

cd raw/Grambank
git checkout .

and be done?

More generally, though, I think that nested submodules are a bit difficult to maintain. Since analysis code is typically written for particular releases of data, it might be simpler to not use submodules here, but instead explicitly fetch released versions of the data from Zenodo or a GitHub release.
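As a rough sketch of the tag-based variant of that idea: cloning a single tagged release instead of tracking a submodule. The demo uses a throwaway local repository (`data`, with a `v1.0` tag it creates itself, and the target folder name `grambank-v1.0` is made up); against a real remote the clone URL would be the repository's address instead of a local path.

```shell
#!/bin/sh
# Demo only: 'data' is a hypothetical stand-in for a released data repo.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org
tmp=$(mktemp -d)
cd "$tmp"

git init -q data
echo 1 > data/values.csv
git -C data add values.csv
git -C data commit -q -m "release"
git -C data tag v1.0
release=$(git -C data rev-parse HEAD)

# Later development that the analysis should not pick up.
echo 2 > data/values.csv
git -C data commit -q -am "unreleased work"

# Fetch exactly the tagged release; HEAD ends up detached at v1.0.
git clone -q --branch v1.0 "$tmp/data" grambank-v1.0
git -C grambank-v1.0 describe --tags
```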

HedvigS commented 1 year ago

I wanted the grambank submodule to change, to switch to a more recent commit. That's what I was doing with the stuff earlier in this thread. I thought all was well and good with this commit b7f67f9 but now I'm a bit confused about this dirty tag that the GUI is showing me.

I thought that git submodules were a good fit for this kind of use case, especially since we don't have a Zenodo release that collaborators can download from.

HedvigS commented 1 year ago

What I did now is I unstaged the dirty tag change, which is "discard" in the GUI. The CLI git then comes to this:

skirgard@lingn06w grambank % git status
HEAD detached at 5284cd9
nothing to commit, working tree clean

xrotwang commented 1 year ago

Ok, looks like the state you wanted, right?

HedvigS commented 1 year ago

Yes... it is. And.. I don't know what happened and I think I'm going to stop trying to find out :D!

HedvigS commented 1 year ago

Sometimes git is like god, works in mysterious ways ;)