trilbymedia / grav-plugin-git-sync

Collaboratively Synchronize your Grav `user` folder hosted on GitHub, BitBucket or GitLab
Apache License 2.0
243 stars 58 forks source link

GitSync 2.0.0 Adding Untracked Files #117

Closed ScottHamper closed 3 years ago

ScottHamper commented 5 years ago

Hey All,

I upgraded to GitSync 2.0.0, and after saving a page in the admin interface, a bunch of files that shouldn't be tracked/added to the repo suddenly got tracked/added.

My config is set up only to sync the pages folder, but after saving a page, the following changes were committed to the Git repo:

I haven't had a chance to really dive in and discover the cause of these issues yet - I'll hopefully have time tomorrow - but I wanted to make you aware of the issue in case it's a larger issue that might impact other projects.

Has the config changed in 2.0.0? Right now, I have:

folders:
  - pages

for configuring which directories should be synced.

You may be able to reproduce this issue with the same example repo I made for pull request #102 , as it reflects the structure of my real project, but I haven't tried it yet.

w00fz commented 5 years ago

@ScottHamper the config shouldn't have changed but if you had custom sparse-checkout and .git/config and .gitignore then that could have been a problem.

I'm not entirely sure why plugins and config would have been pushed since you only have pages. Can you check your .gitignore ?

ScottHamper commented 5 years ago

Hey @w00fz,

I'm not using sparse-checkout (I had to Google it to even learn about what it is!) and I don't have anything custom going on with .git/config. I'm also not sure what I should be looking for in .gitignore.

However, I can say that I was able to reproduce the issue with the example repo I put together for #102.

Steps to reproduce:

  1. Fork the example repo
  2. Run sh deploy from the root directory to build the website
  3. Hook up a web server to serve the build directory
  4. Log into Grav as webmaster (password: Password1)
  5. Set up GitSync. Note that you may get an error when saving your config through the web interface that it couldn't fetch properly. Ignore this for now - we're just interested in generating the git-sync.yaml config file. We're going to wipe out this build in a minute and start fresh, but with the addition of the git-sync config.
  6. Move /build/user/config/plugin/git-sync.yaml to /config/plugin/git-sync.yaml
  7. Re-run sh deploy to build a fresh copy of the site with a clean repo state
  8. Edit a page in Grav and click "Save"

BE WARNED!!! Despite the fact that config/plugin/git-sync.yaml is included in .gitignore, once you save a page in Grav and GitSync does its thing, it will nuke the .gitignore file and add/commit git-sync.yaml (along with thousands of other files) to the repo. This means that the password/access token you set for GitSync to use will be committed. Either use a temporary token that you can delete right afterward, or use a private repo (or both).

Note that this issue is new to 2.0.0 for me - I reverted to 1.0.4 and cannot reproduce the problem. I still plan on diving into the code when I get more time, but hopefully this update helps in some way.

paulmassen commented 5 years ago

Same problem for me, it looks like the .gitignore files is ignored

ScottHamper commented 5 years ago

I think I have found the problematic commit:

// With the introduction of customizable paths,
// it appears that the add command should always
// add everything that is not committed to ensure
// there are no orphan changes left behind

This commit commented out the code that only added/tracked the folders set in the git-sync.yaml config. Is there some other way we should be working around this?

EDIT: It looks like perhaps GitSync is now relying on sparse checkout to limit what files are commited. The initializeRepository method was updated to always enableSparseCheckout, which then creates /.git/info/sparse-checkout with only the paths set in the git-sync.yaml config file.

The issue is that initializeRepository is only ever called onAdminAfterSave. This means it's possible to have a repo that never had sparse checkout enabled/configured the way the plugin expects. For example, when the site is built via a script that copies over a complete git-sync.yaml config file instead of a human manually configuring the plugin through the admin interface (and this is exactly what's happening in my use case).

w00fz commented 5 years ago

What you are describing has been a long lasting hidden issue that like you noticed is now forced by always initializing the repository. Especially now where other users folders can be synchronized and not just pages

The new version 2.0.1 (just released) introduces a bin/plugin git-sync init command. I think this will fix those cases where sparse-checkout wasn't implemented as expected or in the case were you do a manual setup like you described. If you could test this out it would be very helpful.

I think it would be helpful to add in the documentation your case where a Grav instance gets initialized / restored through an automated or manual script. bin/plugin git-sync init could become the final step to properly set things up.

ciddennis commented 5 years ago

I have a question about the new git-sync init. Currently we deploy to AWS and have our Grav install in one repo that has no 'pages' or 'themes' directories under the users directory. However we have all our plugins and config there.

When we deploy to an environment we have a build script that takes the Grav install and updates it based on environment. (Example script below) . As you can see we do a lot of work to bring in the other content (Pages && Themes) from the repo we have set up for the GitSync plugin. I have even added some extra .gitignore options to try and stop Grav from syncing the other directories back to our content repo. The reason for the overly complex script was to try and stop the extra syncing and an issue where git-sync sync expected the user/pages and user/themes directory there before it would sync.

So Given all this should I only need to do this now with the git-sync init comman:

sed -i -e "s/master/$GIT_BRANCH/g" user/config/plugins/git-sync.yaml
chmod 777 bin/*
bin/plugin git-sync init

Original Script

sed -i -e "s/master/$GIT_BRANCH/g" user/config/plugins/git-sync.yaml
cd user;git init;git config user.email "aUser@gmail.com";git config user.name "auser";cd ..
cd user;git config core.sparsecheckout true;cd ..
cd user;echo pages/ >> .git/info/sparse-checkout;cd ..
cd user;echo themes/ >> .git/info/sparse-checkout;cd ..
cd user;git remote add -f origin https://aUser:$GIT_TOKEN@github.com/OurOrg/our-content.git;cd ..
cd user;git pull origin $GIT_BRANCH;cd ..
cd user;git checkout $GIT_BRANCH;cd ..
chmod 777 bin/*
echo "user/accounts" > .gitignore
echo "user/config" > .gitignore
echo "user/data" > .gitignore
echo "user/plugins" > .gitignore
bin/plugin git-sync sync
w00fz commented 5 years ago

Hey @ciddennis,

That is exactly the idea, with the init command you should be able to remove all of those extra steps you have. The only thing you are missing in your now shortened version is the final bin/plugin git-sync sync, you will still need to manually run it, I do not believe init will run it for you (nor I think it should).

What you have there is a great example for testing that init is actually behaving as expected. Would you mind giving it a try and let us know ?

Pretty amazing how little you script became compared to before :)

Cheers!

ScottHamper commented 5 years ago

Alright,

So I've had some time today to try out the new init command, but it doesn't seem to work well for my use case. I started off by trying the following:

composer build
(cd build && bin/plugin git-sync init)
(cd build && bin/plugin git-sync sync)

The build composer script creates a build folder where it downloads Grav/plugins and then copies relevant folders from the repo into build/user. You can see a simplified example in the repo I linked to previously.

The issue with the above script is that the pages folder already exists under user before the sync CLI command is run. As a result, Git thinks that every single page in the site is an untracked change, and explodes when trying to sync with the remote.

Before 2.0.0, my script looked like:

composer build
cp -r .git build/user
git -C build/user checkout master

So I tried going back to that, but tacking on a call to the new init CLI command at the end:

composer build
cp -r .git build/user
git -C build/user checkout master
(cd build && bin/plugin git-sync init)

However, despite init configuring sparse checkout and creating a .gitignore file that attempts to ignore everything except the files in pages, this results in Git wanting to commit deletions for files that weren't copied by the composer build script (including the composer.json file itself). As soon as a sync is run, a bunch of files in the root of my repo get deleted erroneously.

I'm out of ideas of what else I can try.

Is it possible to simply revert the change in 1ace4ae7648 that I linked to earlier and only git add relevant folders again? I don't understand the comment, or why the change was necessary.

ciddennis commented 5 years ago

So finally getting back (Forgot). so from the script I had above I change to have:

buildspec.yml for AWS has this in the build phase.

bin/plugin git-sync init
bin/plugin git-sync sync || true
cd user;git checkout $GIT_BRANCH;cd ..

In my set up I only sync themes and pages directories. I found I still have to do the last line as it would sync the master branch and leave it checked out even if in my git-sync.yaml I had told it to sync "stage". Minor issue.

Thanks

w00fz commented 3 years ago

@ScottHamper I found an issue with Flex Objects about the gitignore not being properly updated, I am not sure if this will actually solve your original issue. However, I have now added a customizable Git Ignore field so you can properly add entries to ignore your composer items you dont want to be deleted.

See #197