scolladon / sfdx-git-delta

Generate the sfdx content in source format from two git commits
Other
414 stars 112 forks source link

Too many diff in the output #892

Closed Bartheleway closed 1 month ago

Bartheleway commented 1 month ago

What is the problem?

When using the plugin to generate diff between two branches, it gives a diff bigger than expected.

What is the parameter and the value you used with it?

Given that "master" is my default branch representing the prod env and that I am currently on my feature branch. My feature branch is 1 commit behind master and have 1 commit that is not currently on master.

sf sgd:source:delta --to "origin/master" --from HEAD

What is the expected result?

Diff should only contains the commit which is not on master.

What is the actual result?

Diff contains both commit

Steps to reproduce

Execution context

Operating System: Windows

yarn version: 4.2.1

node version: 21.7.1

git version: 2.39.0

sfdx version: 2.45.6

sgd plugin version: 5.40.2

More information (optional)

Looking at the source code, I see that plugin gets the diffs from walking through commits.

I think it should simply use the git diff command instead of recreating it. Doing git diff origin/master...HEAD --name-only produce the expected list of modified files.

scolladon commented 1 month ago

Hi @Bartheleway !

Thanks for raising this issue and thanks for contributing in making this project better!

I think what you want to do is to compare based with the common ancestor (sort of) This is possible by following this instruction in the doc!

Let me know if it helps :)

Bartheleway commented 1 month ago

Hi, thanks for your answer, I will look into it and let you know. In the mean time, could you tell me what motivated you not using the actual git diff ? From my point of view, it will simplify the codebase and easily produce the accurate result (in my case). What use case did you try to solve using this technic? (maybe a use case I didn't uncounter yet)

scolladon commented 1 month ago

Sure, we had a version where we did the actual diff. But this implementation has multiple drawbacks :

Then why not use git diff <ref>...<ref> ? Because it will always compare with the common ancestor. We prefer offer flexibility, you can chose to compare with the branch state or with the common ancestor

Bartheleway commented 1 month ago

Okay, I tried your suggestion and indeed seems to produce the correct result.

Well I didn't know spawn was slow and in use when you do a git diff. I suppose it depends on the library used, "simple-git" most certainly use it as it requires git to be installed and is a wrapper around git commands but "isomorphic-git" seems to not use it. If I use $(git merge-base develop main) I also depend on git locally.

Additionally, I would like to known when do we need to make a comparison using branch state instead of common ancestor? For me using the common ancestor should be the default behavior.

Could it be that we add an option to CLI like --use-branch-state when we want to use it?

scolladon commented 1 month ago

I think it should not be the default behaviour because it would force users to have a particular process / way to use the plugin. We designed the plugin to be agnostic and to be the most flexible possible so it is possible to imagine any kind of workflow.

I would rather let the users have the control to provide what is the value of the parameters they want to set, never minding the way it is defined, without assuming the plugin knows best to do it.

Bartheleway commented 1 month ago

When I use GitHub PR, GitLab MR or git diff I just give two ref/branch and it makes the diff using the common ancestor. I don't understand why this tool would behave differently ... I think it is confusing that a "diff" tool is behaving differently than others by default. Having to use git merge-base force me to have a particular process to take your words 😉

I wonder how many user out here already used git merge-base in their life ? Personally it has been 15 years and I never needed it.

From a side note, I see using git merge-base a dependency over local git installation (same as simple-git still in use in the code base). As asked, what is the use case of using branch state? (I'm curious and don't want to miss a chance to learn something new).

scolladon commented 1 month ago

git merge-base is indeed a dependency over a local git installation. It is just an example in the case where you need to compare with the common ancestor. This is not the only way to get this information (it is done differently in CI, it could be done differently with tags, etc)

simple-git is dependent of the local git installation, and spawn (which is dependent of the terminal). This library is used in some particular cases where isomorphic-git does not provide an API yet. Those usage are at the edge of the features, do not uses git API that changes a lot (either in behaviour or in parameters) and does not impact the performance of the plugin.

When doing a pull request (in any provider), it compares the top of the current branch with the top of the target branch (even if it has changed in between). This is why some of the git provider propose to require the "branch to be up to date" or enforce having a "linear history". IMHO using the common ancestor is not the nominal use case 🙃

scolladon commented 1 month ago

Feel free to reopen this issue if you think there is more to discuss. I'll be happy to talk here.

Bartheleway commented 1 month ago

Could explain how it should be done in CI then? (I am currently working on this topic)

From what we discuss internally with my team, doing the git diff ref...ref has always been our way to go. We had discussion about "branch to be up to date" and come to the conclusion that except for "linear history" (which we don't need), it has no beneficial thus we don't require it.

scolladon commented 1 month ago

What I meant by it is done differently in CI is that in some CI there can be environment variable containing the base branch at its current state. So you may use those environment variable.

Also some CIs merge the PR branch "locally" (in the container) before applying all the checks configured. This, or just ensuring you are up to date with the base branch, is key to check the integration of the content of the PR with the rest of the repository.

I don't see how the integration with other development is done when comparing with common ancestor. Per exemple take an apex class MyService.cls with this content on the main branch at the commit A ref :

public class MyService {
    public class doAwesomeThings() {
        // magic here
    }
}

Now we create a branch feat/add-another-service and change the content of the class at the commit B :

public class MyService {
    public class doAwesomeThings() {
        // magic here
    }

    public class doOtherThings() {
        // unicorn taming here
    }
}

If in the mean time someone as merged into main a C commit with the following change into the MyService class :

public class MyService {
    public class doAwesomeThings() {
        // magic here
    }

    public class doThings() {
        // normal stuff here
    }
}

Those changes will not be included in the PR checks if the comparison is done against the common ancestor A --- C --- B C will not be integrated, even if B is meant to be merged after him

Bartheleway commented 1 month ago

I agree with you, content of files will only be correct if branch is on top of compared branch (or if diffs are located on other files). I was about to miss that point, thanks for reminding me that.

From what I saw, SGD is therefore only in charge of generating the manisfest and doesn't care about file content. My plan is to generate the manifest with SGD, then merge the two branches (usually feature with master or feature with quality) and then execute the sf deploy command. This only needs that no merge conflict exist between two branches (easily achievable without being up to date).

And yes of course I am using CI environment variable to know with ref I need to use :)

scolladon commented 1 month ago

SGD also generate partial metadata for any metadata type allowing it to be deployed. It is done using the --generate-delta parameter

It also list all the listable metadata contained inside files in the package.xml (per exemple Labels, SharingRules, Workflows, etc)