Closed cpelley closed 3 weeks ago
In order to maintain a backlog of relevant PRs, we automatically label them as stale after 60 days of inactivity.
If this PR is still important to you, then please comment on this PR and the stale label will be removed.
Otherwise this PR will be automatically closed in 30 days time.
bump
Ultimately, it would make sense to move away from file checksum comparisons for IMPROVER acceptance testing (but I wont suggest radically overhauling the testing in the same PR, unless we abs. need to).
I'm not familiar with the testing framework of IMPROVER, @gavinevans do you have any ideas as the the reason behind the checksum failures?
I have rebased this dev. branch against latest IMPROVER master, and pulled in git LFS test data from https://github.com/metoppv/improver_test_data/commit/44351086112b072dde0ccd377bb780280a3abf07 (i.e. latest).
Thanks for this Carwyn. I can see the advantages of this approach, particularly with @gavinevans suggestions allowing us to run the acceptance tests as part of the actions (though we should be mindful of the number of actions minutes this might consume).
My experience of using this is not quite as described. Checking out the branch, or cloning the repository clean in single branch mode (as if this branch were master: git clone -b git_lfs_test_data --single-branch git@github.com:metoppv/improver.git
) both led to all of the acceptance test data files being retrieved immediately. The resources
directory is ~150MB and I could interrogate the data immediately, without performing the suggested git lfs
specific steps. Gavin found his resources
directory to be somewhat smaller than the current acceptance test data repository. Perhaps I'm just special, but I'd appreciate any thoughts on why this is?
My experience of using this is not quite as described.
😱@bayliffe, thanks for doing this experiment. Yeah, so nothing is required to get the data, just seamless without extra steps. I have no idea why I was under the impression an explicit git lfs fetch was needed now 🤷 I'll update the documentation to reflect this.
If desiring to not pull down data on clone:
GIT_LFS_SKIP_SMUDGE=1 git clone ...
Or globally:
git lfs install --skip-smudge
Gavin found his resources directory to be somewhat smaller than...
Thanks @bayliffe, @gavinevans, I'll investigate and check back in with details of my findings 👍
The resources directory is ~150MB... Gavin found his resources directory to be somewhat smaller than...
Looks good to me. I get 76.2MB for data under this branch and for data in https://github.com/metoppv/improver_test_data/commit/44351086112b072dde0ccd377bb780280a3abf07
@gavinevans, @bayliffe, all the acceptance tests pass for me locally. I also included an LFS caching support to mitigate against concerns with increased footprint. I see that the conda environment yaml files are pinned but don't specify exact builds. This will certainly lead to differences in environment compared to our pre-built one locally. I suspect this to be the route cause of the remaining failures.
For now, I'm going to opt for falling back to excluding the acceptance tests in actions since they weren't run before (was a bonus). I'll leave in the github action workflow refactor and LFS caching in-place for when someone comes back to enabling the acceptance tests through actions.
Through our experiments we have utilised all the bandwidth :(
I don't this would be ordinarily a problem so long as we don't clone the repository all the time. Having said that, I don't think we want to make the repository problematic to clones by 3rd parties.
Next week when our quota resets, I'll have a go at trying to change default behaviour so that it doesn't fetch the lfs data on checkout/clone.
Under a .lfsconfig
file of the root:
[lfs]
skip-smudge = true
or if skip-smudge
isn't supported via the lfsconfig, then:
fetchexclude = *
Converted this to draft until some indeterminate date where implications can be better characterised and understood.
In order to maintain a backlog of relevant PRs, we automatically label them as stale after 60 days of inactivity.
If this PR is still important to you, then please comment on this PR and the stale label will be removed.
Otherwise this PR will be automatically closed in 30 days time.
This stale PR has been automatically closed due to a lack of activity.
If you still care about this PR, then please re-open this PR.
Description
This change has been on my radar for a significant time. The test data being made public being a pre-requisite for realising this change, hence only now having this PR. If approved, we can then eventually remove https://github.com/metoppv/improver_test_data (perhaps that repo. is kept for historic purposes i.e. read-only so no new changes made).
Acceptance data comes form https://github.com/metoppv/improver_test_data/commit/44351086112b072dde0ccd377bb780280a3abf07 (to be updated to whatever is the latest commit from improver_test_data) data before merging).
Purpose
This pull request aims to enhance the repository's handling of binary data, specifically in the 'resources' directory under the 'tests' folder. Git LFS (Large File Storage) is introduced to efficiently manage large binary files, providing benefits such as faster cloning, reduced storage requirements, and improved overall repository performance.
Changes Made
Benefits of Using Git LFS
How Git LFS Works
Git LFS replaces large files with text pointers in the repository, while the actual binary content is stored externally. This allows Git to handle binary data more efficiently, without affecting the core functionality of version control. The .gitattributes file is used to specify which files should be managed by Git LFS.
End-user experience/impact
git lfs
handles the distinction for you).git lfs fetch
.git lfs pull
, which fetches both the pointers and the content in one step.Testing
Checklist
Issues