rcannood opened 5 months ago
If I set the threshold to 200K, I get:
$ java -jar ~/Downloads/bfg-1.14.0.jar --strip-blobs-bigger-than 200K lfs_test.git
Using repo : /home/rcannood/workspace/openpipelines-bio/lfs_test.git
This repo has been processed by The BFG before! Will prune repo before proceeding - to avoid unnecessary cleaning work on unused objects...
Completed prune of old objects - will now proceed with the main job!
Scanning packfile for large blobs: 794146
Scanning packfile for large blobs completed in 2,581 ms.
Found 2891 blob ids for large blobs - biggest=715168 smallest=216802
Total size (unpacked)=53673113
Found 443 objects to protect
Found 512 commit-pointing refs : HEAD, refs/heads/481-add-leiden-clustering-to-scvi-pipeline, refs/heads/590-clusterleiden-config-contains-incorrect-markdown-references, ...
Found 4 tag-pointing refs : refs/tags/0.3.0, refs/tags/0.3.1, refs/tags/0.4.0, refs/tags/0.4.1
Protected commits
-----------------
These are your protected commits, and so their contents will NOT be altered:
* commit 56ac0431 (protected by 'HEAD') - contains 3 dirty files :
- images/concepts/fig.svg (389.1 KB)
- src/mapping/bd_rhapsody/rhapsody_targeted_1.10.1_nodocker.cwl (211.7 KB)
- src/mapping/bd_rhapsody/rhapsody_wta_1.10.1_nodocker.cwl (212.8 KB)
WARNING: The dirty content above may be removed from other commits, but as
the *protected* commits still use it, it will STILL exist in your repository.
Details of protected dirty content have been recorded here :
/home/rcannood/workspace/openpipelines-bio/lfs_test.git.bfg-report/2023-11-24/14-49-03/protected-dirt/
If you *really* want this content gone, make a manual commit that removes it,
and then run the BFG on a fresh copy of your repo.
Cleaning
--------
Found 4459 commits
Cleaning commits: 100% (4459/4459)
Cleaning commits completed in 2,481 ms.
Updating 514 Refs
-----------------
Ref Before After
------------------------------------------------------------------------------------------------
refs/heads/481-add-leiden-clustering-to-scvi-pipeline | 6d0b9eec | eb966355
refs/heads/590-clusterleiden-config-contains-incorrect-markdown-references | 7abac021 | 56f76331
refs/heads/604-use-the-viash-dependencies-config-value-for-workflows | 8b7b78ba | 75caa7e9
refs/heads/automation | 9cd06207 | b857a87a
refs/heads/concat_dtypes | e92cbea4 | 21942a5e
refs/heads/feature/ataq-demux | 1666af0f | 3792b762
refs/heads/feature/ataq-qc | 98d64cbd | de89e1b7
refs/heads/feature/cellranger_convert | 951b5c99 | e43c8791
refs/heads/feature/count_demultiplexing | 6461edd3 | d5e1bd2f
refs/heads/feature/refactor_velocyto | 068ed30d | 76440a30
refs/heads/feature/scpoli_implementation | 3ee6bc23 | 7a4cbf9c
refs/heads/feature/ts | bfd45792 | ddb86b6d
refs/heads/fix_temp_var | b52db6ef | 9fb67514
refs/heads/increase_ci_memory | 9b6af876 | 299ad45f
refs/heads/integration_build | d1eaab7b | 0373d7e3
...
Updating references: 100% (514/514)
...Ref update completed in 117 ms.
Commit Tree-Dirt History
------------------------
Earliest Latest
| |
.DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
D = dirty commits (file tree fixed)
m = modified commits (commit message or parents changed)
. = clean commits (no changes to file tree)
Before After
-------------------------------------------
First modified commit | a2af1a87 | 89e209f5
Last dirty commit | e1ddf9cd | b942e054
Deleted files
-------------
Filename Git id
---------------------------------------------------------------------------------------------
CS0000007_subsample_LI00080.csv.gz | 449a7f3a (343.0 KB)
features.tsv.gz | 1288f445 (297.6 KB)
fig.svg | 2a72f8e7 (389.1 KB)
main.nf | cfd3ebb5 (213.1 KB), 732e783e (252.9 KB), ...
multi_star | 76d7c752 (337.8 KB), b87f789d (335.4 KB), ...
pbmc_1k_protein_v3_raw_feature_bc_matrix.h5 | 0d3a7789 (274.6 KB)
pbmc_1k_protein_v3_raw_feature_bc_matrix.h5ad | 62aa4349 (698.4 KB)
pipelines-target-p1.png | 1f658205 (292.0 KB), 5dc0174c (292.0 KB)
pipelines-target-p2.png | d9a7235a (300.7 KB), 55690133 (300.7 KB)
pipelines-target-p3.png | ec2cf53b (250.2 KB), ac65760d (245.8 KB), ...
pipelines.svg | 19ee6521 (278.9 KB), 16d12ddb (289.1 KB)
rhapsody_targeted_1.10.1_nodocker.cwl | 56a6310b (211.7 KB)
rhapsody_wta_1.10.1_nodocker.cwl | 5fa9ea85 (212.8 KB)
rhapsody_wta_1.10_nodocker.cwl | c941c763 (212.3 KB)
star_align | 0df72d36 (308.0 KB), 4a1e589e (307.9 KB), ...
star_align_v273a | e9182424 (308.3 KB), 39258580 (308.4 KB), ...
In total, 18874 object ids were changed. Full details are logged here:
/home/rcannood/workspace/openpipelines-bio/lfs_test.git.bfg-report/2023-11-24/14-49-03
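The BFG only rewrites history. The old, large objects stay in the pack files until the reflog is expired and the repository is garbage-collected, which is the follow-up step the BFG's documentation recommends. Below is a minimal sketch of that step. It also prints the resulting pack size, so a figure like "from 200MiB to around 44MiB" can be checked. `compact_and_report_size` is a hypothetical helper, not part of the BFG or git.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: expire reflogs, garbage-collect unreachable objects,
# then print the on-disk pack size in KiB as reported by `git count-objects -v`.
# Usage: compact_and_report_size <repo-dir>
compact_and_report_size() {
  local repo=$1
  # Drop reflog entries that may still point at pre-rewrite commits.
  git -C "$repo" reflog expire --expire=now --all
  # Delete objects that are no longer reachable and repack the rest.
  git -C "$repo" gc --prune=now --aggressive --quiet
  # `size-pack` is the total size of the pack files, in KiB.
  git -C "$repo" count-objects -v | awk '$1 == "size-pack:" { print $2 }'
}
```

For example, run `compact_and_report_size lfs_test.git` after the BFG run above. Because it prunes unreachable objects immediately, it is best used on a throwaway mirror clone.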
Does this edit the repository retroactively? If so, we should make sure to exclude `release`, `main`, `main_build` and all tags. Otherwise we could break older releases/runs. Is there a problem with having a large repo?
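Judging by the log above, yes: the "Updating 514 Refs" table shows branches getting new commit ids, and any ref whose history contains a rewritten commit necessarily gets a new id. The BFG's `--protect-blobs-from` option (default `HEAD`) keeps the files in the latest commit of the listed refs. As far as I can tell, it does not stop older commits on those refs from being rewritten. Below is a minimal sketch for checking which refs that should stay frozen were touched, by snapshotting ref ids before and after a rewrite. `snapshot_refs` and `changed_refs` are hypothetical helpers built on plain `git for-each-ref`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helpers for checking whether a history rewrite (e.g. a BFG run)
# changed refs that should stay frozen, such as release, main, main_build and tags.

# Print "<refname> <object-id>" for every ref in the repository.
snapshot_refs() {
  git -C "$1" for-each-ref --format='%(refname) %(objectname)'
}

# Usage: changed_refs <before-snapshot> <after-snapshot> <ERE>...
# Print refs present in both snapshots whose object id changed and whose
# name matches at least one of the given extended regular expressions.
changed_refs() {
  local before=$1 after=$2
  shift 2
  local pattern
  # Join all patterns into one alternation: "p1|p2|...".
  pattern=$(IFS='|'; printf '%s' "$*")
  awk 'NR == FNR { old[$1] = $2; next }
       ($1 in old) && old[$1] != $2 { print $1 }' "$before" "$after" \
    | { grep -E -- "$pattern" || true; }
}
```

For example, run `snapshot_refs lfs_test.git > before.txt` before the BFG and `snapshot_refs lfs_test.git > after.txt` after it, then run `changed_refs before.txt after.txt '^refs/heads/(release|main|main_build)$' '^refs/tags/'`. Any output means a ref we wanted to keep stable was rewritten.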
If we use the BFG to remove all blobs larger than 1M, we can reduce the openpipeline repo from 200MiB to around 44MiB. We can probably reduce it further by lowering the threshold. @DriesSchaumont WDYT?
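To see what a given threshold would remove before running the BFG, one option is to use plain git plumbing (`rev-list --objects --all` piped into `cat-file --batch-check`) to list every blob reachable from any ref that is larger than a candidate size. Below is a minimal sketch. `list_large_blobs` is a hypothetical helper, and its sizes are in bytes, while the BFG takes suffixed values such as `200K` or `1M`. The listing also includes blobs the BFG would keep because they are in a protected commit, such as `fig.svg` above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print "<size-in-bytes>\t<short-id>\t<path>" for every blob
# reachable from any ref whose uncompressed size is strictly larger than
# <min-bytes>, largest first.
# Usage: list_large_blobs <repo-dir> <min-bytes>
list_large_blobs() {
  local repo=$1 min_bytes=$2
  # rev-list lists each object once; blobs and trees carry the path they were first seen at.
  git -C "$repo" rev-list --objects --all \
    | git -C "$repo" cat-file --batch-check=$'%(objecttype)\t%(objectname)\t%(objectsize)\t%(rest)' \
    | awk -v min="$min_bytes" 'BEGIN { FS = OFS = "\t" }
        $1 == "blob" && $3 + 0 > min + 0 { print $3, substr($2, 1, 8), $4 }' \
    | sort -rn
}
```

For example, `list_large_blobs lfs_test.git 1000000 | wc -l` gives a rough count of candidate blobs for a threshold around 1M. This makes it cheap to compare a few thresholds before committing to a rewrite.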