stevemar closed this 5 years ago.
I ran the workshop three weeks ago in a hotel in Denver, and we were surprised to see that cloning this repository fetches ~500 MB. It would be good to clean up the repository before merging everything; at this size it is hard for attendees to download.
yeah... that's an ouch.
28M ./_advanced-original
28M ./_advanced-original/assets
28M ./_lessons
28M ./_lessons/assets
31M ./_advanced-console
31M ./_advanced-console/assets
32M ./_advanced
32M ./_advanced/assets
32M ./_site/advanced
32M ./_site/advanced/assets
34M ./_simple
34M ./_simple/assets
34M ./_simple_console
34M ./_simple_console/assets
34M ./_site/simple
34M ./_site/simple/assets
44M ./_arduino
44M ./_arduino/assets
44M ./_multiple
44M ./_multiple/assets
The .git history for this repo is also almost 200MB...
416K ./Core ML Vision Custom/Core ML Vision Custom
24K ./Core ML Vision Custom/Core ML Vision Custom.xcodeproj
448K ./Core ML Vision Custom
8.0K ./QuickstartWorkspace.xcworkspace/xcshareddata
12K ./QuickstartWorkspace.xcworkspace
193M ./.git/objects
4.0K ./.git/info
12K ./.git/logs
44K ./.git/hooks
8.0K ./.git/refs
193M ./.git
20K ./Core ML Vision Simple/Core ML Vision Simple.xcodeproj
44K ./Core ML Vision Simple/Core ML Vision Simple
31M ./Core ML Vision Simple
3.5M ./Training Images
228M .
I posted a new PR. The repo (without the .git folder) is around 55MB. I'm hoping that .git can be reduced once this merges, but I'm honestly not sure.
28K ./_includes
28K ./_layouts
40K ./_simple_console
44K ./.git/hooks
48K ./_advanced-console
56K ./_advanced
56K ./_arduino
56K ./_multiple
56K ./_simple
140K ./.sass-cache/69434bfffc3ee34ee148afc7c1da46c268665f11
152K ./.sass-cache
176K ./assets
176K ./assets/css
52M ./_images
209M ./.git
209M ./.git/objects
209M ./.git/objects/pack
261M .
As far as I could tell, there are four main workshops (1. simple, 2. advanced, 3. multiple, 4. arduino), two in progress (5. advanced-console and 6. simple-console, both related to the starter kit), and two not linked from the homepage (7. lessons and 8. advanced-original).
Given the name, I suspect advanced-original was a backup. As for lessons... I'm not sure. I opted to keep the starter-kit-related folders, suffixed with -console, because I think those have some value.
I also moved all the images into a common top-level directory. Unpopular move? Probably. But each workshop had a separate assets directory that was, by and large, a complete copy of the other assets directories with minor changes. About 50 images were copied around 8 times; at roughly 1MB per image, that adds up to about 400MB of images, most of it needless duplication.
I could also resize the images or lower their quality, but I'd like to do that in a separate PR.
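One way to check how much of the assets duplication is exact copies is to hash every image and group identical ones. The sketch below is an assumption-laden illustration, not part of this PR: it assumes bash, find, and git are installed, uses git hash-object so each ID is the blob ID git would store for that content, and the function name find_duplicate_images is hypothetical.

```shell
# Hypothetical helper: group PNG/JPEG files under a directory by content
# and print every group with more than one copy.
find_duplicate_images() {
  local root="${1:-.}"
  find "$root" -type f \( -iname '*.png' -o -iname '*.jpg' -o -iname '*.jpeg' \) |
    while IFS= read -r path; do
      # git hash-object prints the blob ID git would use for this file's content.
      printf '%s %s\n' "$(git hash-object -- "$path")" "$path"
    done |
    awk '{
      id = $1
      path = substr($0, length(id) + 2)   # keep paths that contain spaces
      count[id]++
      if (count[id] == 1) paths[id] = path
      else paths[id] = paths[id] "\n  " path
    }
    END {
      # Only report content that appears more than once.
      for (id in count)
        if (count[id] > 1) printf "%s (%d copies)\n  %s\n", id, count[id], paths[id]
    }'
}
```

For example, run find_duplicate_images . from the repository root. Because identical files map to the same blob ID, git stores each distinct image only once in its history; exact copies mostly inflate the checked-out tree, while copies with minor edits are stored as separate blobs.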
Sigh... apparently Jekyll doesn't like this, and it's not rendering nicely (it works well in the Markdown preview). Please hold off on reviewing.
OK, this is ready for review again. Apparently static files can't be placed in directories whose names begin with _ (Jekyll skips them). The images directory is 52MB.
Pretty sure we can get the images down to ~30MB without any major quality loss: ...
smartinelli-mac images $ mogrify -path . -resize 1920x1920 -quality 70 *.png
smartinelli-mac images $ mogrify -path . -resize 1920x1920 -quality 70 *.jpg
smartinelli-mac images $ mogrify -path . -resize 1920x1920 -quality 70 *.jpeg
smartinelli-mac images $ du -h | sort -h
33M .
@stevemart this is a pull request into the gh-pages branch, correct? If so, I’m not too worried about the size of it. I’m more concerned with the current size of the git history for the master branch.
“The git filter-branch command and the BFG Repo-Cleaner rewrite your repository's history, which changes the SHAs for existing commits that you alter and any dependent commits. Changed commit SHAs may affect open pull requests in your repository. We recommend merging or closing all open pull requests before removing files from your repository.”
We should probably close or merge this before resolving issue #37.
I’m okay with merging. I don’t think it will interfere with the cleanup, will it? But we really need to rip the band-aid off and get this thing cleaned up.
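For the history cleanup tracked in #37, a rough sketch using git filter-branch (one of the two tools named in the quote above) might look like the following. It is only an illustration under stated assumptions: the function name purge_path_from_history is hypothetical, and the example pattern Training Images/*.zip is taken from the finder.sh output. As the quote warns, it rewrites commit SHAs, so it should run on a fresh clone after open pull requests are merged or closed, and publishing the result requires a force-push. It also removes matching files from the current commit, so the pattern must not match anything still needed.

```shell
# Hypothetical helper: remove every path matching a git pathspec from all
# branches and tags, then delete the now-unreferenced objects locally.
purge_path_from_history() {
  local target="$1"
  # Since Git 2.24, filter-branch prints a warning and pauses;
  # FILTER_BRANCH_SQUELCH_WARNING=1 skips that pause.
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
    --index-filter "git rm -r --cached --ignore-unmatch --quiet -- \"$target\"" \
    --prune-empty --tag-name-filter cat -- --all
  # filter-branch keeps backups of the old refs under refs/original; drop them.
  git for-each-ref --format='%(refname)' refs/original |
    while IFS= read -r ref; do git update-ref -d "$ref"; done
  # Expire all reflog entries and garbage-collect so the old blobs are deleted.
  git reflog expire --expire=now --all
  git gc --prune=now --quiet
}
```

Usage would be along the lines of purge_path_from_history "Training Images/*.zip" followed by du -sh .git to compare sizes.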
@bourdakos1
Yes. I'm still keen on keeping the added content to a minimum.
I'm also +1 on your analysis of #37. I ran a script to find the largest files in the commit history, and they were the earlier zip files and the current .mlmodel files:
smartinelli-mac viz2 $ ./finder.sh
All sizes are in kB. The pack column is the size of the object, compressed, inside the pack file.
size pack SHA location
33722 33730 431b1e6a2da6bdc322fc68e1015129809be814a7 Training Images/vga_male.zip
27383 27389 ac59e7014e3890bba89d4ba344b37d08d3ee85ac Training Images/hdmi_male.zip
25571 25576 4c9c3a4d361e70cda7f264a14a773db28a4ff7e3 Training Images/usb_male.zip
20679 20682 e525a02fae412d985b85bb4ef45c360210452fc9 Training Images/thunderbolt_male.zip
17947 16737 ec14ed4a8d46b5fcafde97168ef081a3d19425db Core ML Vision Simple/watson_plants.mlmodel
16744 15610 43f001317ad1277c495f2061eeaa124df4bd577d Core Ml Vision Simple/MobileNet.mlmodel
13990 13082 e0129fc9d2efc6b10d01c78dbbab8b551b60923e Core ML Vision Simple/watson_tools.mlmodel
11589 11591 5b6982e1a158c70a9ae88f5e8608f2e4a14ff770 Training Images/usb_male.zip
9698 9699 af0c2e1cc2036d16b6a5cb2ec15213a4340e085f Training Images/hdmi_male.zip
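The finder.sh script itself isn't included here. As a rough, hypothetical equivalent, the sketch below lists the largest blobs reachable from any ref using git rev-list --objects --all and git cat-file --batch-check. The function name largest_blobs and its output layout (size and on-disk size in kB, then SHA and path, loosely mirroring the columns above) are assumptions, not the original script.

```shell
# Hypothetical helper, not the finder.sh used above: print the N largest
# blobs in the history of the current repository.
# Columns: size (kB), on-disk size (kB), blob SHA, path it was committed as.
largest_blobs() {
  local count="${1:-10}"
  # Every object reachable from any ref, with the path it was found under.
  git rev-list --objects --all |
    # %(rest) keeps everything after the SHA, so paths with spaces survive.
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(objectsize:disk) %(rest)' |
    awk '$1 == "blob"' |
    sort -k3,3nr |
    head -n "$count" |
    awk '{
      printf "%d %d %s", $3 / 1024, $4 / 1024, $2
      for (i = 5; i <= NF; i++) printf " %s", $i   # re-join the path
      printf "\n"
    }'
}
```

Run it from the repository root, for example largest_blobs 10. Since rev-list reports each object once, a blob committed under several paths is shown with only one of them.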
@bourdakos1 @stevemart - okay with merging this into gh-pages?
Separate issue tracked here for the size of the git history: https://github.com/watson-developer-cloud/visual-recognition-coreml/issues/37
@devinaconley I don't believe we need this PR (or the original repo, https://github.com/watson-developer-cloud/watson-vision-coreml-code-pattern) any longer.
@samuelcouch has published the workshop at https://developer.ibm.com/tutorials/watson-visual-recognition-with-core-ml-single-model/, which I believe suits the needs of the advocacy team. We can close this PR.
@samuelcouch @yrezgui @boxcarton please confirm before I close this PR.
@stevemart the workshop that @samuelcouch published is only one of the four that are included in this repo.
@tmarkiewicz hi there, finally looping back to this.
I looked at the four tutorials in https://github.com/watson-developer-cloud/watson-vision-coreml-code-pattern, and only a handful of lines differ between them: training a single model vs. multiple models, or using your own pictures vs. the ones provided.
I don't think that justifies three entirely new assets. I'd rather expand https://developer.ibm.com/tutorials/watson-visual-recognition-with-core-ml-single-model/ with NOTES or optional steps that a reader can perform or skip.
I'm going to close this PR for now.
This PR is a gigantic port of the wonderful work that @boxcarton @tmarkiewicz @samuelcouch and @yrezgui did in this repo: https://github.com/watson-developer-cloud/watson-vision-coreml-code-pattern
In an effort to consolidate our Core ML assets, I am proposing that we merge the content from the aforementioned repo and include it as a GitHub Pages branch for this repo. This repo is more widely known, has over 400 stars, is the basis of the code pattern, and will be the basis of the starter kit.
@devinaconley please review. I have built this locally by running bundle install and bundle exec jekyll serve; below is a snapshot.