watson-developer-cloud / visual-recognition-coreml

Classify images offline using Watson Visual Recognition and Core ML
Apache License 2.0
490 stars, 77 forks

add workshop to github pages #35

Closed stevemar closed 5 years ago

stevemar commented 6 years ago

This PR is a gigantic port of the wonderful work that @boxcarton @tmarkiewicz @samuelcouch and @yrezgui did in this repo: https://github.com/watson-developer-cloud/watson-vision-coreml-code-pattern

In an effort to consolidate our CoreML assets I am proposing that we merge the content from the aforementioned repo and include it as a github pages branch for this repo. This repo is generally more publicly known, has over 400 stars, is the basis of the code pattern, and will be the basis of the starter kit.

@devinaconley please review. I built this locally by running bundle install and bundle exec jekyll serve; a snapshot is below.

[Screenshot of the locally built site, 2018-09-10 3:25:26 pm]

yrezgui commented 6 years ago

I ran the workshop three weeks ago in a hotel in Denver and we were surprised to see that cloning this repository fetched ~500 MB. It would be good to clean up the repository before merging everything, since the size makes it harder for attendees to download.

stevemar commented 6 years ago

yeah... that's an ouch.

 28M    ./_advanced-original
 28M    ./_advanced-original/assets
 28M    ./_lessons
 28M    ./_lessons/assets
 31M    ./_advanced-console
 31M    ./_advanced-console/assets
 32M    ./_advanced
 32M    ./_advanced/assets
 32M    ./_site/advanced
 32M    ./_site/advanced/assets
 34M    ./_simple
 34M    ./_simple/assets
 34M    ./_simple_console
 34M    ./_simple_console/assets
 34M    ./_site/simple
 34M    ./_site/simple/assets
 44M    ./_arduino
 44M    ./_arduino/assets
 44M    ./_multiple
 44M    ./_multiple/assets
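A listing like the one above can be produced with du. This is a minimal sketch, not necessarily the command used here: the function name largest_dirs and its defaults are made up for illustration, and it reports sizes in KiB (du -k) so that a plain numeric sort works, rather than the human-readable sizes shown above.

```shell
# Hypothetical helper: list the largest directories under a root.
# Usage: largest_dirs [root] [depth] [count]
# Output is sorted ascending, so the biggest entries are at the bottom.
largest_dirs() {
  root=${1:-.}
  depth=${2:-2}
  count=${3:-20}
  # -k prints sizes in KiB; -d limits how deep du reports.
  du -k -d "$depth" "$root" | sort -n | tail -n "$count"
}
```

For example, largest_dirs . 2 20 from the site root shows the 20 largest directories up to two levels deep.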

devinaconley commented 6 years ago

The .git history for this repo is also almost 200 MB...

416K    ./Core ML Vision Custom/Core ML Vision Custom
 24K    ./Core ML Vision Custom/Core ML Vision Custom.xcodeproj
448K    ./Core ML Vision Custom
8.0K    ./QuickstartWorkspace.xcworkspace/xcshareddata
 12K    ./QuickstartWorkspace.xcworkspace
193M    ./.git/objects
4.0K    ./.git/info
 12K    ./.git/logs
 44K    ./.git/hooks
8.0K    ./.git/refs
193M    ./.git
 20K    ./Core ML Vision Simple/Core ML Vision Simple.xcodeproj
 44K    ./Core ML Vision Simple/Core ML Vision Simple
 31M    ./Core ML Vision Simple
3.5M    ./Training Images
228M    .

stevemar commented 6 years ago

I posted a new PR. The repo (without the .git folder) is around 55MB. I'm hoping that .git can be reduced once this merges but I'm honestly not sure.

 28K    ./_includes
 28K    ./_layouts
 40K    ./_simple_console
 44K    ./.git/hooks
 48K    ./_advanced-console
 56K    ./_advanced
 56K    ./_arduino
 56K    ./_multiple
 56K    ./_simple
140K    ./.sass-cache/69434bfffc3ee34ee148afc7c1da46c268665f11
152K    ./.sass-cache
176K    ./assets
176K    ./assets/css
 52M    ./_images
209M    ./.git
209M    ./.git/objects
209M    ./.git/objects/pack
261M    .

What's new?

As far as I can tell there are 4 main workshops: 1. simple, 2. advanced, 3. multiple, 4. arduino. Two more are in progress (5. advanced-console and 6. simple-console, both related to the starter kit), and two are not linked from the homepage (7. lessons and 8. advanced-original).

I suspect advanced-original was, given the name, a backup. lessons was ... I'm not sure. I opted to keep the starter-kit-related folders, suffixed with -console, because I think those have some value.

I also moved all the images into a common top-level directory. Unpopular move? Probably. But each workshop had a separate assets directory, which by and large was a complete copy of the other assets directories with minor changes. About 50 images were copied around 8 times. With each image being around 1 MB, there was about 400 MB of needless duplication.
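One way to check for this kind of duplication is to group files by checksum. The sketch below is an assumption about how one might do it, not what was run for this PR: find_duplicate_files is a hypothetical helper, and it treats two files as identical when their POSIX cksum CRC and byte count both match, which is fine for a quick survey but is not a cryptographic guarantee.

```shell
# Hypothetical helper: print groups of files whose CRC and size match.
# cksum prints "<crc> <size> <path>" for each file.
find_duplicate_files() {
  root=${1:-.}
  find "$root" -type f -exec cksum {} + |
    sort -k1,1n -k2,2n |
    awk '
      {
        key = $1 " " $2
        path = $0
        sub(/^[0-9]+ [0-9]+ /, "", path)   # keep paths that contain spaces
        if (key == prev) {
          if (!shown) print prevpath       # first member of the group
          print path
          shown = 1
        } else {
          if (shown) print ""              # blank line closes a group
          shown = 0
        }
        prev = key
        prevpath = path
      }'
}
```

Running find_duplicate_files . from the site root prints each group of matching paths, with a blank line between groups; files that have no twin are not listed.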

I could re-size the images or make them lower quality, but I'd like to do that in a separate PR.

stevemar commented 6 years ago

Sigh... apparently jekyll doesn't like this and it's not rendering nicely (works well in markdown preview). Please hold off on reviewing.

stevemar commented 6 years ago

OK. This is ready for review again. Apparently static files cannot be placed in directories that begin with _. The images directory is 52 MB.

stevemar commented 6 years ago

Pretty sure we can get the images down to ~30MB without any major quality loss: ...

smartinelli-mac images $ mogrify -path . -resize 1920x1920 -quality 70 *.png
smartinelli-mac images $ mogrify -path . -resize 1920x1920 -quality 70 *.jpg
smartinelli-mac images $ mogrify -path . -resize 1920x1920 -quality 70 *.jpeg
smartinelli-mac images $ du -h | sort -h
 33M    .
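Two things to keep in mind with the commands above: -path . writes the results over the originals, and -resize 1920x1920 also enlarges images that are smaller than that box. A hedged variant, assuming ImageMagick's mogrify is installed, is sketched below. shrink_images is a hypothetical helper name; the '>' suffix on the geometry tells ImageMagick to shrink only images larger than the box, and the output goes to a separate directory so the originals stay untouched. For PNG files, -quality controls zlib compression and filtering rather than lossy quality, so most of the savings there come from the resize.

```shell
# Hypothetical helper: write downsized copies of images to a separate directory.
# Usage: shrink_images <source-dir> <output-dir>
shrink_images() {
  src=$1
  out=$2
  [ -d "$src" ] || { echo "no such directory: $src" >&2; return 1; }
  command -v mogrify >/dev/null 2>&1 || { echo "mogrify not found" >&2; return 127; }
  mkdir -p "$out" || return 1
  # '1920x1920>' only shrinks images larger than the box; smaller ones keep their size.
  find "$src" -maxdepth 1 -type f \
    \( -name '*.png' -o -name '*.jpg' -o -name '*.jpeg' \) \
    -exec mogrify -path "$out" -resize '1920x1920>' -quality 70 {} +
}
```

Comparing du -sh on the source and output directories afterwards shows whether the reduction is worth it before replacing anything.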

bourdakos1 commented 6 years ago

@stevemart this is a pull request into the gh-pages branch, correct? If so, I’m not too worried about the size of it. I’m more concerned with the current size of the git history for the master branch.

bourdakos1 commented 6 years ago

“The git filter-branch command and the BFG Repo-Cleaner rewrite your repository's history, which changes the SHAs for existing commits that you alter and any dependent commits. Changed commit SHAs may affect open pull requests in your repository. We recommend merging or closing all open pull requests before removing files from your repository.”

We should probably close or merge this before resolving issue #37.

I’m okay with merging; I don’t think it will interfere with the cleanse? But we really need to rip the bandaid off and get this thing cleaned.

stevemar commented 6 years ago

@bourdakos1

@stevemart this is a pull request into the gh-pages branch, correct? If so, I’m not too worried about the size of it. I’m more concerned with the current size of the git history for the master branch.

Yes. I'm still keen on keeping the added content to a minimum.

I am also a +1 on your analysis of #37. I ran a script to find the largest files in the commit history, and they were the earlier zip files and the current mlmodels:

smartinelli-mac viz2 $ ./finder.sh 
All sizes are in kB. The pack column is the size of the object, compressed, inside the pack file.
size   pack   SHA                                       location
33722  33730  431b1e6a2da6bdc322fc68e1015129809be814a7  Training Images/vga_male.zip
27383  27389  ac59e7014e3890bba89d4ba344b37d08d3ee85ac  Training Images/hdmi_male.zip
25571  25576  4c9c3a4d361e70cda7f264a14a773db28a4ff7e3  Training Images/usb_male.zip
20679  20682  e525a02fae412d985b85bb4ef45c360210452fc9  Training Images/thunderbolt_male.zip
17947  16737  ec14ed4a8d46b5fcafde97168ef081a3d19425db  Core ML Vision Simple/watson_plants.mlmodel
16744  15610  43f001317ad1277c495f2061eeaa124df4bd577d  Core Ml Vision Simple/MobileNet.mlmodel
13990  13082  e0129fc9d2efc6b10d01c78dbbab8b551b60923e  Core ML Vision Simple/watson_tools.mlmodel
11589  11591  5b6982e1a158c70a9ae88f5e8608f2e4a14ff770  Training Images/usb_male.zip
9698   9699   af0c2e1cc2036d16b6a5cb2ec15213a4340e085f  Training Images/hdmi_male.zip
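The finder.sh script itself is not shown in this thread. As a rough stand-in, the sketch below lists the largest blobs reachable from any ref using only standard git plumbing. largest_blobs is a hypothetical name, sizes are uncompressed bytes (not the kB and pack sizes printed above), and the output columns are SHA, size, path.

```shell
# Hypothetical helper: list the largest blobs anywhere in a repository's history.
# Usage: largest_blobs [repo] [count]
largest_blobs() {
  repo=${1:-.}
  count=${2:-10}
  # rev-list prints "<sha> <path>" for every object reachable from any ref;
  # cat-file adds type and size, and %(rest) carries the path through.
  git -C "$repo" rev-list --objects --all |
    git -C "$repo" cat-file \
      --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob" { sub(/^blob /, ""); print }' |
    sort -k2,2nr |
    head -n "$count"
}
```

Because it walks every ref, files that were deleted from the working tree long ago (such as old zip archives) still show up, which is what matters for the size of .git.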

devinaconley commented 5 years ago

@bourdakos1 @stevemart - okay with merging this into gh-pages?

separate issue tracked here for size of git history: https://github.com/watson-developer-cloud/visual-recognition-coreml/issues/37

stevemar commented 5 years ago

@devinaconley I don't believe we need this PR (or the original repo, https://github.com/watson-developer-cloud/watson-vision-coreml-code-pattern) any longer.

@samuelcouch has published the workshop at https://developer.ibm.com/tutorials/watson-visual-recognition-with-core-ml-single-model/ which I believe suits the needs of the advocacy team. We can close this PR.

@samuelcouch @yrezgui @boxcarton please confirm before I close this PR.

tmarkiewicz commented 5 years ago

@stevemart the workshop that @samuelcouch published is only one of the four that are included in this repo

stevemar commented 5 years ago

@tmarkiewicz hi there, finally looping back to this.

I looked at the four tutorials in https://github.com/watson-developer-cloud/watson-vision-coreml-code-pattern and only a handful of lines differ between them: training a single model vs. multiple, or using your own pictures vs. the ones provided.

I don't think that justifies the need for 3 entirely new assets. I'd rather expand https://developer.ibm.com/tutorials/watson-visual-recognition-with-core-ml-single-model/ with different NOTES or optional steps that a reader can perform or skip over, instead.

I'm going to close this PR for now.