moskewcz / boda

Boda: A C++ Framework for Efficient Experiments in Computer Vision

checking my understanding: from cmd-line to code-gen #14

Open forresti opened 7 years ago

forresti commented 7 years ago

I'm tracing through the current code's path from "running the Boda executable on the command line" to "generating OpenCL kernels." Am I on the right track with the following interpretation of the call stack?

#zeroth level
$ boda capture_classify --model-name=... --cnet_predict=... --capture=...

  #1st level
  #src/cap_app.cc
  struct capture_classify //NESI(..., type_id="capture_classify")
    main()

    #2nd level
    cnet_predict->setup_cnet(); #implemented in src/caffeif.cc

      #3rd level
      conv_fwd->init( conv_pipe ); #implemented in src/rtc_fwd.cc

        #4th level
        for( set_string::const_iterator i = cp->bots.begin(); i != cp->bots.end(); ++i ) { 
          gen_ops_rec( *i ); #implemented in src/rtc_fwd.cc
        }

         #5th level
         for( vect_p_conv_op_t::const_iterator j = node->in_place_ops.begin(); j != node->in_place_ops.end(); ++j ) {
           gen_op( *j ); #implemented in src/rtc_fwd.cc
         }

           #6th level
           p_rtc_call_gen_t xp_fn = codegen.gen_func( op_base_t{ "k1conv_xpose_in", 
                                    oi->dims_vals, oi->str_vals } ); #implemented in src/rtc_func_gen.cc

                 #finally the magic happens and templates get filled in:
                 rcg->init( rtc_template, cc.get(), gen_fn );
                 used_names.insert( gen_fn );

Is this the right general flow?


A few specific questions:

  1. If I use tconv instead of k1conv, are there any changes to the compilation flow except for what happens in cnn_op.cc>add_cnn_codegen_annotations()?
  2. When does cnn_custom_codegen (cnn_codegen.cc) get used instead of rtc_func_gen (rtc_func_gen.cc)?
  3. What does rcg stand for?
moskewcz commented 7 years ago

yes, that's the right general flow.

1) no-ish, but some things certainly depend on what gets set in add_cnn_codegen_annotations() (in cnn_op.{cc,H}). variant selection happens in there, between steps (3) and (4) in your list above, inside conv_pipe_fwd_t::init(). when the caffe net is parsed, the operation is just of type 'Convolution'; the codegen annotation function then fills in a bunch of fields that select and configure the particular convolution kernel variant to be used. gen_op() later uses this info (the cts() field in particular) to generate the selected variant's kernels and calls. you can think of add_cnn_codegen_annotations() as the auto-tuning hook, although currently it just does manual/heuristic kernel selection.

2) rtc_func_gen and cnn_custom_codegen are always both used. rtc_func_gen calls a hook function in cnn_custom_codegen, which is currently a dumping ground for all the meta-code associated with all the .cucl kernels. in cnn_custom_codegen there is one sub-function per kernel, dispatched by name, plus various bits of shared meta-code. whenever a .cucl template needs meta-code, it goes in cnn_custom_codegen. in theory there could be multiple such custom_codegen_t derived classes, but currently there aren't; the interface just isolates the meta-code from the rest of the function generation code.

3) maybe rtc call gen? not all the names are particularly sensible, and some are probably wrong/out-of-date.

some bonus notes:

4) as you might expect, the overall flow has changed before and will probably change again. boundaries shift, layers get added/removed, things get renamed, etc.

5) there is a similar set of flows in the "cnn_op_info" (annotates and runs operations, which generated the boda-rtc paper results) and "rtc_prof" (maybe broken, runs already-annotated operations) modes. in some ways they are simpler because they don't deal with caffe net input or with full operation graphs, only with single operations. the key function they both use is profile_rcg_call(). of particular note, these modes handle some convolution variants that are not yet handled by the 'full' flow (like that used by capture_classify), mainly due to issues with data format transforms at the inputs/outputs of the convolution(s).

mwm

forresti commented 7 years ago

Thanks!

1) Cool! I had parsed add_cnn_codegen_annotations() as the variant selector, so I guess I'm not too far off in the weeds.

2) Makes sense. Yeah, I noticed some virtual functions in that area of the code, now I see why.

3) Aha. To clarify: I see rcg_func_call_t used in a few places, and I was wondering if rcg expands to something. RTC Call Gen, it sounds like.