pytorch / tvm

TVM integration into PyTorch

Added a custom dense op to pull weight packing as a separate op #110

Closed kimishpatel closed 4 years ago

kimishpatel commented 4 years ago

Added a custom dense op that pulls weight packing out as a separate op, so that it can be const-propagated. Moved pack_width into the op's attributes, in order to not have topi and relay refer to the same pack_width param.
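For context, here is a minimal sketch of the const-prop idea (not this PR's actual op). If the weight packing is expressed as its own relay ops on a constant weight, relay's FoldConstant pass can evaluate it at compile time, so the runtime graph only sees the pre-packed weight. The reshape/transpose below are only stand-ins for the real packing, and the shapes are made up:

```python
import numpy as np
import tvm
from tvm import relay

# Stand-in graph: dense with its weight "packing" pulled out as separate ops.
data = relay.var("data", shape=(20, 1024), dtype="float32")
weight = relay.const(np.random.rand(1024, 1024).astype("float32"))

# Hypothetical packing: tile the constant weight into a blocked layout.
packed = relay.transpose(relay.reshape(weight, (16, 64, 1024)), (0, 2, 1))
# The custom dense op from this PR would consume `packed` directly; stock
# relay has no such op, so undo the packing and call nn.dense instead.
unpacked = relay.reshape(relay.transpose(packed, (0, 2, 1)), (1024, 1024))
out = relay.nn.dense(data, unpacked)

mod = tvm.IRModule.from_expr(relay.Function([data], out))
mod = relay.transform.FoldConstant()(mod)
# After FoldConstant, every op applied to the constant weight has been
# evaluated at compile time, leaving one pre-computed constant feeding dense.
print(mod)
```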

kimishpatel commented 4 years ago

@yinghai For 1) The motivation behind this is to be able to const-prop and remove the weight packing op, which currently sits inside dense and makes the dense op significantly slower than what PT has. The example you gave for alter_op_layout, https://github.com/ajtulloch/tvm/blob/sparse-ops/topi/python/topi/x86/dense.py#L264-L295, seems to require me to specify how to transpose the weights anyway, so I am not sure how it is different. However, if we can use it in conjunction with the config that autotvm picks, that would be great. I will look into whether that is possible while still const-propping and removing the weight transformation.
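To illustrate the alter_op_layout point above: the hook is roughly the pattern from TVM's pass-infra tutorial (shown here for nn.conv2d, not the dense code linked above, and the callback signature has varied across TVM versions), and the transform still has to be spelled out inside the hook:

```python
from tvm import relay

# Sketch of TVM's documented alter_op_layout registration pattern; treat the
# exact signature and level as approximate rather than the dense-specific API.
@relay.op.register_alter_op_layout("nn.conv2d", level=101)
def alter_conv2d(attrs, inputs, tinfos, out_type):
    data, weight = inputs
    new_attrs = dict(attrs)
    # The layout/weight transform is still specified explicitly here; the
    # AlterOpLayout pass only takes care of applying it across the graph.
    new_attrs["data_layout"] = "NCHW16c"
    return relay.nn.conv2d(data, weight, **new_attrs)
```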

For 2) I have benchmarked this against the PT linear op for a few limited cases that we are looking into. The perf can get close to PT linear, but there are performance cliffs around the shapes of interest: some shapes perform fine and others do not. This is mainly due to the way the dense schedule is written, and I am looking into whether we can make it less brittle.
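For reference, a comparison like this can be reproduced with a simple wall-clock harness along these lines. This is my own sketch, not the script that produced the numbers posted below; it assumes this repo's `torch_tvm.enable()` hook for the TVM path, and whether the linear actually gets lowered to TVM depends on which ops torch_tvm currently supports:

```python
import time
import torch
import torch_tvm  # this repo's PyTorch/TVM bridge (assumed per the README)

def bench(fn, x, iters=100, warmup=10):
    # Simple wall-clock timing of a traced module on a fixed input.
    for _ in range(warmup):
        fn(x)
    start = time.time()
    for _ in range(iters):
        fn(x)
    return (time.time() - start) / iters * 1000.0  # ms per call

x = torch.randn(20, 1, 1024)
linear = torch.nn.Linear(1024, 4096)

# Plain PyTorch JIT path: trace and time before enabling torch_tvm.
jit_fn = torch.jit.trace(linear, x)
jit_ms = bench(jit_fn, x)

# TVM path: enable the fusion pass, re-trace, and let warmup runs trigger it.
torch_tvm.enable()
tvm_fn = torch.jit.trace(linear, x)
tvm_ms = bench(tvm_fn, x)

print({"JIT": jit_ms, "TVM": tvm_ms})
```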

For 3) Yes, generally I agree with you, but at this point I want to be able to make forward progress and not get blocked. I am open to suggestions that keep us unblocked.

kimishpatel commented 4 years ago

Benchmarking result:

| Shapes | JIT | TVM |
| --- | --- | --- |
| `[20, 1, 1024][1024, 1024]1024` | 0.8843128681182861 | 0.9260010719299316 |
| `[20, 1, 1024][2048, 1024]2048` | 1.736067771911621 | 1.7547852993011475 |
| `[20, 1, 1024][4096, 1024]4096` | 3.5495526790618896 | 3.3885812759399414 |
| `[21, 1, 1024][1024, 1024]1024` | 0.9166271686553955 | 0.9342143535614014 |
| `[21, 1, 1024][2048, 1024]2048` | 1.7858920097351074 | 1.810408353805542 |
| `[21, 1, 1024][4096, 1024]4096` | 3.663295030593872 | 3.655677318572998 |
| `[22, 1, 1024][1024, 1024]1024` | 0.9797160625457764 | 0.9690675735473633 |
| `[22, 1, 1024][2048, 1024]2048` | 1.89042329788208 | 1.8352470397949219 |
| `[22, 1, 1024][4096, 1024]4096` | 3.54773211479187 | 3.5949442386627197 |
| `[23, 1, 1024][1024, 1024]1024` | 1.040201187133789 | 1.04544997215271 |
| `[23, 1, 1024][2048, 1024]2048` | 1.9630203247070312 | 1.9881784915924072 |
| `[23, 1, 1024][4096, 1024]4096` | 4.186109781265259 | 4.173249006271362 |
| `[24, 1, 1024][1024, 1024]1024` | 1.089646816253662 | 1.0855863094329834 |
| `[24, 1, 1024][2048, 1024]2048` | 2.1343588829040527 | 2.1301257610321045 |
| `[24, 1, 1024][4096, 1024]4096` | 4.257768392562866 | 4.312472820281982 |
| `[25, 1, 1024][1024, 1024]1024` | 1.224809169769287 | 1.260439395904541 |
| `[25, 1, 1024][2048, 1024]2048` | 2.3870952129364014 | 2.3556478023529053 |
| `[25, 1, 1024][4096, 1024]4096` | 5.038168907165527 | 4.67673659324646 |
| `[26, 1, 1024][1024, 1024]1024` | 1.1878862380981445 | 1.1999726295471191 |
| `[26, 1, 1024][2048, 1024]2048` | 2.308966636657715 | 2.333556890487671 |
| `[26, 1, 1024][4096, 1024]4096` | 4.8458123207092285 | 4.922869443893433 |

kimishpatel commented 4 years ago

This depends on another PR on the TVM side that fixes the issue with partition_const_loop. I will post that soon on facebookexperimental/tvm.