k2-fsa / snowfall

Moved to https://github.com/k2-fsa/icefall
Apache License 2.0

Update mmi_bigram_{train,decode}.py #244

Closed luomingshuang closed 2 years ago

luomingshuang commented 2 years ago

@danpovey, I have reproduced the result using the new mmi_bigram_{train,decode}.py with the latest snowfall.

43f9acadafab1408d921e1c8c539699

Regarding the bug below: with Kangwei's help, my workaround is to remove a few lines of code in k2/k2/csrc/intersect_dense.cu and rebuild k2 before running the training script. The bug:

batch 2170, epoch 1/10 global average objf: 0.242149 over 28408356.0 frames (100.0% kept), current batch average objf: 0.255091 over 12872 frames (100.0% kept) avg time waiting for batch 0.005s
batch 2180, epoch 1/10 global average objf: 0.242089 over 28538822.0 frames (100.0% kept), current batch average objf: 0.192952 over 13173 frames (100.0% kept) avg time waiting for batch 0.005s
[F] /ceph-ly/open-source/k2/k2/csrc/intersect_dense.cu:851:lambda [](signed int)->void::operator()(signed int)->void block:[0,0,0], thread: [41,0,0] Check failed: tot_score_end == tot_score_start || fabs(tot_score_end - tot_score_start) < 1.0 -658531.687500 vs -658530.625000
/ceph-ly/open-source/k2/k2/csrc/intersect_dense.cu:851: lambda [](signed int)->void::operator()(signed int)->void: block: [0,0,0], thread: [41,0,0] Assertion `Some bad things happened` failed.
[F] /ceph-ly/open-source/k2/k2/csrc/array.h:329:T k2::Array1<T>::operator[](int32_t) const [with T = int; int32_t = int] Check failed: ret == cudaSuccess (710 vs. 0)  Error: device-side assert triggered. 

My solution:

          //K2_CHECK(tot_score_end == tot_score_start ||
          //        fabs(tot_score_end - tot_score_start) < 1.0)
          //   << tot_score_end << " vs "
          //   << tot_score_start;  // TODO: remove this
          score_cutoffs_data[fsa_idx0] = tot_score_min - output_beam;
        });
    return score_cutoffs;
  }

So I have also opened a PR for k2 to remove the above lines.
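To put the two numbers from the failed check in perspective, here is a minimal sketch in plain Python. It assumes the scores are stored as 32-bit floats (an assumption, not confirmed in this thread), and the helper names `absolute_check`, `relative_check` and `float32_spacing` are hypothetical, not part of k2. It only illustrates how a fixed absolute tolerance of 1.0 behaves at this magnitude; it does not claim to explain the root cause.

```python
import math

# Values copied from the "Check failed" line in the log above.
tot_score_end = -658531.6875
tot_score_start = -658530.625


def absolute_check(start, end, tol=1.0):
    """Mirror the condition printed in the log: equal, or |end - start| < tol."""
    return end == start or abs(end - start) < tol


def relative_check(start, end, rel_tol=1e-5):
    """Hypothetical alternative: tolerance scales with the magnitude of the scores."""
    return math.isclose(start, end, rel_tol=rel_tol)


def float32_spacing(x):
    """Gap between adjacent float32 values near x (float32 has a 24-bit significand)."""
    _, exponent = math.frexp(x)
    return math.ldexp(1.0, exponent - 24)


diff = abs(tot_score_end - tot_score_start)
spacing = float32_spacing(tot_score_start)

print("absolute difference:", diff)
print("float32 spacing at this magnitude:", spacing)
print("difference in float32 steps:", diff / spacing)
print("absolute check passes:", absolute_check(tot_score_start, tot_score_end))
print("relative check passes:", relative_check(tot_score_start, tot_score_end))
```

The difference is only slightly above the 1.0 threshold and corresponds to a small number of float32 steps at this magnitude, so a relative tolerance would accept it. Whether that is acceptable for the pruning logic is a separate question.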

luomingshuang commented 2 years ago

When I use clip_grad_norm_ instead of clip_grad_value_, the score-difference bug no longer occurs. So the PR that removes those lines from k2/csrc/intersect_dense.cu may not be necessary. I also get a better result than with clip_grad_value_. I think this PR is ready to merge. The result with clip_grad_norm_: image
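For readers unfamiliar with the two clipping modes, the sketch below shows the difference in plain Python. It follows the documented behaviour of `torch.nn.utils.clip_grad_value_` (clamp each element) and `torch.nn.utils.clip_grad_norm_` (rescale all gradients by one common factor), but it is not the PyTorch implementation. The function names and the gradient values are made up for illustration.

```python
import math


def clip_by_value(grads, clip_value):
    """Clamp every gradient element independently to [-clip_value, clip_value]."""
    return [max(-clip_value, min(clip_value, g)) for g in grads]


def clip_by_norm(grads, max_norm, eps=1e-6):
    """Rescale all gradients by one factor so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads]


def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))


# Hypothetical gradient vector with two large components.
grads = [3.0, -4.0, 0.5]

by_value = clip_by_value(grads, clip_value=1.0)
by_norm = clip_by_norm(grads, max_norm=1.0)

print("clipped by value:", by_value)
print("clipped by norm: ", by_norm)
print("norm after value clipping:", l2_norm(by_value))
print("norm after norm clipping: ", l2_norm(by_norm))
```

Value clipping changes the direction of the update, because large components are flattened while small ones are left alone. Norm clipping keeps the direction and only shortens the step. This may be relevant to the behaviour reported above, but the thread does not establish the cause.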

luomingshuang commented 2 years ago

This PR is ready to merge.