thu-spmi / CAT

A CRF-based ASR Toolkit
Apache License 2.0

Dataloader perf & modularize `ctc_crf` #48

Closed maxwellzh closed 3 years ago

maxwellzh commented 3 years ago

Two main changes:

  1. Keep the file pointer open so that training no longer opens and closes files repeatedly. This is critical for dataloader performance.
  2. Modularize the `ctc_crf` module so that it now works like a regular Python module. In the future we could also rewrite the CRF init/release as a context manager, so that the code would look like
    ...
    with CRF('den_lm.fst', gpu):
        manager.run(...)
    ...

All changes have been tested on WSJ.
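
For reference, the idea behind the first change can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `CachedFileReader` and its methods are hypothetical names, and the byte-offset access pattern is assumed.

```python
class CachedFileReader:
    """Keeps one open handle per path instead of reopening on every read.

    Hypothetical helper for illustration; not the class used in this PR.
    """

    def __init__(self):
        self._handles = {}   # path -> open binary file object
        self.n_opens = 0     # counts real open() calls, for inspection

    def read(self, path, offset, size):
        f = self._handles.get(path)
        if f is None:
            # Open lazily on first access and keep the handle for reuse.
            f = open(path, "rb")
            self._handles[path] = f
            self.n_opens += 1
        f.seek(offset)
        return f.read(size)

    def close(self):
        # Close every cached handle and forget them.
        for f in self._handles.values():
            f.close()
        self._handles.clear()
```

Opening lazily on first read, rather than in the constructor, is a deliberate choice in this sketch: if the object is copied into dataloader worker processes, each worker then opens its own handles.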