loretoparisi / wave2vec-recognize-docker

Wave2vec 2.0 Recognize pipeline
MIT License

Understand if it is possible to use own checkpoint from training as model file #10

Open davidavdav opened 3 years ago

davidavdav commented 3 years ago

Hello,

I've been working with the default fairseq examples/speech_recognition/infer.py and with this repo's recognize.py to see whether it is possible to run inference with a model we made ourselves by fine-tuning a base model. We can get infer.py to work, but I've noticed that it needs to find the original base model on disk. That makes moving the checkpoint to a different machine cumbersome: the base model has to be at the same location on the target machine.

I've spent almost a day studying how the model loading works, but I can't wrap my head around it. I think it only needs some args from the original base model, but there is a lot of conversion going on between formats and names: cfg, w2v_args, OmegaConf and Namespace.
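To see what a fine-tuned checkpoint actually carries, it can help to look at the loaded state directly. The sketch below assumes that the checkpoint, once read with `torch.load(path, map_location="cpu")`, is a dictionary whose `cfg` entry is a nested mapping containing `model.w2v_path` and `model.w2v_args`, and that older checkpoints carry an `args` Namespace instead. The helper name `describe_base_model_refs` and the example path are hypothetical and not part of fairseq.

```python
from argparse import Namespace
from collections.abc import Mapping


def describe_base_model_refs(state):
    """Report how a loaded checkpoint dict refers to its base model.

    `state` is assumed to be what torch.load() returns for a fairseq
    checkpoint: a dict with a "cfg" mapping (newer format) and/or an
    "args" Namespace (older format). This layout is an assumption.
    """
    report = {"format": None, "w2v_path": None, "has_w2v_args": False}

    cfg = state.get("cfg")
    args = state.get("args")

    if isinstance(cfg, Mapping) and isinstance(cfg.get("model"), Mapping):
        # Newer layout: settings live in a nested config under cfg.model.
        model_cfg = cfg["model"]
        report["format"] = "cfg"
        report["w2v_path"] = model_cfg.get("w2v_path")
        report["has_w2v_args"] = model_cfg.get("w2v_args") is not None
    elif isinstance(args, Namespace):
        # Older layout: settings are attributes on an argparse Namespace.
        report["format"] = "args"
        report["w2v_path"] = getattr(args, "w2v_path", None)
        report["has_w2v_args"] = getattr(args, "w2v_args", None) is not None

    return report


# Stand-in for a fine-tuning checkpoint that only points at the base model.
# The path is made up for illustration.
example_state = {
    "args": None,
    "cfg": {"model": {"w2v_path": "/data/base/wav2vec_small.pt", "w2v_args": None}},
    "model": {},
}
print(describe_base_model_refs(example_state))
```

If `has_w2v_args` is false while `w2v_path` is set, the checkpoint presumably still depends on the base model file at that path.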

recognize.py and recognize.hydra.py break when loading a checkpoint file (but they work on published fine-tuned models). It would help me if there were a way to produce a model file that works with recognize.py from the original base model and a checkpoint. I have not been able to find such a tool. I believe it is as simple as adding the correct .cfg.w2v_args info to the checkpoint, but I don't understand how.
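The idea of adding the base model's configuration to the checkpoint can be sketched as follows. This assumes that the fine-tuned checkpoint keeps its settings under `cfg.model`, that the base checkpoint keeps its own settings under `cfg`, and that fairseq does not read `w2v_path` when `w2v_args` is already filled in. None of this is verified against a specific fairseq version, and the function names `embed_base_config` and `convert_checkpoint` are hypothetical.

```python
import copy


def embed_base_config(finetuned_state, base_state):
    """Return a copy of the fine-tuned checkpoint dict that carries the
    base model's configuration in cfg.model.w2v_args.

    Both arguments are assumed to be dicts as returned by torch.load();
    the key layout is an assumption, not a fairseq guarantee.
    """
    base_cfg = base_state.get("cfg")
    if base_cfg is None:
        raise ValueError("base checkpoint has no 'cfg' entry to embed")

    model_cfg = (finetuned_state.get("cfg") or {}).get("model")
    if model_cfg is None:
        raise ValueError("fine-tuned checkpoint has no 'cfg.model' entry")

    # Shallow-copy only the containers we modify so the inputs stay
    # untouched; the weights under "model" are shared, not duplicated.
    result = dict(finetuned_state)
    result["cfg"] = dict(finetuned_state["cfg"])
    result["cfg"]["model"] = dict(model_cfg)
    result["cfg"]["model"]["w2v_args"] = copy.deepcopy(base_cfg)
    return result


def convert_checkpoint(finetuned_path, base_path, out_path):
    """Read both checkpoints, embed the base config, write a new file."""
    import torch  # imported lazily so embed_base_config works without it

    finetuned = torch.load(finetuned_path, map_location="cpu")
    base = torch.load(base_path, map_location="cpu")
    torch.save(embed_base_config(finetuned, base), out_path)
```

Recent PyTorch releases default `torch.load` to `weights_only=True`, so checkpoints that contain Namespace or config objects may need `weights_only=False` when they come from a trusted source. Whether the resulting file loads in recognize.py without the base model present depends on the fairseq version in use.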

I can get recognize.py to work with a checkpoint file with the patch below, but then model loading still refers to the original base model.

@@ -139,13 +162,24 @@ class Wav2VecPredictor:
         return feats

     def _load_model(self, model_path, target_dict):
-        w2v = torch.load(model_path)
-
+        #w2v = torch.load(model_path)
+        #if w2v['args'] is None:
+        #    w2v['args'] = Namespace()
         # Without create a FairseqTask
-        args = base_architecture(w2v["args"])
-        model = Wav2VecCtc(args, Wav2VecEncoder(args, target_dict))
-        model.load_state_dict(w2v["model"], strict=True)
-        return model
+        #args = base_architecture(w2v["args"])
+        #model = Wav2VecCtc(args, Wav2VecEncoder(args, target_dict))
+        #model.load_state_dict(w2v["model"], strict=True)
+
+        models, saved_cfg, task = load_model_ensemble_and_task(
+            utils.split_paths(model_path),
+            arg_overrides=None, # ast.literal_eval(args.model_overrides),
+            task=None,
+            suffix="",
+            strict=True,
+            num_shards=1,
+            state=None
+        )
+        return models[0]

davidavdav commented 3 years ago

> I would be helped if there is a way to produce a model file that works with recognize.py from the original base model and a checkpoint.

OK, I think I got a little further. Here I posted a little script that seems to solve this issue for us.

I can't really say I've gained much understanding of the model loading process, but with the script it is possible to convert a checkpoint_best.pt file obtained during fine-tuning into an independent model file that can be loaded with recognize.py.