tensorflow / lingvo

Failed to utilize SpecAugment in Librispeech task #107

Open ColainCYY opened 5 years ago

ColainCYY commented 5 years ago

Hi~ I encountered an error when trying to enable SpecAugment. The code in librispeech.py is: ep.use_specaugment = True

The error is shown below:

[image: image] https://user-images.githubusercontent.com/30071492/59987397-4a14f580-966d-11e9-9af1-50ed570c1b00.png

It seems that inputs has an unexpected shape. I wonder whether I have missed some important configuration.

Thanks very much!

drpngx commented 5 years ago

Can you print out the output of py_utils.GetShape(inputs) and inputs?

ColainCYY commented 5 years ago

Can you print out the output of py_utils.GetShape(inputs) and inputs? …

While building the graph, py_utils.GetShape(inputs) returns the following:

Tensor("fprop/librispeech/tower_0_0/enc/Shape_6:0", shape=(?,), dtype=int32)

It is actually "tf.shape(inputs)". The dimension remains unknown, which leads to the failure.

drpngx commented 5 years ago

Right, GetShape only returns tf.shape if inputs is of unknown rank. For now, you can use series_length = inputs[1]; num_freq = inputs[2]. Usually we know at least the rank of the inputs. Might be useful to trace down where the shape is lost.
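
A minimal sketch of why the unpacking fails and how the sizes can still be read, assuming TensorFlow's `tf.compat.v1` graph mode. The placeholder is a stand-in for the encoder input, not lingvo code. Note that `inputs[1]` slices the tensor's data rather than its shape, so the sketch indexes `tf.shape(inputs)` instead:

```python
import numpy as np
import tensorflow.compat.v1 as tf

with tf.Graph().as_default():
    # Stand-in for the encoder input: a tensor whose rank is unknown.
    inputs = tf.placeholder(tf.float32, shape=None)
    static_rank = inputs.shape.ndims  # None, because the rank is unknown

    # With unknown rank, only the dynamic shape (a 1-D int32 tensor) exists.
    dynamic_shape = tf.shape(inputs)

    # Unpacking a symbolic tensor is not allowed while building a graph.
    try:
        _, series_length, num_freq, _ = dynamic_shape
        unpack_failed = False
    except TypeError:
        unpack_failed = True

    # Indexing the dynamic shape works regardless of the static rank.
    series_length = dynamic_shape[1]
    num_freq = dynamic_shape[2]

    # Evaluate with a concrete [batch, time, freq, channel] batch.
    with tf.Session() as sess:
        time_steps, freq_bins = sess.run(
            [series_length, num_freq],
            feed_dict={inputs: np.zeros([2, 50, 80, 1], np.float32)})
```

This only recovers the dimension sizes. Ops that need a static rank, such as `tf.einsum`, will likely still fail until the rank itself is restored.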

iamxiaoyubei commented 5 years ago

I ran into the same problem. Here is what printing inputs and py_utils.GetShape(inputs) gives:

inputs: Tensor("ExpandDims_1:0", dtype=float32, device=/job:local/replica:0/task:0/device:CPU:0)
py_utils.GetShape: Tensor("fprop/librispeech/tower_0_0/enc/Shape:0", shape=(?,), dtype=int32, device=/job:local/replica:0/task:0/device:CPU:0)

And the error:

Traceback (most recent call last):
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1557, in <module>
    tf.app.run(main)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1553, in main
    RunnerManager(FLAGS.model).Start()
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1546, in Start
    self.StartRunners(self.CreateRunners(FLAGS.job.split(','), FLAGS.logdir))
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1314, in CreateRunners
    trial)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1268, in _CreateRunner
    return self.Controller(cfg, *common_args)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 196, in __init__
    self._model.ConstructFPropBPropGraph()
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_model.py", line 1235, in ConstructFPropBPropGraph
    self._task.FPropDefaultTheta()
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_model.py", line 477, in FPropDefaultTheta
    return self.FProp(self.theta, input_batch)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_model.py", line 394, in FProp
    metrics, per_example = self._FPropSplitInputBatch(theta, input_batch)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_model.py", line 440, in _FPropSplitInputBatch
    metrics, per_example = self.FPropTower(theta_local, batch)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_model.py", line 363, in FPropTower
    predicted = self.ComputePredictions(theta, input_batch)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/tasks/asr/model.py", line 124, in ComputePredictions
    encoder_outputs = self._FrontendAndEncoderFProp(theta, input_batch_src)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/tasks/asr/model.py", line 156, in _FrontendAndEncoderFProp
    return self.encoder.FProp(theta.encoder, input_batch_src)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/tasks/asr/encoder.py", line 312, in FProp
    paddings)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/spectrum_augmenter.py", line 312, in FProp
    _, series_length, num_freq, _ = py_utils.GetShape(inputs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 477, in __iter__
    "Tensor objects are only iterable when eager execution is "
TypeError: Tensor objects are only iterable when eager execution is enabled. To iterate over this tensor use tf.map_fn.

iamxiaoyubei commented 5 years ago

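I traced back to the place where inputs is passed in (https://github.com/tensorflow/lingvo/blob/eb50d8dca0c35007df1d57b1a2151a134a660d7a/lingvo/tasks/asr/encoder.py#L324) and to batch.src_inputs (https://github.com/tensorflow/lingvo/blob/eb50d8dca0c35007df1d57b1a2151a134a660d7a/lingvo/tasks/asr/encoder.py#L319).

However, the shape is also strange there:

batch.src_inputs: Tensor("ExpandDims_1:0", dtype=float32, device=/job:local/replica:0/task:0/device:CPU:0)
where inputs pass into specaugment: Tensor("ExpandDims_1:0", dtype=float32, device=/job:local/replica:0/task:0/device:CPU:0)

drpngx commented 5 years ago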

use x[4, ...]?

iamxiaoyubei commented 5 years ago

Sorry, I don't quite understand what you said. Could you elaborate on this? Thank you~

drpngx commented 5 years ago

Sorry, try to use series_length = inputs[1,...]?

iamxiaoyubei commented 5 years ago

I used this:

line312    # _, series_length, num_freq, _ = py_utils.GetShape(inputs)
line313    series_length = inputs[1]
line314    num_freq = inputs[2]
line315    augmented_inputs = self._AugmentationNetwork(series_length, num_freq,
line316                                                 inputs, paddings)

However, this now fails with:

Traceback (most recent call last):
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1557, in <module>
    tf.app.run(main)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1553, in main
    RunnerManager(FLAGS.model).Start()
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1546, in Start
    self.StartRunners(self.CreateRunners(FLAGS.job.split(','), FLAGS.logdir))
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1314, in CreateRunners
    trial)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1268, in _CreateRunner
    return self.Controller(cfg, *common_args)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 196, in __init__
    self._model.ConstructFPropBPropGraph()
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_model.py", line 1235, in ConstructFPropBPropGraph
    self._task.FPropDefaultTheta()
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_model.py", line 477, in FPropDefaultTheta
    return self.FProp(self.theta, input_batch)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_model.py", line 394, in FProp
    metrics, per_example = self._FPropSplitInputBatch(theta, input_batch)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_model.py", line 440, in _FPropSplitInputBatch
    metrics, per_example = self.FPropTower(theta_local, batch)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_model.py", line 363, in FPropTower
    predicted = self.ComputePredictions(theta, input_batch)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/tasks/asr/model.py", line 124, in ComputePredictions
    encoder_outputs = self._FrontendAndEncoderFProp(theta, input_batch_src)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/tasks/asr/model.py", line 156, in _FrontendAndEncoderFProp
    return self.encoder.FProp(theta.encoder, input_batch_src)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/tasks/asr/encoder.py", line 314, in FProp
    paddings)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/spectrum_augmenter.py", line 316, in FProp
    inputs, paddings)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/spectrum_augmenter.py", line 284, in _AugmentationNetwork
    dtype=dtype)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/spectrum_augmenter.py", line 212, in _TimeMask
    'bxyc,bx->bxyc', inputs, block_arrays, name='einsum_formasking')
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/special_math_ops.py", line 289, in einsum
    axes_to_sum)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/special_math_ops.py", line 417, in _einsum_reduction
    if len(t0_axis_labels) != len(t0.get_shape()):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/tensor_shape.py", line 825, in __len__
    raise ValueError("Cannot take the length of shape with unknown rank.")
ValueError: Cannot take the length of shape with unknown rank.

cahuja1992 commented 5 years ago

I get the same error as above. Is there a solution yet?
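
The second traceback comes from `tf.einsum`, which needs the static rank of `inputs` to match it against `bxyc`. A minimal sketch of one possible workaround, assuming the input really is laid out as [batch, time, freq, channel]. The placeholders are stand-ins, and this is an illustration rather than a fix confirmed in this thread:

```python
import numpy as np
import tensorflow.compat.v1 as tf

with tf.Graph().as_default():
    # Stand-ins: an input of unknown rank and a [batch, time] mask.
    inputs = tf.placeholder(tf.float32, shape=None)
    mask = tf.placeholder(tf.float32, shape=[None, None])

    rank_before = inputs.shape.ndims  # None: einsum cannot check 'bxyc'

    # Declare rank 4 without fixing any dimension size.
    inputs.set_shape([None, None, None, None])
    rank_after = inputs.shape.ndims

    # Same equation as in the traceback; it now has a rank to work with.
    masked = tf.einsum('bxyc,bx->bxyc', inputs, mask)

    with tf.Session() as sess:
        out = sess.run(masked, feed_dict={
            inputs: np.ones([2, 3, 4, 1], np.float32),
            mask: np.array([[1, 0, 1], [0, 1, 1]], np.float32)})
```

`set_shape` only asserts what the caller already knows about the tensor. As suggested earlier in the thread, it may be more robust to trace where the shape is lost in the input pipeline and restore it there.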

zh794390558 commented 5 years ago

watch this.