Is it possible to register a new problem directly in Google Colab?

Santosh-Gupta commented 5 years ago

Description

In Google Colab I registered a new problem. I then want to run training on that problem, but as far as I can see, the only way to run training is through the command line. When I try that, it says the problem is not registered, even though the Colab environment says it is.

Environment information

Here is the colab notebook I am using

https://colab.research.google.com/drive/1yEU-K-3Und2aCdOHMbHiePMeGNUK7pFe

Part that shows my new problem is registered (it's named 'summarize_scientific_sections65k')

https://snag.gy/Xja80p.jpg

Part that shows that the command line data generation does not have my problem registered

https://snag.gy/YI5BiL.jpg

For bugs: reproduction and error logs

# Steps to reproduce:
...

see colab notebook

https://colab.research.google.com/drive/1yEU-K-3Und2aCdOHMbHiePMeGNUK7pFe

# Error logs:
...

Here is the beginning of the error log (it goes on to list every problem in the registry)

Traceback (most recent call last):
  File "/usr/local/bin/t2t-datagen", line 28, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/usr/local/bin/t2t-datagen", line 23, in main
    t2t_datagen.main(argv)
  File "/usr/local/lib/python3.6/dist-packages/tensor2tensor/bin/t2t_datagen.py", line 196, in main
    raise ValueError(error_msg)
ValueError: You must specify one of the supported problems to generate data for:
  * algorithmic_addition_binary40
  * algorithmic_addition_decimal40

Santosh-Gupta commented 5 years ago

Update: I forked the repo and added a file to 'data_generators' just for my problem, here

https://github.com/Santosh-Gupta/tensor2tensor/blob/master/tensor2tensor/data_generators/sci_sum.py

But the newly registered problem is still not recognised.

Here is the notebook where I tried to get it to work

https://colab.research.google.com/drive/1Fof2vr-gjuDuz3cHG2ILnlRE0f8A-U-Q

For convenience, here is the code

!pip install -q -U https://github.com/Santosh-Gupta/tensor2tensor/archive/master.zip
!pip install -q tensorflow matplotlib
!pip install fastparquet

import tensorflow as tf
import os
Modes = tf.estimator.ModeKeys

# Setup some directories
data_dir = os.path.expanduser("~/t2t/data")
tmp_dir = os.path.expanduser("~/t2t/tmp")
train_dir = os.path.expanduser("~/t2t/train")
output_dir = os.path.expanduser("~/t2t/output")
checkpoint_dir = os.path.expanduser("~/t2t/checkpoints")
tf.gfile.MakeDirs(data_dir)
tf.gfile.MakeDirs(tmp_dir)
tf.gfile.MakeDirs(train_dir)
tf.gfile.MakeDirs(checkpoint_dir)
gs_data_dir = "gs://tensor2tensor-data"
gs_ckpt_dir = "gs://tensor2tensor-checkpoints/"

!t2t-datagen \
  --data_dir=~/t2t/data \
  --tmp_dir=~/t2t/tmp \
  --problem=summarize_scientific_sections65k

Here is the beginning of the resulting error message.

WARNING: Logging before flag parsing goes to stderr.
W0715 16:28:04.981967 140038605891456 lazy_loader.py:50] 
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

W0715 16:28:06.853463 140038605891456 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/expert_utils.py:68: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

W0715 16:28:09.387610 140038605891456 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/rl/gym_utils.py:235: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

W0715 16:28:09.392353 140038605891456 deprecation_wrapper.py:119] From /usr/local/bin/t2t-datagen:27: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

W0715 16:28:09.392554 140038605891456 deprecation_wrapper.py:119] From /usr/local/bin/t2t-datagen:27: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

W0715 16:28:09.392674 140038605891456 deprecation_wrapper.py:119] From /usr/local/bin/t2t-datagen:28: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.

Traceback (most recent call last):
  File "/usr/local/bin/t2t-datagen", line 28, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/usr/local/bin/t2t-datagen", line 23, in main
    t2t_datagen.main(argv)
  File "/usr/local/lib/python3.6/dist-packages/tensor2tensor/bin/t2t_datagen.py", line 196, in main
    raise ValueError(error_msg)
ValueError: You must specify one of the supported problems to generate data for:
  * algorithmic_addition_binary40
  * algorithmic_addition_decimal40
  * algorithmic_algebra_inverse

And here is the end of the error message

  * winograd_nli
  * winograd_nli_characters
  * wsj_parsing
TIMIT and parsing need data_sets specified with --timit_paths and --parsing_path.

Santosh-Gupta commented 5 years ago

Update:

The forked repo now works. It looks like you have to include the new file in the all_problems.py file in data_generators.

I am still curious whether it's possible to register a problem directly in Colab. This would be very convenient for training over data spread across multiple files/directories in Google Drive.

Santosh-Gupta commented 5 years ago

It looks like this Colab notebook may give some insight

https://colab.research.google.com/github/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/deepdive/09_sequence/poetry.ipynb#scrollTo=UN7aHW57T7AZ

It looks like %%writefile poetry/trainer/problem.py lets you create a new problem.py file

which you can then import using from . import problem

Would this overwrite the previously imported problem?
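On the overwrite question, here is a minimal sketch of how plain Python treats a module file that is rewritten after it was first imported. It assumes nothing about tensor2tensor; `my_problem` and `NAME` are hypothetical names, and writing the file with `write_text` stands in for what a `%%writefile` cell does.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Avoid stale bytecode caches interfering with this small demo.
sys.dont_write_bytecode = True

# A scratch directory standing in for the folder %%writefile writes to.
pkg_dir = Path(tempfile.mkdtemp())
module_path = pkg_dir / "my_problem.py"  # hypothetical module name

# First version of the file, then make the directory importable.
module_path.write_text("NAME = 'first_version'\n")
sys.path.insert(0, str(pkg_dir))
importlib.invalidate_caches()

import my_problem

first = my_problem.NAME

# Rewrite the file, as a second %%writefile cell would.
module_path.write_text("NAME = 'second_version'\n")

# A repeated import statement returns the cached module from sys.modules.
import my_problem

cached = my_problem.NAME

# importlib.reload re-executes the module source in the same process.
importlib.reload(my_problem)
reloaded = my_problem.NAME

print(first, cached, reloaded)
```

So rewriting the file alone does not replace what the running kernel already imported; the module has to be reloaded (or the runtime restarted). How tensor2tensor's registry reacts when the same problem name is registered a second time is not covered by this sketch.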

tensorflow / tensor2tensor