Santosh-Gupta opened this issue 5 years ago (status: Open)
Update: I forked the repo and added a file to 'data_generators' just for my problem, here:
https://github.com/Santosh-Gupta/tensor2tensor/blob/master/tensor2tensor/data_generators/sci_sum.py
But the newly registered problem is still not recognised.
Here is the notebook where I tried to get it to work:
https://colab.research.google.com/drive/1Fof2vr-gjuDuz3cHG2ILnlRE0f8A-U-Q
For convenience, here is the code
!pip install -q -U https://github.com/Santosh-Gupta/tensor2tensor/archive/master.zip
!pip install -q tensorflow matplotlib
!pip install fastparquet
import tensorflow as tf
import os
Modes = tf.estimator.ModeKeys
# Setup some directories
data_dir = os.path.expanduser("~/t2t/data")
tmp_dir = os.path.expanduser("~/t2t/tmp")
train_dir = os.path.expanduser("~/t2t/train")
output_dir = os.path.expanduser("~/t2t/output")
checkpoint_dir = os.path.expanduser("~/t2t/checkpoints")
tf.gfile.MakeDirs(data_dir)
tf.gfile.MakeDirs(tmp_dir)
tf.gfile.MakeDirs(train_dir)
tf.gfile.MakeDirs(checkpoint_dir)
gs_data_dir = "gs://tensor2tensor-data"
gs_ckpt_dir = "gs://tensor2tensor-checkpoints/"
!t2t-datagen \
--data_dir=~/t2t/data \
--tmp_dir=~/t2t/tmp \
--problem=summarize_scientific_sections65k
Here is the beginning of the resulting error message.
WARNING: Logging before flag parsing goes to stderr.
W0715 16:28:04.981967 140038605891456 lazy_loader.py:50]
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
* https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
* https://github.com/tensorflow/addons
* https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.
W0715 16:28:06.853463 140038605891456 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/expert_utils.py:68: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
W0715 16:28:09.387610 140038605891456 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/rl/gym_utils.py:235: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
W0715 16:28:09.392353 140038605891456 deprecation_wrapper.py:119] From /usr/local/bin/t2t-datagen:27: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.
W0715 16:28:09.392554 140038605891456 deprecation_wrapper.py:119] From /usr/local/bin/t2t-datagen:27: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.
W0715 16:28:09.392674 140038605891456 deprecation_wrapper.py:119] From /usr/local/bin/t2t-datagen:28: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.
Traceback (most recent call last):
File "/usr/local/bin/t2t-datagen", line 28, in <module>
tf.app.run()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "/usr/local/bin/t2t-datagen", line 23, in main
t2t_datagen.main(argv)
File "/usr/local/lib/python3.6/dist-packages/tensor2tensor/bin/t2t_datagen.py", line 196, in main
raise ValueError(error_msg)
ValueError: You must specify one of the supported problems to generate data for:
* algorithmic_addition_binary40
* algorithmic_addition_decimal40
* algorithmic_algebra_inverse
And here is the end of the error message
* winograd_nli
* winograd_nli_characters
* wsj_parsing
TIMIT and parsing need data_sets specified with --timit_paths and --parsing_path.
Update:
The forked repo now works. It looks like you have to list the new module in the all_problems.py file in data_generators; see the sketch below.
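Roughly, the fix amounts to adding the new module to all_problems.py. A sketch (the exact layout of all_problems.py depends on the t2t version):

# tensor2tensor/data_generators/all_problems.py (sketch)
MODULES = [
    "tensor2tensor.data_generators.algorithmic",
    # ... the rest of the existing problem modules ...
    "tensor2tensor.data_generators.sci_sum",  # new module containing my problem
]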
I am still curious if it's possible to register a problem directly in Colab. This would be very convenient for training over data in multiple files/directories in google drive.
It looks like this colab notebook may give some insight. It seems that %%writefile poetry/trainer/problem.py lets you create a new problem.py file, which you can then import with from . import problem, and this may override the previously imported problem?
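As a minimal sketch of what registering a problem directly in a notebook cell could look like (the class body, vocab size, and placeholder samples below are just illustrative, based on how Text2TextProblem subclasses are usually written):

import os

from tensor2tensor.data_generators import text_problems
from tensor2tensor.utils import registry

data_dir = os.path.expanduser("~/t2t/data")
tmp_dir = os.path.expanduser("~/t2t/tmp")

@registry.register_problem
class SummarizeScientificSections65k(text_problems.Text2TextProblem):
  """Illustrative text-to-text problem defined in a notebook cell."""

  @property
  def approx_vocab_size(self):
    return 2**16  # roughly a 65k subword vocabulary

  @property
  def is_generate_per_split(self):
    # Generate everything in one pass and let t2t shard it into splits.
    return False

  def generate_samples(self, data_dir, tmp_dir, dataset_split):
    # Placeholder; in practice this would read the parquet files from Drive.
    yield {"inputs": "section text ...", "targets": "summary text ..."}

# Because the class was registered in this Python process, it can be used
# here directly instead of going through the t2t-datagen command line:
problem = registry.problem("summarize_scientific_sections65k")
problem.generate_data(data_dir, tmp_dir)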
Description
In Google Colab I registered a new problem. I then want to run training on that problem, but as far as I can see, the only way to run training is through the command line. When I try that, it says the problem is not registered, even though the Colab environment says it is.
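As far as I can tell, t2t-datagen also accepts a --t2t_usr_dir flag that imports problem modules from a user directory before checking the registry, which might make a locally defined problem visible to the command line without forking the repo. A sketch of what that could look like in Colab (the /content/t2t_usr path and my_problem.py file name are just illustrative):

# Put the problem definition in a user dir with an __init__.py that imports it
!mkdir -p /content/t2t_usr
!cp my_problem.py /content/t2t_usr/
!echo "from . import my_problem" > /content/t2t_usr/__init__.py

!t2t-datagen \
  --t2t_usr_dir=/content/t2t_usr \
  --data_dir=~/t2t/data \
  --tmp_dir=~/t2t/tmp \
  --problem=summarize_scientific_sections65k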
Environment information
Here is the colab notebook I am using
https://colab.research.google.com/drive/1yEU-K-3Und2aCdOHMbHiePMeGNUK7pFe
Part that shows my new problem is registered (it's named 'summarize_scientific_sections65k')
https://snag.gy/Xja80p.jpg
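For reference, the same check can be done programmatically by querying the registry (the name filter below is just for illustration):

from tensor2tensor import problems

# Lists every problem name registered in this Python process; the
# notebook-defined problem shows up here, but a separate t2t-datagen
# process will not see it unless the defining module is imported there.
print([p for p in problems.available() if "summarize" in p])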
Part that shows that the command line data generation does not have my problem registered
https://snag.gy/YI5BiL.jpg
For bugs: reproduction and error logs
See the colab notebook.
Here is the beginning of the error log (since it lists all the problems in the registry):