google / trax

Trax — Deep Learning with Clear Code and Speed
Apache License 2.0

TFDS From Master Branch Raises NoneType Not Subscriptable Error #1519

Closed necromuralist closed 3 years ago

necromuralist commented 3 years ago

Description

This isn't from the current release on pip, but on February 11 a change was made to master that causes TFDS to crash with a "NoneType is not subscriptable" error on my computer.

In trax.data.tf_inputs.TFDS (file trax/data/tf_inputs.py) there are these lines:

  host_id = jax.host_id() if host_id is None else host_id
  n_hosts = n_hosts or jax.host_count()
  if n_hosts > 1:
    subsplit = (host_id / n_hosts, (host_id + 1) / n_hosts)
  else:
    subsplit = None

On my computer n_hosts = 1, so subsplit is None. That value gets passed to the _train_and_eval_dataset function, which contains these lines:

  if eval_holdout_examples > 0 or subsplit is not None:
    n_train = train_examples - eval_holdout_examples
    train_start = int(n_train * subsplit[0])
    train_end = int(n_train * subsplit[1])

Because the conditional uses or and eval_holdout_examples is greater than 0, the branch is taken even though subsplit is None, so the attempt to subscript it (subsplit[0]) raises the exception.
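The failure can be reproduced without Trax. The following is a minimal sketch using plain Python stand-ins for the quoted lines. The function names `train_range_buggy` and `train_range_guarded` are hypothetical, and treating `subsplit=None` as the full range `(0, 1)` is an assumption made for illustration, not necessarily the fix that was merged upstream.

```python
def train_range_buggy(train_examples, eval_holdout_examples, subsplit):
    """Mirrors the quoted condition: the `or` lets subsplit=None through."""
    if eval_holdout_examples > 0 or subsplit is not None:
        n_train = train_examples - eval_holdout_examples
        # Raises TypeError when subsplit is None.
        train_start = int(n_train * subsplit[0])
        train_end = int(n_train * subsplit[1])
        return train_start, train_end
    return 0, train_examples


def train_range_guarded(train_examples, eval_holdout_examples, subsplit):
    """Hypothetical guard: treat subsplit=None as the full range (0, 1)."""
    if subsplit is None:
        # Assumption: a single host reads all remaining training examples.
        subsplit = (0, 1)
    n_train = train_examples - eval_holdout_examples
    train_start = int(n_train * subsplit[0])
    train_end = int(n_train * subsplit[1])
    return train_start, train_end


if __name__ == "__main__":
    try:
        train_range_buggy(1000, 10, None)
    except TypeError as err:
        print("buggy version:", err)
    print("guarded version:", train_range_guarded(1000, 10, None))
```

With 1000 training examples and 10 held out, the guarded version returns `(0, 990)` on a single host, while the first version fails with the same TypeError as in the traceback below.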

I don't know if now is the time to report this, since I'm pulling from master (reverting to the last February 10 commit fixes it for me), but I thought it might be helpful to know about if it isn't already known.

Environment information

OS: Ubuntu 20.04 (using the nvidia docker container)

$ pip freeze | grep trax
-e trax==1.3.7

$ pip freeze | grep tensor
mesh-tensorflow==0.1.18
tensorboard==2.4.1
tensorboard-plugin-wit==1.8.0
tensorflow==2.4.1
tensorflow-datasets==4.2.0
tensorflow-estimator==2.4.0
tensorflow-hub==0.11.0
tensorflow-metadata==0.28.0
tensorflow-text==2.4.3

$ pip freeze | grep jax
jax==0.2.10
jaxlib==0.1.61+cuda111

$ python -V
Python 3.8.5

For bugs: reproduction and error logs

Steps to reproduce:

import trax
path = "data"
data_set = "opus/medical"
train_stream_fn = trax.data.TFDS(data_set,
                                 data_dir=path,
                                 keys=('en', 'de'),
                                 eval_holdout_size=0.01,
                                 train=True)

Error logs:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-fb62d04026f5> in <module>
      4 # data_set = "para_crawl/ende"
      5 
----> 6 train_stream_fn = trax.data.TFDS(data_set,
      7                                  data_dir=path,
      8                                  keys=('en', 'de'),

/usr/local/lib/python3.8/dist-packages/gin/config.py in gin_wrapper(*args, **kwargs)
   1067       scope_info = " in scope '{}'".format(scope_str) if scope_str else ''
   1068       err_str = err_str.format(name, fn_or_cls, scope_info)
-> 1069       utils.augment_exception_message_and_reraise(e, err_str)
   1070 
   1071   return gin_wrapper

/usr/local/lib/python3.8/dist-packages/gin/utils.py in augment_exception_message_and_reraise(exception, message)
     39   proxy = ExceptionProxy()
     40   ExceptionProxy.__qualname__ = type(exception).__qualname__
---> 41   raise proxy.with_traceback(exception.__traceback__) from None
     42 
     43 

/usr/local/lib/python3.8/dist-packages/gin/config.py in gin_wrapper(*args, **kwargs)
   1044 
   1045     try:
-> 1046       return fn(*new_args, **new_kwargs)
   1047     except Exception as e:  # pylint: disable=broad-except
   1048       err_str = ''

/usr/local/lib/python3.8/dist-packages/gin/config.py in gin_wrapper(*args, **kwargs)
   1067       scope_info = " in scope '{}'".format(scope_str) if scope_str else ''
   1068       err_str = err_str.format(name, fn_or_cls, scope_info)
-> 1069       utils.augment_exception_message_and_reraise(e, err_str)
   1070 
   1071   return gin_wrapper

/usr/local/lib/python3.8/dist-packages/gin/utils.py in augment_exception_message_and_reraise(exception, message)
     39   proxy = ExceptionProxy()
     40   ExceptionProxy.__qualname__ = type(exception).__qualname__
---> 41   raise proxy.with_traceback(exception.__traceback__) from None
     42 
     43 

/usr/local/lib/python3.8/dist-packages/gin/config.py in gin_wrapper(*args, **kwargs)
   1044 
   1045     try:
-> 1046       return fn(*new_args, **new_kwargs)
   1047     except Exception as e:  # pylint: disable=broad-except
   1048       err_str = ''

~/trax/trax/data/tf_inputs.py in TFDS(dataset_name, data_dir, tfds_preprocess_fn, keys, train, shuffle_train, host_id, n_hosts, eval_holdout_size)
    279   else:
    280     subsplit = None
--> 281   (train_data, eval_data, _) = _train_and_eval_dataset(
    282       dataset_name, data_dir, eval_holdout_size,
    283       train_shuffle_files=shuffle_train, subsplit=subsplit)

~/trax/trax/data/tf_inputs.py in _train_and_eval_dataset(dataset_name, data_dir, eval_holdout_size, train_shuffle_files, eval_shuffle_files, subsplit)
    224   if eval_holdout_examples > 0 or subsplit is not None:
    225     n_train = train_examples - eval_holdout_examples
--> 226     train_start = int(n_train * subsplit[0])
    227     train_end = int(n_train * subsplit[1])
    228     if train_end - train_start < 1:

TypeError: 'NoneType' object is not subscriptable
  In call to configurable 'TFDS' (<function TFDS at 0x7f960c527280>)
  In call to configurable 'TFDS' (<function TFDS at 0x7f960c526f70>)

JonathanBechtel commented 3 years ago

I had a similar problem and specifying the labels in keys and train fixed it. No idea why.

albangabillon commented 3 years ago

Same problem

@JonathanBechtel: I do not understand your solution

LDrago27 commented 3 years ago

I also have this same issue. @JonathanBechtel, can you explain your solution?

vico commented 3 years ago

I modified the trax/data/tf_inputs.py file to make it work, as shown in the comparison linked here. (I wonder if there is a workaround that doesn't require modifying the Trax code.)

https://github.com/google/trax/compare/master...vico:fixed_1519

afrozenator commented 3 years ago

Sorry for the breakage, hopefully https://github.com/google/trax/pull/1641 should fix it.

vico commented 3 years ago

Hi @afrozenator, thank you very much for the fix! It got merged today and I checked it with my sample:

import sys
sys.path.insert(0, ".")
import trax
train_stream_fn = trax.data.TFDS('para_crawl/ende', #  'opus/medical',
                                 data_dir='./data/',
                                 keys=('en', 'de'),
                                 eval_holdout_size=0.01, # 1% for eval
                                 train=True)
train_stream = train_stream_fn()
print(next(train_stream))

I got the following error:

Traceback (most recent call last):
  File "/home/cuong/localdev/trax/trax/examples/bug_check.py", line 12, in <module>
    train=True)
  File "/home/cuong/localdev/trax/venv/lib/python3.7/site-packages/gin/config.py", line 1069, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/cuong/localdev/trax/venv/lib/python3.7/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/cuong/localdev/trax/venv/lib/python3.7/site-packages/gin/config.py", line 1046, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/cuong/localdev/trax/trax/data/tf_inputs.py", line 329, in TFDS
    subsplit=subsplit))
  File "/home/cuong/localdev/trax/trax/data/tf_inputs.py", line 253, in _train_and_eval_dataset
    raise ValueError('We require a validation or test split in the dataset.')
ValueError: We require a validation or test split in the dataset.
  In call to configurable 'TFDS' (<function TFDS at 0x7f306252a7a0>)

Can you check it on your side, or does it only occur in my environment?

afrozenator commented 3 years ago

Thanks @vico - I see the bug, it should hopefully be fixed in https://github.com/google/trax/pull/1644 -- Was this always a problem for you? Or did it start recently?

vico commented 3 years ago

Thanks @afrozenator, it started recently. I followed this issue (#1519) and kept tracking it. Luckily you fixed it, so I just wanted to check that it runs on my local machine as well, and then I found the above error.

vico commented 3 years ago

I ran the example against the master branch and it works now. Thank you, @afrozenator!

afrozenator commented 3 years ago

Thanks all, closing this bug now, feel free to reopen if needed -- 1.3.9 has been pushed out.