tensorflow / models

Models and examples built with TensorFlow

Seq flow lite does not build #10737

Open grofte opened 2 years ago

grofte commented 2 years ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

1. The entire URL of the file you are using

https://github.com/tensorflow/models/blob/master/research/seq_flow_lite/demo/colab/emotion_colab.ipynb

https://github.com/tensorflow/models/tree/master/research/seq_flow_lite#readme

2. Describe the bug

There are three ways to launch a demo here, and none of them works (nothing builds).

  1. The emotion colab demo fails on the line !pip install models/research/seq_flow_lite - maybe because colab uses Python 3.7?
  2. From the readme, the command bazel run -c opt :trainer fails to build in a tensorflow/tensorflow:2.3.4-gpu Docker image where I installed bazel.
  3. From the readme, the command bazel run -c opt sgnn:train does not run since it's not defined in the BUILD file - fair enough.

3. Steps to reproduce

  1. Run the colab.

  2. & 3.

    git clone https://github.com/tensorflow/models.git
    docker run --rm=true -v $(pwd)/models:/home/models -it tensorflow/tensorflow:2.3.4-gpu bash
    apt install apt-transport-https curl gnupg
    curl -fsSL https://bazel.build/bazel-release.pub.gpg | gpg --dearmor >bazel-archive-keyring.gpg
    mv bazel-archive-keyring.gpg /usr/share/keyrings
    echo "deb [arch=amd64 signed-by=/usr/share/keyrings/bazel-archive-keyring.gpg] https://storage.googleapis.com/bazel-apt stable jdk1.8" | tee /etc/apt/sources.list.d/bazel.list
    apt update 
    apt install bazel
    cd /home/models/research/seq_flow_lite
    bazel run -c opt :trainer -- --config_path=$(pwd)/configs/civil_comments_prado.txt --runner_mode=train --logtostderr --output_dir=/tmp/prado
    bazel run -c opt sgnn:train -- --logtostderr --output_dir=/tmp/sgnn

4. Expected behavior

The thing is that I don't even care about PRADO or sgnn. I just wanted to try one of the new models you advertised on the Google AI blog (https://ai.googleblog.com/2022/08/efficient-sequence-modeling-for-on.html), but I can't because it needs the file tf_custom_ops_py.py, which doesn't exist. However, tf_custom_ops.cc does exist, so it seemed likely that I could build the Python version from the C++ file if anything would build. But nothing builds. There are no instructions on how to build, and none of the demos seem to work.
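
For context, the Python side of a TensorFlow custom op is usually just a thin loader like the sketch below; the .so name here is hypothetical, since the repo only ships tf_custom_ops.cc and I can't get it to build:

    import tensorflow as tf

    # Assumes tf_custom_ops.cc has already been compiled into a shared library;
    # the path and the op names depend on the actual build rules.
    _ops_module = tf.load_op_library("./tf_custom_ops.so")
    # The ops registered in the C++ file are then exposed as attributes of
    # _ops_module.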

5. Additional context

  1. Error message from the colab instance:
    Building wheels for collected packages: seq-flow-lite
    Building wheel for seq-flow-lite (setup.py) ... error
    ERROR: Failed building wheel for seq-flow-lite
    Running setup.py clean for seq-flow-lite
    Failed to build seq-flow-lite
    Installing collected packages: seq-flow-lite
    Running setup.py install for seq-flow-lite ... error
    ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-cr_5bdz5/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-cr_5bdz5/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-sxy3_k0l/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.7/seq-flow-lite Check the logs for full command output.

6. System information

Paduk commented 2 years ago

I faced the same issue, thanks for reporting it. Has anyone not faced this issue? :)

karunreddy30 commented 2 years ago

Thanks for reporting! I will check it out and get back.

grofte commented 2 years ago

It would be nice to have a Hugging Face model like there is for ByT5. But I am sure that if you approached the TensorFlow Hub team, they would dedicate resources to making your work available to a wider audience.

pyoung2778 commented 2 years ago

Looks like the issue is that the version of TensorFlow we're pointing to in our WORKSPACE file is 2.6.0, and it's causing problems because it's probably 2.9.1 that's installed on the colab machine.

We've got an update; we just haven't gotten around to propagating the changes to GitHub. I'll see about getting that done and verifying that it fixes things.

pyoung2778 commented 2 years ago

As a quick update, syncing the code did fix that problem, but also revealed another bug in the dataset building section. Will continue to investigate.

pyoung2778 commented 2 years ago

GoEmotions colab should be working again.

sushreebarsa commented 2 years ago

@grofte Could you refer to the comments above and try with the latest TF version? Please let us know if it helps. Thank you!

google-ml-butler[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

grofte commented 2 years ago

The Colaboratory absolutely works, thank you.

But that seems like a bit much to maintain, which is why I asked for a Docker image. Hopefully the Colab notebook won't need maintenance.

grofte commented 2 years ago

I don't understand why you are talking about the Tensorflow Model Garden but then installing tensorflow-datasets.

I also don't understand why you are installing the nightly version of tensorflow-datasets. The current nightly doesn't even have goemotions. Maybe it will tomorrow, but why on earth are you using the nightly version? Just get the latest release with !pip install tensorflow-datasets. If you have already installed the nightly, then you need to !pip uninstall tfds-nightly first.
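
In the colab, that swap would look roughly like this (assuming the notebook currently installs tfds-nightly):

    !pip uninstall -y tfds-nightly
    !pip install tensorflow-datasets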

pyoung2778 commented 2 years ago

I think the GoEmotions dataset may not have been available in tensorflow-datasets when we started writing the colab, and we never got around to updating the package we were using after it became available there.

I have no idea why we're calling it the Tensorflow Model Garden package, though.

Either way, I'll update the colab. Thanks.

grofte commented 2 years ago

Sorry for my tone. I was very tired yesterday afternoon.

grofte commented 2 years ago

I wanted to do a small rewrite of the notebook for my own sake anyway, so maybe I should submit that?

I am thinking that I will look at

  1. Fix the TF Datasets stuff
  2. Add TF Addons so we can use their F1 metric function (see the sketch after this list)
  3. Add metrics to the models when they are built for training
  4. Split the preprocessing up for labels and features - makes it more intuitive which parts will be included in the model when we build it for inference
  5. Preprocess before batching - I think that will make it easier for others to read and modify (maybe not)
  6. Explicitly pass variables to the build_dataset function unless they are imported from somewhere else - I know this is "just" a notebook, but I think it's especially important to be explicit with these things in notebooks
  7. Add some quick plots of the training history
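
A rough sketch of items 2 and 3, assuming TensorFlow Addons is installed (pip install tensorflow-addons); the number of labels and the model below are placeholders, not the notebook's actual architecture:

    import tensorflow as tf
    import tensorflow_addons as tfa

    NUM_LABELS = 28  # placeholder for the number of emotion labels

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(NUM_LABELS, activation="sigmoid"),
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=[tfa.metrics.F1Score(num_classes=NUM_LABELS,
                                     average="micro", threshold=0.5)],
    )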

grofte commented 2 years ago

I've made another version of the notebook. I haven't changed anything major; I've simply made it so that:

  1. each epoch only reads the data once (instead of 25 times as before, so there are 250 epochs instead of 10) - sketched below,
  2. the learning rate is printed each epoch,
  3. F1 is added as a metric,
  4. loss and metrics are plotted versus epoch after training,
  5. labels and features are preprocessed in separate functions,
  6. all functions have their arguments passed explicitly,
  7. the model is compiled with jit_compile (about 25% faster).
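
A rough sketch of the input-pipeline change, with the notebook's label/feature preprocessing stood in for by two hypothetical callables:

    import tensorflow as tf

    def make_dataset(raw_ds, preprocess_features, preprocess_labels, batch_size=32):
        # Preprocess per example, cache so each epoch reads/parses the data only
        # once, then shuffle, batch, and prefetch.
        return (raw_ds
                .map(lambda ex: (preprocess_features(ex), preprocess_labels(ex)),
                     num_parallel_calls=tf.data.AUTOTUNE)
                .cache()
                .shuffle(10_000)
                .batch(batch_size)
                .prefetch(tf.data.AUTOTUNE))

    # model.compile(..., jit_compile=True) enables XLA compilation, which is
    # where the ~25% speedup comes from.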

I haven't optimized any hyperparameters even though it is clear that the model is under-parameterized and either converges long before the 250 epochs or stops improving when learning rate gets low enough.

Should I make a pull request with these changes?

grofte commented 2 years ago

I've also gotten Charformer to run and perform on par with BERT despite a) not being pretrained, b) only having 270k parameters, and c) having been set up with essentially random hyperparameters, since I needed to add a bunch but have no idea what they do.

However, the model does not work when I load it after saving it, i.e. the predictions are nonsense. I've tried a couple of different ways of saving, and it might be the tokenizer, but I don't know.
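
A quick way to narrow it down is a save/load round trip on the same batch, something like this (model, sample_batch, and the path are placeholders, and custom Charformer layers would also need get_config support or custom_objects):

    import numpy as np
    import tensorflow as tf

    before = model.predict(sample_batch)
    model.save("/tmp/charformer_model")  # SavedModel format; path is arbitrary
    reloaded = tf.keras.models.load_model("/tmp/charformer_model")
    after = reloaded.predict(sample_batch)

    # If saving/loading were faithful this would be ~0. If raw-text inputs
    # disagree but pre-tokenized inputs agree, the tokenizer is the culprit.
    print(np.max(np.abs(before - after)))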

arefeh-k commented 1 year ago

Hello

I'm receiving the same error on Google Colab for goemotions. I was working with GoEmotions for one month, but for the last two days it hasn't worked and I get this error:

    ERROR: Failed building wheel for seq-flow-lite
    Running setup.py clean for seq-flow-lite
    Failed to build seq-flow-lite

any help is appreciated

arefeh-k commented 1 year ago

Hi @pyoung2778, the goemotions colab was working fine, but as of 3 days ago it doesn't work. I'm getting this error; could you help, please?

    ERROR: Failed building wheel for seq-flow-lite
    Running setup.py clean for seq-flow-lite
    Failed to build seq-flow-lite

grofte commented 1 year ago

Yeah, it doesn't build at all. I'm thinking it's a version change in bazel or pip, since the code here doesn't seem to have changed.

pyoung2778 commented 1 year ago

Finally got a chance to look at this. It looks to be a bazel change. bazel 6.0.0 is not backwards compatible, and @bazel_tools//platforms constraints are now giving us some problems.

I'm looking into it.

pyoung2778 commented 1 year ago

Okay. I suspect the actual fix is buried deeper than I'm going to be able to get to.

In the meantime, changing the line "!sudo apt install bazel" to "!sudo apt install bazel=5.4.0" installs a version of bazel that is compatible with everything.

I still need to run a couple checks to make sure everything's working, but in the meantime, you should be able to modify that line in the colab and get everything working again.
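
In colab-cell form, the workaround is just:

    !sudo apt install bazel=5.4.0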

pyoung2778 commented 1 year ago

Okay, should be fixed.