pytorch / executorch

On-device AI across mobile, embedded and edge for PyTorch
https://pytorch.org/executorch/

TypeError: 'staticmethod' object is not callable AND XnnpackPartitioner() flatc issue #2025

Open adonnini opened 3 months ago

adonnini commented 3 months ago

@mcr229, @kimishpatel asked me to create this issue and bring it to your attention.

I added strict=False to export(pre_autograd_aten_dialect, (enc_input, dec_input, dec_source_mask, dec_target_mask)). When I ran the code, it failed at the line lowered_module = edge_program.to_backend(XnnpackPartitioner()), telling me it could not find flatc even though flatc is installed and available.
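For context, the relevant part of my flow is roughly the following (a minimal sketch, not my exact code; pre_autograd_aten_dialect and the example inputs stand in for my actual module and tensors defined earlier in the script):

import torch
from executorch.exir import to_edge
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
# After the nightly update, the import above is what raises the 'staticmethod' TypeError

# Export with strict=False, as suggested for the earlier torch._dynamo error
exported_program = torch.export.export(
    pre_autograd_aten_dialect,
    (enc_input, dec_input, dec_source_mask, dec_target_mask),
    strict=False,
)
edge_program = to_edge(exported_program)
# This is the line that originally failed with the flatc complaint
lowered_module = edge_program.to_backend(XnnpackPartitioner())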

I am sorry I did not think to copy the traceback log.

Then I had the bright idea (not really) to update the executorch installation to the latest nightly build (20240209). Now execution fails with a brand new error; please see the traceback log below. I searched for information on this error but did not find anything I could use; there may well be relevant information out there, but I could not tell whether it applied to my case.

TRACEBACK LOG

Traceback (most recent call last):
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/train-minimum.py", line 21, in <module>
    from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch/backends/xnnpack/__init__.py", line 8, in <module>
    from .partition.xnnpack_partitioner import (
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch/backends/xnnpack/partition/xnnpack_partitioner.py", line 82, in <module>
    class XnnpackOperatorSupport(OperatorSupportBase):
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch/backends/xnnpack/partition/xnnpack_partitioner.py", line 236, in XnnpackOperatorSupport
    @_constraint(exir_ops.edge.aten.mean.dim)
TypeError: 'staticmethod' object is not callable

mcr229 commented 3 months ago

Hi adonnini, does this error also occur when you try running through any of the examples?

For example

python -m examples.xnnpack.aot_compiler -m="mv2" -d
adonnini commented 3 months ago

@mcr229 I never ran any of the examples; they are not meaningful for me. Even if I tried running them and they worked, that would not change the fact that execution fails when I run the code with my model. The outcome of running the examples should be the same whether I run them or anyone else does. Did you try to run the examples?

mcr229 commented 3 months ago

Hi @adonnini, the error you're encountering looks strange to me and doesn't seem model-specific, so I was wondering if you would hit the same error if you ran the export of one of the example models. That's why I asked whether running any of the examples would cause the same error for you; I just wanted to see whether this is a build-specific issue or a model-specific issue.

adonnini commented 3 months ago

@mcr229 I understand. It might be helpful if I gave you the sequence of events that led to the error:

1) When trying to lower my model onto an Android device, I was running into a torch._dynamo.exc.Unsupported: call_function args: error.
2) To work around the error, it was suggested that I set strict to False.
3) Doing this seemed to solve the problem but caused a flatc missing/not found problem when running the line lowered_module = edge_program.to_backend(XnnpackPartitioner()).
4) To try and resolve this error, I updated executorch to the 20240209 nightly build.
5) After the update, running my code failed with TypeError: 'staticmethod' object is not callable.

I hope this helps.

adonnini commented 3 months ago

@mcr229 I ran

python3 -m aot_compiler -m="mv2" -d

Execution failed, producing the following traceback log:

Traceback (most recent call last):
  File "/home/adonnini1/anaconda3/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/adonnini1/anaconda3/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch/examples/xnnpack/aot_compiler.py", line 14, in <module>
    from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/executorch/backends/xnnpack/__init__.py", line 8, in <module>
    from .partition.xnnpack_partitioner import (
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/executorch/backends/xnnpack/partition/xnnpack_partitioner.py", line 82, in <module>
    class XnnpackOperatorSupport(OperatorSupportBase):
  File "/home/adonnini1/anaconda3/lib/python3.9/site-packages/executorch/backends/xnnpack/partition/xnnpack_partitioner.py", line 236, in XnnpackOperatorSupport
    @_constraint(exir_ops.edge.aten.mean.dim)
TypeError: 'staticmethod' object is not callable
mcr229 commented 3 months ago

Hi adonnini, it looks like the Python version you're using is Python 3.9. I believe we currently require 3.10+. I moved my Python version to 3.9 as well and was able to replicate the error. Try creating a new conda env and installing python=3.10.
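If it helps, a quick sanity check of which interpreter and version a script actually runs under (plain standard-library Python, nothing ExecuTorch-specific):

import sys

# ExecuTorch currently requires Python 3.10 or newer.
print(sys.executable, sys.version)
assert sys.version_info >= (3, 10), "Python 3.10+ is required"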

adonnini commented 3 months ago

@mcr229 are you saying that using python 3.9 is the cause of the error?

mcr229 commented 3 months ago

Yes, I believe so.

adonnini commented 3 months ago

@mcr229 running /usr/bin/python3.11 train-minimum.py, instead of running the model code with Python 3.9, fails with this traceback log:


Traceback (most recent call last):
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/train-minimum.py", line 5, in <module>
    import torch
ModuleNotFoundError: No module named 'torch'

Please note that python3 train-minimum.py, where python3 corresponds to Python 3.9, never failed like this.

I tried to rebuild executorch completely. When using Python 3.9 it worked flawlessly. Now, with a virtual environment created via /usr/bin/python3.11 -m venv .executorch, ./install_requirements.sh fails with the error reported below. In addition, when attempting to run zstd -cdq /home/adonnini1/Downloads/buck2-x86_64-unknown-linux-musl.zst > /tmp/buck2 && chmod +x /tmp/buck2 the result is bash: /tmp/buck2: Text file busy

Everything is pretty much a mess. The executorch installation/update previously ran flawlessly. It looks like the solution to my problem is further and further away. I am pretty disappointed.

./install_requirements.sh ERROR

  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [18 lines of output]
      /home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch/.executorch/lib/python3.11/site-packages/setuptools/config/pyprojecttoml.py:108: _BetaConfiguration: Support for `[tool.setuptools]` in `pyproject.toml` is still *beta*.
        warnings.warn(msg, _BetaConfiguration)
      running dist_info
      copying from schema/scalar_type.fbs to exir/_serialize/scalar_type.fbs
      copying from schema/program.fbs to exir/_serialize/program.fbs
      copying from sdk/bundled_program/schema/bundled_program_schema.fbs to sdk/bundled_program/serialize/bundled_program_schema.fbs
      copying from sdk/bundled_program/schema/scalar_type.fbs to sdk/bundled_program/serialize/scalar_type.fbs
      creating /tmp/pip-modern-metadata-93dx2739/executorch.egg-info
      writing /tmp/pip-modern-metadata-93dx2739/executorch.egg-info/PKG-INFO
      writing dependency_links to /tmp/pip-modern-metadata-93dx2739/executorch.egg-info/dependency_links.txt
      writing requirements to /tmp/pip-modern-metadata-93dx2739/executorch.egg-info/requires.txt
      writing top-level names to /tmp/pip-modern-metadata-93dx2739/executorch.egg-info/top_level.txt
      writing manifest file '/tmp/pip-modern-metadata-93dx2739/executorch.egg-info/SOURCES.txt'
      reading manifest file '/tmp/pip-modern-metadata-93dx2739/executorch.egg-info/SOURCES.txt'
      adding license file 'LICENSE'
      writing manifest file '/tmp/pip-modern-metadata-93dx2739/executorch.egg-info/SOURCES.txt'
      creating '/tmp/pip-modern-metadata-93dx2739/executorch-0.1.0.dist-info'
      error: invalid command 'bdist_wheel'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
bash: build/install_flatc.sh: No such file or directory
mcr229 commented 3 months ago

Hi Adonnini, sorry about your troubles here. I understand this is quite frustrating, and we are actively working on improving our documentation and setup workflow. Let me tag @dbort here so he can note these pain points for us to improve in the future.

For your issue, would it be possible to try something like this?

conda create -yn executorch_with_3.1 python=3.10.0
conda activate executorch_with_3.1
./install_requirements.sh
dbort commented 3 months ago

Sorry about the problems @adonnini. A few things:

adonnini commented 3 months ago

@mcr229 Thanks. I really appreciate your help. I ran the commands listed in your last comment above. Unfortunately, the result of ./install_requirements.sh

was the same. Did I do something wrong?

  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [18 lines of output]
      /home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch/.executorch/lib/python3.11/site-packages/setuptools/config/pyprojecttoml.py:108: _BetaConfiguration: Support for `[tool.setuptools]` in `pyproject.toml` is still *beta*.
        warnings.warn(msg, _BetaConfiguration)
      running dist_info
      copying from schema/scalar_type.fbs to exir/_serialize/scalar_type.fbs
      copying from schema/program.fbs to exir/_serialize/program.fbs
      copying from sdk/bundled_program/schema/bundled_program_schema.fbs to sdk/bundled_program/serialize/bundled_program_schema.fbs
      copying from sdk/bundled_program/schema/scalar_type.fbs to sdk/bundled_program/serialize/scalar_type.fbs
      creating /tmp/pip-modern-metadata-52tm45wt/executorch.egg-info
      writing /tmp/pip-modern-metadata-52tm45wt/executorch.egg-info/PKG-INFO
      writing dependency_links to /tmp/pip-modern-metadata-52tm45wt/executorch.egg-info/dependency_links.txt
      writing requirements to /tmp/pip-modern-metadata-52tm45wt/executorch.egg-info/requires.txt
      writing top-level names to /tmp/pip-modern-metadata-52tm45wt/executorch.egg-info/top_level.txt
      writing manifest file '/tmp/pip-modern-metadata-52tm45wt/executorch.egg-info/SOURCES.txt'
      reading manifest file '/tmp/pip-modern-metadata-52tm45wt/executorch.egg-info/SOURCES.txt'
      adding license file 'LICENSE'
      writing manifest file '/tmp/pip-modern-metadata-52tm45wt/executorch.egg-info/SOURCES.txt'
      creating '/tmp/pip-modern-metadata-52tm45wt/executorch-0.1.0.dist-info'
      error: invalid command 'bdist_wheel'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Version of /home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch/third-party/flatbuffers is 23.5.26
A compatible version of flatc is on the PATH and ready to use.
adonnini commented 3 months ago

@dbort, thanks. I followed your suggestions after running the commands in @mcr229's latest message. Please see below. I did install the latest version of wheel. As you will see, when I ran /tmp/buck2 build //examples/portable/executor_runner:executor_runner --show-output, the build failed. Did I do something wrong?

(executorch_with_3.1) (.executorch) adonnini1@actlnxlptp6:~/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch$ rm /tmp/buck2
(executorch_with_3.1) (.executorch) adonnini1@actlnxlptp6:~/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch$ zstd -cdq /home/adonnini1/Downloads/buck2-x86_64-unknown-linux-musl.zst  > /tmp/buck2 && chmod +x /tmp/buck2
(executorch_with_3.1) (.executorch) adonnini1@actlnxlptp6:~/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch$ /tmp/buck2 build //examples/portable/executor_runner:executor_runner --show-output
From `load` at implicit location

Caused by:
    0: From `load` at third-party/prelude/prelude.bzl:8:6-29
    1: From `load` at third-party/prelude/native.bzl:15:6-45
    2: From `load` at third-party/prelude/apple/apple_macro_layer.bzl:8:6-32
    3: Error evaluating module: `prelude//apple/apple_bundle_config.bzl`
    4: error: Variable `typing` not found
         --> third-party/prelude/apple/apple_bundle_config.bzl:14:40
          |
       14 | def apple_bundle_config() -> dict[str, typing.Any]:
          |                                        ^^^^^^
          |

Build ID: 0de68684-c15d-482c-a9e8-82f4017c5d82
Jobs completed: 3. Time elapsed: 0.0s.
BUILD FAILED
mcr229 commented 3 months ago

cc @shoumikhin as this looks like an issue with some apple stuff

lucylq commented 3 months ago

Hey @adonnini, there was a recent change that updated the buck2 version from 2023-07-18 to 2024-02-15. Can you try with the newer buck2?

See: https://github.com/pytorch/executorch/pull/2034

adonnini commented 3 months ago

@lucylq Thanks. I will do as you say.

@mcr229 Does fixing the buck2 issue solve my main problems: the failure of ./install_requirements.sh even after following your instructions (using Python 3.10), and the fact that lowering a model using Xnnpack does not seem to be working?

I am sorry but it looks like at this point I am getting caught up in a number of other issues without necessarily getting any closer to solving my main problem. I know that sometimes this happens...

Thanks

mcr229 commented 3 months ago

Hi @adonnini, are you still encountering the TypeError: 'staticmethod' object is not callable error after updating to Python 3.10? Again, apologies for the difficulty with this setup. I am hopeful that updating buck and using Python 3.10 will move us closer.

adonnini commented 3 months ago

@mcr229, unfortunately things have gotten significantly worse. I can't even get to the point where I can determine whether the TypeError: 'staticmethod' object is not callable still occurs.

As I indicated in comments above, after following your instructions for use of python3.10:

1) ./install_requirements.sh still fails (please see comments above for traceback log)

2) If I try to run my training module, which includes the executorch code, it fails with this traceback:

Traceback (most recent call last):
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/train-minimum.py", line 12, in <module>
    import dataloader
  File "/home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/dataloader.py", line 8, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

What do you think I should do next?

adonnini commented 3 months ago

@lucylq I did as you suggested and now all tasks related to buck2 work. Thanks!

adonnini commented 3 months ago

@mcr229 now that the buck2-related issue is resolved, the only "infrastructure" issue left is the failure of ./install_requirements.sh. By the way, I just ran these:

conda create -yn executorch_with_3.1 python=3.10.0
conda activate executorch_with_3.1
./install_requirements.sh

The result was the same failure:

 Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [15 lines of output]
      running dist_info
      copying from schema/scalar_type.fbs to exir/_serialize/scalar_type.fbs
      copying from schema/program.fbs to exir/_serialize/program.fbs
      copying from sdk/bundled_program/schema/bundled_program_schema.fbs to sdk/bundled_program/serialize/bundled_program_schema.fbs
      copying from sdk/bundled_program/schema/scalar_type.fbs to sdk/bundled_program/serialize/scalar_type.fbs
      creating /tmp/pip-modern-metadata-qtca28ux/UNKNOWN.egg-info
      writing /tmp/pip-modern-metadata-qtca28ux/UNKNOWN.egg-info/PKG-INFO
      writing dependency_links to /tmp/pip-modern-metadata-qtca28ux/UNKNOWN.egg-info/dependency_links.txt
      writing top-level names to /tmp/pip-modern-metadata-qtca28ux/UNKNOWN.egg-info/top_level.txt
      writing manifest file '/tmp/pip-modern-metadata-qtca28ux/UNKNOWN.egg-info/SOURCES.txt'
      reading manifest file '/tmp/pip-modern-metadata-qtca28ux/UNKNOWN.egg-info/SOURCES.txt'
      adding license file 'LICENSE'
      writing manifest file '/tmp/pip-modern-metadata-qtca28ux/UNKNOWN.egg-info/SOURCES.txt'
      creating '/tmp/pip-modern-metadata-qtca28ux/UNKNOWN.dist-info'
      error: invalid command 'bdist_wheel'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Version of /home/adonnini1/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch/third-party/flatbuffers is 23.5.26
A compatible version of flatc is on the PATH and ready to use.
(executorch_with_3.1) (.executorch) adonnini1@actlnxlptp6:~/Development/ContextQSourceCode/NeuralNetworks/trajectory-prediction-transformers-master/executorch$ 
mcr229 commented 3 months ago

It looks like the issue might be with bdist_wheel. Could you try pip install wheel in your venv executorch_with_3.1? You might also have to run python setup.py bdist_wheel afterwards. Please let me know if this works.

adonnini commented 3 months ago

@mcr229, thanks. Actually, it looks like bdist_wheel does not have to be installed (pip does not recognize any such package). However, I did need to run python setup.py bdist_wheel. After doing that, execution of ./install_requirements.sh completed successfully.

Next, I installed all the packages required for execution of my training module (including the executorch code).

Once I did that, execution of the training module (including the executorch code) completed successfully with the production of a model file (.pte).
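For reference, the step that writes out the .pte in my script is roughly the standard flow (a sketch, not my exact code; the file name is just the one my app later loads):

# Sketch: serialize the partitioned edge program to an ExecuTorch .pte file.
exec_program = edge_program.to_executorch()
with open("tpt_delegate.pte", "wb") as f:
    f.write(exec_program.buffer)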

So, I guess that this issue is resolved. Please let me know if you agree.

Thank you very much for your help.

adonnini commented 3 months ago

@mcr229 When the app attempts to load the model produced with the training module (using the executorch code), execution fails. Below you will find the relevant portion of the error log. Should I open a new issue for this? Please let me know if you need additional information. Thanks

ERROR LOG ATTEMPTING TO LOAD LOWERED MODEL

02-28 16:07:12.089: I/ETLOG(8314): Model file /data/user/0/com.android.contextq/files/locationInformation/tpt_delegate.pte is loaded.
02-28 16:07:12.089: I/ETLOG(8314): Setting up planned buffer 0, size 31460272.
02-28 16:07:12.101: I/ETLOG(8314): Constant buffer 1 out of program buffer range 0
02-28 16:07:12.101: I/ETLOG(8314): getTensorDataPtr() failed: 0x12
02-28 16:07:12.101: I/ETLOG(8314): Failed parsing tensor at index 0: 0x12
02-28 16:07:12.101: I/ETLOG(8314): In function CheckOk(), assert failed: hasValue_
mcr229 commented 3 months ago

Yes, it looks like the original issue is resolved. Thank you so much for sticking with this despite all the issues that arose. As for the new error log from executing the model, I would suggest opening a new issue for it.