petewarden / tensorflow_makefile

py_func Crashes #9

Closed altostratous closed 3 years ago

altostratous commented 3 years ago

Environment info

Operating System: NAME="Ubuntu" VERSION="20.04.1 LTS (Focal Fossa)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 20.04.1 LTS" VERSION_ID="20.04" HOME_URL="https://www.ubuntu.com/" SUPPORT_URL="https://help.ubuntu.com/" BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/" PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy" VERSION_CODENAME=focal UBUNTU_CODENAME=focal

Installed version of CUDA and cuDNN: None

(please attach the output of ls -l /path/to/cuda/lib/libcud*):

(base) ali@simon:/tmp/mozilla_ali0$ ls -l /path/to/cuda/lib/libcud*
ls: cannot access '/path/to/cuda/lib/libcud*': No such file or directory

If installed from binary pip package, provide:

  1. Which pip package you installed.
  2. The output from python -c "import tensorflow; print(tensorflow.__version__)".

If installed from sources, provide the commit hash: 11b328425e5c4a0c2852aea9db5a61fbc7aa290c

Steps to reproduce

  1. Instantiate ResNet50 with keras.
  2. Load TensorFI on it.
  3. Run prediction with fault injections enabled.

The code is below:

from tensorflow.keras.applications.resnet50 import ResNet50
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np
import TensorFI as fi
from tensorflow.keras.backend import get_session

model = ResNet50(weights='imagenet')

img_path = 'val_5.JPEG'
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

session = get_session()

tf = fi.TensorFI(session, disableInjections=False, logLevel=50)

preds = session.run(model.outputs[0], feed_dict={model.inputs[0]: x})

# preds = model.predict(x)

# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print('Predicted:', decode_predictions(preds, top=3)[0])
# Predicted: [(u'n02504013', u'Indian_elephant', 0.82658225), (u'n01871265', u'tusker', 0.1122357), (u'n02504458', u'African_elephant', 0.061040461)]

Here is the input image used: val_5

When I turn off the injections, I get the expected output:

 ('Predicted:', [(u'n04399382', u'teddy', 0.81401235), (u'n02105641', u'Old_English_sheepdog', 0.032959767), (u'n04008634', u'projectile', 0.020169798)])

What have you tried?

  1. Tracing the code: execution ends up in native (C/C++) code and aborts on a check in py_func.cc.

Logs or other output that would be helpful

(If logs are large, please upload as attachment).

/home/ali/anaconda/envs/tensorfi/bin/python /home/ali/Desktop/Code/TensorFI/resnet50/model.py
WARNING:tensorflow:From /home/ali/Desktop/Code/TensorFI/resnet50/model.py:6: The name tf.keras.backend.get_session is deprecated. Please use tf.compat.v1.keras.backend.get_session instead.

WARNING:tensorflow:From /home/ali/anaconda/envs/tensorfi/lib/python2.7/site-packages/tensorflow/python/ops/init_ops.py:1251: calling __init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
2021-02-05 18:53:34.853793: I tensorflow/core/platform/cpu_feature_guard.cc:145] This TensorFlow binary is optimized with Intel(R) MKL-DNN to use the following CPU instructions in performance critical operations:  SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in non-MKL-DNN operations, rebuild TensorFlow with the appropriate compiler flags.
2021-02-05 18:53:34.881342: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2394305000 Hz
2021-02-05 18:53:34.881859: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5582bd2a2eb0 executing computations on platform Host. Devices:
2021-02-05 18:53:34.881909: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-3
OMP: Info #156: KMP_AFFINITY: 4 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 2 cores/pkg x 2 threads/core (2 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
OMP: Info #250: KMP_AFFINITY: pid 90837 tid 90837 thread 0 bound to OS proc set 0
2021-02-05 18:53:34.882399: I tensorflow/core/common_runtime/process_util.cc:115] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2021-02-05 18:53:35.374807: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
/home/ali/Desktop/Code/TensorFI/TensorFI/fiConfig.py:270: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  params = yaml.load(pStream)
Unable to open log file faultLogs/NoName-log
Starting log at 2021-02-05 18:53:40.952907

---------------------------------------
2021-02-05 18:53:43.067374: F tensorflow/python/lib/core/py_func.cc:466] Check failed: DataTypeCanUseMemcpy(t.dtype()) 

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
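
The failed check, DataTypeCanUseMemcpy(t.dtype()), suggests that a tensor whose dtype cannot be copied with a plain memcpy reached py_func. Below is a minimal sketch for narrowing that down. It assumes, without confirmation from this report, that the culprit is a string, resource, or variant tensor; the dtype set, the helper name find_non_memcpy_tensors, and the stand-in ops are all hypothetical and not part of TensorFlow or TensorFI.

```python
from collections import namedtuple

# Assumption: dtype names that cannot be copied with a plain memcpy.
# This set is written by hand for illustration, not read from TensorFlow.
NON_MEMCPY_DTYPES = frozenset({"string", "resource", "variant"})


def find_non_memcpy_tensors(ops):
    """Return (op_name, direction, index, dtype_name) for each suspicious tensor.

    `ops` is any iterable of objects exposing `.name`, `.inputs` and `.outputs`,
    where each tensor exposes `.dtype.name`.
    """
    hits = []
    for op in ops:
        for direction, tensors in (("input", op.inputs), ("output", op.outputs)):
            for index, tensor in enumerate(tensors):
                dtype_name = tensor.dtype.name
                if dtype_name in NON_MEMCPY_DTYPES:
                    hits.append((op.name, direction, index, dtype_name))
    return hits


# Hypothetical stand-ins so the sketch runs without TensorFlow installed.
FakeDType = namedtuple("FakeDType", "name")
FakeTensor = namedtuple("FakeTensor", "dtype")
FakeOp = namedtuple("FakeOp", "name inputs outputs")

demo_ops = [
    # An op that consumes a resource handle and produces a float tensor.
    FakeOp("fake_read_variable",
           [FakeTensor(FakeDType("resource"))],
           [FakeTensor(FakeDType("float32"))]),
    # An op that only touches float tensors.
    FakeOp("fake_conv",
           [FakeTensor(FakeDType("float32")), FakeTensor(FakeDType("float32"))],
           [FakeTensor(FakeDType("float32"))]),
]

for hit in find_non_memcpy_tensors(demo_ops):
    print(hit)
```

With a real session, the same function could be given session.graph.get_operations() before TensorFI is attached, since graph operations expose name, inputs, and outputs in this shape. Any op it reports would be a candidate for the tensor that trips the check.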

altostratous commented 3 years ago

Sorry, I submitted the issue to the wrong repository.