ise-uiuc / FreeFuzz

Free Lunch for Testing: Fuzzing Deep-Learning Libraries from Open Source (ICSE'22)

tensorflow instrumentation bug #7

Open dmc1778 opened 1 year ago

dmc1778 commented 1 year ago

Hi,

When I instrument TensorFlow APIs, I get weird results, as shown in the attached image.

For example, I get:

tf.__main__.MatrixSolveOpTest

As you can see, __main__ and the test class name end up in the API name, which is wrong.

Also, the value spaces are the same for all APIs.

Any idea why this happens?

Thanks.

dmc1778 commented 1 year ago

I found the issue. TensorFlow APIs called via TensorFlow unit tests have different API names compared to the APIs listed in the official documentation. For example, when you run:

python broadcast_to_ops_test.py

your code hijacks the class name inside this test file, which is BroadcastToTest, and records __main__ as the parent module because of this piece of code at the bottom of the file:

if __name__ == "__main__":
  test_lib.main()
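
Here is a minimal illustration of the effect (the class name is just for demonstration, not FreeFuzz code): any class defined in a script that is executed directly reports __main__ as its module, which is what ends up in the recorded API name.

# what_instrumentation_sees.py -- hypothetical demo script
class BroadcastToTest:
    pass

if __name__ == "__main__":
    # An instrumenter that records obj.__module__ sees "__main__" here,
    # producing names like tf.__main__.BroadcastToTest.
    print(BroadcastToTest.__module__)  # prints: __main__
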
dengyinlin commented 1 year ago

Hi @nimashiri, I think there's some misunderstanding here, and I am happy to help.

Based on your description, I am guessing you are trying to instrument the unit test class BroadcastToTest from broadcast_to_ops_test.py? Our current implementation for TensorFlow instrumentation only targets public TensorFlow APIs, and one can expect errors when applying it directly to other internal classes (e.g., the test utility classes you mentioned) due to various issues (e.g., the test_lib.main() you located; good catch!).

You can find the API list we used in api_list.txt, or the full API list for the latest version here.

In other words, to collect traces for public APIs by running tests, FreeFuzz does not instrument test classes like BroadcastToTest; it just instruments the public APIs and runs the tests.
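
For intuition, here is a minimal sketch of the wrapping idea (illustrative only, not our actual implementation; the real instrumentation covers the whole API list and writes the traces to a database):

# A sketch: replace a public API with a wrapper that records every
# invocation, then run the developer tests unchanged.
import tensorflow as tf

traces = []  # the real implementation writes to a database instead

def instrument(api_name, fn):
    def wrapper(*args, **kwargs):
        traces.append((api_name, args, kwargs))  # record the value space
        return fn(*args, **kwargs)
    return wrapper

tf.random.normal = instrument("tf.random.normal", tf.random.normal)

# Any test or program that calls tf.random.normal now leaves a trace.
tf.random.normal((2, 3))
print(traces[0][0])  # tf.random.normal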

If you are seeking to instrument such non-public APIs for your use case, you may want to extend the current instrumentation and handle more special cases (like test_lib.main()). PRs are welcome!

dmc1778 commented 1 year ago

Thanks. Not exactly: I am trying to instrument public APIs invoked inside the TensorFlow library tests. But I don't know which public tests you ran to collect the APIs. In the paper, you mentioned that you collected around 216 APIs from tests; my question is which tests, since I don't know their locations. My assumption is that when I run the kernel tests, the code should automatically instrument the public APIs used inside the kernel tests or op tests. But it gives me the results shown above.

Anjiang-Wei commented 1 year ago

But I don't know which public tests you ran to collect the APIs.

If I remember correctly, for PyTorch we just directly ran all the *.py files inside the folders used for testing (e.g., https://github.com/pytorch/pytorch/tree/master/test). I agree that our test collection is incomplete: some test files need to be invoked in a specialized way (e.g., by passing command-line arguments; I am not sure), and I did not even consider tests written in C++ or other languages. Also, our testing environment is not the standard developer's environment, so a subset of the .py test files we ran actually failed (while they should not fail in the correct environment setting, such as the CI), leaving a large proportion of code uncovered by existing tests.
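
Roughly, the collection looked like the sketch below (from memory, not our exact script; the test folder path is an assumption):

import glob
import subprocess

# Run every .py file under the test folder (assumed to be ./test) like a
# normal program; tolerate the ones that fail or hang in our environment.
for test_file in sorted(glob.glob("test/**/*.py", recursive=True)):
    try:
        subprocess.run(["python", test_file], timeout=600, check=False)
    except subprocess.TimeoutExpired:
        pass  # skip tests that hang; their traces are simply lost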

I can't clearly remember how we ran TensorFlow's test cases. We are sorry that our test case collection was done in an ad-hoc way. As you may have already noticed, the coverage is actually quite low as a percentage of the total LOC in the code base.

Thanks again for these nice questions! Inspired by what you have said, I firmly believe FreeFuzz's methodology could be much stronger if the test collection stage were more complete and robust, as the current test case collection is done in a way that is far from ideal.

Challenges:

1) How to thoroughly consider all the test cases in an existing repository? You may want to look at how they run their CI to test the whole software system.

2) How to instrument the APIs in a generic way, since some test cases target internal APIs, and even internal C++ functions, not just public Python APIs.

3) The distinction between "public APIs" and "private APIs / C++ interfaces" is not that obvious. How can you solve the oracle problem (e.g., beyond crashes) if you are instrumenting private APIs / low-level C++ functions? The challenge there is that you do not know the input specification for low-level C++ code, as users are usually not encouraged to invoke those internal APIs directly.

4) Faithfully reproducing the invocation. In our implementation, we found it hard, in terms of engineering, to replay an API invocation using the recorded arguments; more intensive engineering is needed to make the replaying robust (see the sketch below).
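
To make 4) concrete, replaying boils down to resolving the recorded API name back to a callable and re-invoking it with the recorded arguments; the hard part is serializing and deserializing those arguments faithfully (tensors, dtypes, custom objects), which this sketch glosses over:

import tensorflow as tf

def replay(record):
    # record is assumed to be a simple (api_name, args, kwargs) tuple;
    # real traces must also reconstruct tensors, dtypes, custom objects, ...
    api_name, args, kwargs = record
    fn = tf
    for attr in api_name.split(".")[1:]:  # "tf.random.normal" -> callable
        fn = getattr(fn, attr)
    return fn(*args, **kwargs)

replay(("tf.random.normal", ((2, 3),), {}))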

@dengyinlin Please feel free to correct me if I am wrong.

dmc1778 commented 1 year ago

I really enjoyed reading FreeFuzz; it is easy to read, and the idea is sound and exciting. That is why I decided to use FreeFuzz as a baseline.

FreeFuzz works perfectly for Torch test cases. However, for TensorFlow private APIs, instead of hijacking the private APIs, it hijacks the class where the private APIs are located, as explained above.

Regarding the challenges:

1 - I examined the CI tests for the Torch library; FreeFuzz works perfectly when I run all the tests together to account for integration.

2 - This is already done, but not published yet; the work is on arXiv.

3 - The distinction is in their parent module. Among TensorFlow's public APIs, most critical APIs are in the raw_ops module, while on the private side they live in different modules, like array_ops, bincount_ops, or ops itself.
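
For example (a small illustration, assuming TensorFlow 2.x; both calls run the same underlying op, but the recorded parent modules differ):

import tensorflow as tf
from tensorflow.python.ops import array_ops  # private/internal module

x = tf.constant([1, 2, 3])
y1 = tf.raw_ops.BroadcastTo(input=x, shape=[2, 3])  # public raw_ops endpoint
y2 = array_ops.broadcast_to(x, [2, 3])              # same op, internal module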

In the paper, you mentioned that you put a lot of effort into calculating code coverage for FreeFuzz. Is it possible to share the code with me?

dengyinlin commented 1 year ago

Hi @nimashiri, thanks for your interest!

I don't know which public tests you ran to collect the APIs.

I ran all the public tests, i.e., all files that end with _test.py in the TensorFlow repo (collected with find . -type f -name '*_test.py'). As @Anjiang-Wei mentioned, some of them may fail (because we may not be running them in the correct environment setting), but we are still able to collect traces from the successful ones.

My assumption is that when I run the kernel tests, the code should automatically instrument the public APIs used inside the kernel tests or op tests.

Yes, you are right: the instrumentation code need not be changed to run any developer tests. We just run them like normal TensorFlow programs.

But it gives me the results shown above.

I tried re-running the instrumentation (on TensorFlow 2.4) on the developer tests following the README, but I don't get results like tf.__main__.MatrixSolveOpTest; instead, I get normal entries like tf.random.normal in the database, as expected. I guess something in the instructions is unclear. For debugging, to start with, I wonder if you are able to collect traces from normal TF programs besides the developer tests. E.g., if you run the following example code snippet from the docs, does your instrumentation environment record the correct traces for tf.keras.layers.Conv2D?

import tensorflow as tf

# The inputs are 28x28 RGB images with `channels_last` and the batch
# size is 4.
input_shape = (4, 28, 28, 3)
x = tf.random.normal(input_shape)
y = tf.keras.layers.Conv2D(
    2, 3, activation='relu', input_shape=input_shape[1:])(x)
print(y.shape)  # (4, 26, 26, 2)

dmc1778 commented 1 year ago

Hi @dengyinlin, thanks for the reply. For the docs and wild models, I get what I expect; for the kernel tests, I get wrong results. Please note that I installed TensorFlow from source (following this guide: https://gist.github.com/kmhofmann/e368a2ebba05f807fa1a90b3bf9a1e03), not through a normal pip installation. So to run the kernel tests, I collect all of them from:

/tensorflow/tensorflow/python/kernel_tests

Not from:

.local/lib/python3.8/site-packages/tensorflow/python/kernel_tests

How did you install TensorFlow 2.4? I have the same version.

dengyinlin commented 1 year ago

I see. Actually, I did not test the instrumentation on TensorFlow built from source. For instrumentation, I install TensorFlow from pip and instrument lib/python3.7/site-packages/tensorflow/. Then I use a separate TensorFlow repo to run the developer tests like normal programs, i.e., git clone https://github.com/tensorflow/tensorflow.git, check out the r2.4 branch, and directly run python *_test.py.

dmc1778 commented 1 year ago

Wow, very different from my approach. Let me do it this way, and I will let you know the results. I really appreciate it.

Anjiang-Wei commented 1 year ago

For code coverage, tools already exist for PyTorch: https://github.com/pytorch/pytorch/tree/14d5f139d205f924eb7ddd3e61215971bd194855/tools/code_coverage. The real difficulty back then was setting up the related tools in the environment in the correct way, which can be rather challenging. I am not sure how hard it is now, as PyTorch may have evolved and continuously put more effort into coverage measurement.

For TensorFlow, coverage collection was also very difficult. Back then I figured out the command line for collecting coverage, but the tooling setup is really painful (https://github.com/tensorflow/tensorflow/issues/51091); the technical challenge was resolved by our collaborator @YangChenyuan, to whom I am sincerely grateful. Chenyuan may be able to give you more advice if you run into problems collecting coverage later. As far as I remember, there was a version-incompatibility bug between Bazel and Gcov.

Just two side notes:

1) TensorFlow is now integrating OSS-Fuzz, but the coverage build seems to be failing right now: https://github.com/tensorflow/tensorflow/

2) How to exercise CUDA kernels and collect coverage for code that targets GPU execution is also a main challenge, which remains unresolved in FreeFuzz and DeepREL.

dmc1778 commented 1 year ago

@dengyinlin @Anjiang-Wei I solved the issue. I also updated the instrumentation, and it now supports all internal/private TensorFlow APIs; it also covers user-facing/public APIs. There is only one small remaining issue, which is writing APIs from the DB back into source code for fuzzing; I am working on it. I will send a PR to update the code.