ghost opened this issue 5 years ago
@wyessen Could you provide C2 models for your test-case, so that we can reproduce it locally?
Also, could you check under a debugger which function and argument are failing in the assert?
@opti-mix You should have that info in the backtrace. The highest-level function triggering this is glow::Loader::compile. The call propagates to glow::LLVMIRGen::createCall, and that's where the signal handler gets invoked. The assertion is failing in LLVMIRCodeGen/LLVMIRGen.cpp, line 558.
I'll provide the C2 protos a bit later. Do you need the source code for x-model-runner too?
OK. I think I figured out what the issue is. It is related to size_t.
libjit has a lot of kernels whose parameters have the type size_t. But sizeof(size_t) == 8 on 64-bit systems and sizeof(size_t) == 4 on i386.
So, libjit is currently compiled for x86_64, but its client, i.e. the bundle, is compiled according to the -target command-line parameter, i.e. for i386. As a result, the client thinks that size_t is 32 bits and libjit thinks it is 64 bits. Thus the mismatch.
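A minimal sketch of that mismatch (the kernel name below is made up for illustration, not an actual libjit symbol): the same declaration implies different parameter widths depending on the compilation target, so an x86_64-built libjit and an i386-built bundle disagree.

#include <cstddef>
#include <cstdio>

// A libjit-style kernel declaration using size_t (illustrative name only).
// Built for x86_64, numElements is a 64-bit parameter; built for i386, the
// caller passes a 32-bit value, so the two objects do not match. A fixed-width
// type such as uint32_t or uint64_t would mean the same thing on both targets.
extern "C" void libjit_example_copy(float *dst, const float *src, size_t numElements);

int main() {
  // Prints 8 on x86_64 and 4 on i386.
  std::printf("sizeof(size_t) = %zu\n", sizeof(size_t));
  return 0;
}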
Also, could you check under a debugger which function and argument are failing in the assert?
I meant: what is the callee argument of createCall, i.e. the function being called from the bundle? But it doesn't matter. I could reproduce locally with a different network, as the issue has nothing to do with the specific network, as I explained above.
Right. I expected that also. By the way, this is giving off a strong design smell.
Let's consider possible solutions. All of them are relatively easy to implement:
1) Use fixed-width integer types in libjit instead of size_t. But this could be not so optimal in terms of performance.
2) Have multiple versions of libjit in the same backend, one per possible platform size, e.g. one for x86_64 and one for i386.

By the way, this is giving off a strong design smell.
This is not a super constructive comment.
@opti-mix In order to comment on the proposed solutions, I need to better understand where and how libjit comes in. I haven't dug that deep into Glow. So that would be a question for the contributors to answer, but thanks for looking into this.
I suppose one workaround for now would be to build Glow for i386. Could you confirm? In that case, what is the easiest way to force this (i.e. CMake flags?).
In order to comment on the proposed solutions, I need to better understand where and how libjit comes in.
libjit is a library of kernels. See here: https://github.com/pytorch/glow/tree/master/lib/Backends/CPU/libjit. When Glow generates a bundle, it creates an LLVM function that calls those kernels as required by your C2 model.
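Roughly speaking (this is an illustrative sketch, not Glow's actual generated code; the kernel and function names below are placeholders), the function emitted into the bundle boils down to a sequence of calls into such kernels:

#include <cstdint>

extern "C" {
// Placeholder kernel standing in for a real libjit symbol.
void libjit_example_relu(float *out, const float *in, unsigned numElements) {
  for (unsigned i = 0; i < numElements; i++) {
    out[i] = in[i] > 0.0f ? in[i] : 0.0f;
  }
}

// Placeholder for the entry function Glow emits into the bundle: each lowered
// instruction becomes a call into a kernel, operating on pointers into the
// bundle's weight/activation memory regions.
void my_model_example(uint8_t *mutableWeights, uint8_t *activations) {
  const float *input = reinterpret_cast<const float *>(mutableWeights);
  float *act = reinterpret_cast<float *>(activations);
  libjit_example_relu(act, input, 10);
}
}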
I suppose one workaround for now would be to build Glow for i386. Could you confirm? In that case, what is the easiest way to force this (i.e. CMake flags?).
Yes. One could basically build libjit or the whole Glow for i386 (but then it won't work with x86_64 using this build). One would need to adjust CMake flags. Most likely one could modify this place: https://github.com/pytorch/glow/blob/master/lib/Backends/CPU/CMakeLists.txt#L16 to just compile libjit for i386.
@opti-mix Understood, thanks. The name was throwing me off, and I was reluctant to ask whether this was referring to GNU's LibJIT (which would have been strange).
Slightly off topic: the Glow paper talks about reduction from high-level IR to low-level IR, one of the advantages being that backend designers should only have to implement the low-level linear algebra routines. Is libjit the place where those low-level linear algebra routines are implemented? In particular I'm looking at libjit_matmul.cpp.
If that is indeed the case, then it would make sense to go with your second suggested solution: keep different versions of libjit on a per-target basis. We'd need some kind of a mechanism for selecting the correct version based on -target.
Understood, thanks. The name was throwing me off, and I was reluctant to ask whether this was referring to GNU's LibJIT (which would have been strange).
Yeah, the name is a misnomer ;-) It is called like this for historical reasons, because it is used from the CPU/JIT backend. It should probably be renamed to something more reasonable, like libkernels or something like that.
Slightly off topic: the Glow paper talks about reduction from high-level IR to low-level IR, one of the advantages being that backend designers should only have to implement the low-level linear algebra routines. Is libjit the place where those low-level linear algebra routines are implemented? In particular I'm looking at libjit_matmul.cpp.
Yes, libjit is the place where kernels are implemented. And the low-level IR is defined by IR.h. You have IRFunctions, which consist of Instructions. And when the CPU backend lowers this IR for a specific target, it converts instructions into calls of the kernels defined in libjit.
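For a feel of what such a kernel looks like, here is a minimal sketch only (illustrative name and signature, not an actual Glow symbol; the real implementations live in files like libjit_matmul.cpp and are far more optimized). Fixed-width types avoid the size_t ambiguity discussed earlier in this thread.

#include <cstdint>

// Element-wise add in the style of a libjit kernel.
extern "C" void libjit_example_element_add_f(float *dst, const float *lhs,
                                             const float *rhs,
                                             uint32_t numElements) {
  for (uint32_t i = 0; i < numElements; i++) {
    dst[i] = lhs[i] + rhs[i];
  }
}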
If that is indeed the case, then it would make sense to go with your second suggested solution: keep different versions of libjit on a per-target basis. We'd need some kind of a mechanism for selecting the correct version based on -target.
The problem with this is that in theory Glow allows you to compile for any target supported by LLVM. Which exact targets are supported by your installation of LLVM is not known in advance. But libjit is built when you build Glow, not when you compile your model. So, it is unclear how we can build all different target-specific versions of libjit at Glow's build-time if we don't know which targets we need to support. Of course, we could just ask to provide a list of targets as a CMake argument and build only for targets from this list.
The problem with this is that in theory Glow allows you to compile for any target supported by LLVM. Which exact targets are supported by your installation of LLVM is not known in advance. But libjit is built when you build Glow, not when you compile your model. So, it is unclear how we can build all different target-specific versions of libjit at Glow's build-time if we don't know which targets we need to support. Of course, we could just ask to provide a list of targets as a CMake argument and build only for targets from this list.
Point taken. The reason I suggested the second solution is that Glow already supports custom backends (for which clients are responsible), and in this case we have a backend but a potentially custom architecture. Should Glow make clients responsible for the backend architecture also? Glow could provide a few implementations out of the box for different architectures and ask the client to link with the correct library.
In this case, your suggestion to ask for a list of targets sounds like a good one to me. A use case that I have in mind is something like this: an organization deploys neural networks on particular hardware. They build Glow once, set up their build system, and build for their targets going forward (without requiring a Glow rebuild).
Basically, backends are components that are not viewed as part of Glow proper (and they shouldn't be... at least that is my understanding of Glow's design). Similarly, backend kernels that target different hardware architectures should be treated as interchangeable modules.
Aha! So on another note: the libjit_ symbols are not exposed, say through a .h, anywhere. I wonder if it may be a good idea to expose them? For instance, what if I want to write my own implementation for some of those functions, for whatever reason? Or, like in this case, I want to provide my own implementation of the entire libjit (not quite, but... close enough to an actual use case).
On a somewhat related note, I wonder if we'd be able to target devices without an FPU? How does Glow quantization work in this case? If something like that is not supported out of the box, then that is a candidate use case where one may need to heavily modify libjit (and possibly other parts).
Also, looking at CMakeLists.txt in the CPU Backend, it isn't clear to me whether we'd be able to get away with just building the 32 bit version of libjit (or the entire CPU backend) while having the rest of Glow built in 64bit.
So maybe the solution of having a custom libjit per architecture might be too much work for what it's worth. On the other hand, it sounds like a necessity (given the possible use cases outlined above).
Sorry, closed accidentally. Reopened.
On a somewhat related note, I wonder if we'd be able to target devices without an FPU?
If target devices have soft-FP (i.e. emulated in software), Glow works just fine. If target devices do not support FP in any form, Glow can work with models that are completely quantized.
How does Glow quantization work in this case?
Quantization happens when Glow compiles the network using a quantization profile (e.g. when using the -load-profile option) and produces a bundle, not when the bundle runs. All uses of FP kernels are replaced by quantized kernels, all weights are quantized, etc. So, no FP is required by the resulting bundle (maybe it still needs FP to quantize the input at the very beginning and dequantize the output at the very end, but it is relatively easy to get around that by doing it e.g. on the host side).
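As a sketch of that host-side workaround (the scale/offset values are per-tensor parameters that Glow determines during profiling; the helper names here are made up):

#include <algorithm>
#include <cmath>
#include <cstdint>

// Quantize a float input value to int8 before feeding a fully quantized bundle.
static int8_t quantizeToInt8(float v, float scale, int32_t offset) {
  int32_t q = static_cast<int32_t>(std::round(v / scale)) + offset;
  return static_cast<int8_t>(std::min(127, std::max(-128, q)));
}

// Dequantize an int8 output value back to float after inference.
static float dequantizeFromInt8(int8_t q, float scale, int32_t offset) {
  return scale * static_cast<float>(static_cast<int32_t>(q) - offset);
}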
Also, looking at CMakeLists.txt in the CPU Backend, it isn't clear to me whether we'd be able to get away with just building the 32 bit version of libjit (or the entire CPU backend) while having the rest of Glow built in 64bit.
I think it should work. libjit is used for something that will run on the device. Glow itself usually runs on the host (though it could run on the device if there is a wish to do so) and it shouldn't matter if Glow itself is 64bit or not.
Regarding providing your own kernels: it is totally possible. The usual way to do this would be to create a new backend which is derived from the class LLVMBackend, just like CPUBackend. This new backend can add/remove/modify any libjit kernels and change some/all parts related to the LLVM IR code generation. Basically, each backend derived from LLVMBackend can provide its own version of libjit.
Also, looking at CMakeLists.txt in the CPU Backend, it isn't clear to me whether we'd be able to get away with just building the 32 bit version of libjit (or the entire CPU backend) while having the rest of Glow built in 64bit.
I just checked. Adding -target i386 to the mentioned CMake file solves the issue for i386. It required some minor tweaks for include files used by libjit, as I don't have i386 cross-compiler include files on my system. With this change I was able to produce an i386 bundle without any issues.
@opti-mix Great, thank you! Lots of useful info. Could you please provide the details of your CMake modification so that I can try to reproduce your success exactly?
Great, thank you! Lots of useful info. Could you please provide the details of your CMake modification so that I can try to reproduce your success exactly?
For CMake I just added the line -target i386 to the options:
set(CPURunttimeCompilationOptions
    -std=c++14
    -ffast-math
    -fno-finite-math-only
    -g0
    -target i386   # <-- the added line
    -emit-llvm
    -O0)
Try to compile, and there will be a couple of complaints about missing include files or missing definitions of some functions, but this is trivial to fix. You can comment out some of the places where they are used, as they are not really important. You can also replace some C++ includes with POSIX C includes. After that everything should compile.
Yeah, I can confirm that it worked correctly. I think we can close the issue with the following comments:
The client must make sure that the backend kernels that they are using are compatible with the architecture they are compiling for. For instance, for the CPU, this entails building libjit for the specific architecture.
The client may then use the -target option (and the -mcpu option with the CPU backend) with the client's model runner to emit bundles for the specific target/architecture.
Could you confirm that the -mcpu is precisely the same as it is in LLVM proper (that is, it isn't an alias for something Glow-specific)?
Could you confirm that the -mcpu is precisely the same as it is in LLVM proper (that is, it isn't an alias for something Glow-specific)?
Yes. You can see in the sources of LLVMBackend.cpp in Glow that -mcpu is just passed down to LLVM's initTargetMachine method. Glow does not do anything fancy with this option.
Yeah, I can confirm that it worked correctly.
Awesome!
I think we can close the issue with the following comments: The client must make sure that the backend kernels that they are using are compatible with the architecture they are compiling for. For instance, for the CPU, this entails building libjit for the specific architecture.
The client may then use the -target option (and the -mcpu option with the CPU backend) with the client's model runner to emit bundles for the specific target/architecture.
Any improvements of the Glow docs are very welcome! Would you be willing to submit a small PR to update AOT.md with this cross-compilation-related information?
Thank you! Well, as a small service fee, I can definitely help improve the docs. Give me a few days, though. I should have some time this weekend.
Let's keep this issue open until the PR is in, just in case.
For the armv7l-unknown-linux-gnueabihf target, I also (on top of the suggested change for CPURunttimeCompilationOptions) had to modify the hardcoded sizes of 8 (to 4) in BundleSaver::emitSymbolTable() and BundleSaver::emitBundleConfig(). Compilation would go fine without that change, but execution would fail.
Regarding the modification of missing include files etc., the following change looks less intrusive (same Backends/CPU/CMakeLists.txt): ... add_custom_command( OUTPUT ${libjit_obj_file}
@djordje-dj-s I cannot confirm your CMake fix. What system are you on? Clang should select the correct stdlib automatically; most of the time there is no need for an explicit -stdlib= flag. At any rate, this didn't work for me with -target i386.
Regarding fixed sizes: yeah, that makes sense. In fact, I think a good fix would be to replace the fixed width types with std::size_t. In that case, in the emitted bundle, the structs (BundleConfig and SymbolTableEntry) will contain the correctly-sized members. @opti-mix can you confirm?
@opti-mix By the way, what is the reason for such a large alignment requirement (64 bytes, hard-coded)? Was this meant to be 8 bytes (64 bits)? That is, is this meant to be the strictest scalar type alignment (sizeof(long double), for instance, where supported)? If not, what affects the calculation of the alignment member in BundleConfig?
I don't see any logic for its calculation in the code. This is what lives in Memory.h:
/// The tensor payload is allocated to be aligned to this value.
constexpr unsigned TensorAlignment = 64;
And then it is used in building the BundleConfig structure in BundleSaver::emitBundleConfig:
llvm::ConstantInt::get(uint64TType, TensorAlignment)
@wyessen, I am on Ubuntu 18.04 LTS, building for ARMv7. Also building with the LLVM dependency (in a custom folder).
@opti-mix Looks like these should depend on target, cpu, arch, and targetFeatures:
auto *bundleConfigTy =
    llvm::StructType::get(irgen_->getLLVMContext(),
                          {uint64TType, uint64TType, uint64TType, uint64TType,
                           uint64TType, symbolTableEntryTy->getPointerTo()});
and
auto symbolTableEntryTy = llvm::StructType::get(
    irgen_->getLLVMContext(),
    {charTy->getPointerTo(), uint64TTy, uint64TTy, charTy});
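For reference, the client-side mirror of that second LLVM struct would look roughly like this (field names are inferred from the types above; the lenet_mnist/resnet50 bundle examples have the authoritative definition):

#include <cstdint>

// Rough mirror of {char*, uint64, uint64, char} as seen from the inference code.
struct SymbolTableEntry {
  const char *name; // name of the weight variable
  uint64_t offset;  // offset into the corresponding memory area
  uint64_t size;    // size of the variable (see the bundle docs for the units)
  char kind;        // constant vs. mutable weight
};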
@djordje-dj-s
@wyessen, I am on Ubuntu 18.04 LTS, building for ARMv7. Also building with the LLVM dependency (in a custom folder).
OK, I need to jump on my Ubuntu VM to confirm. But this didn't work for me on OS X, so even if it works for you on Ubuntu, it looks like it isn't a universal solution.
Another hard-coded value (8 -> 4) is definitely not a good solution - this was done for investigation purposes only. Using the target's sizeof() is one step closer, but as you indicated, more details need to be taken into account.
As one more data point, in order to use a custom llvm/clang on my Ubuntu (e.g. I have llvm-6 installed system-wide, but am using llvm-8 for Glow), I had to make one more temporary change (in the same CMakeLists.txt):

...
if("${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang")
  set(CLANG_BIN ${CMAKE_CXX_COMPILER})
else()
  find_program(CLANG_BIN clang++)
+ set(CLANG_BIN "/home/djordje/GLOW/llvm_install/bin/clang++")
endif()
@wyessen
By the way, what is the reason for such a large alignment requirement (64 bytes, hard-coded)? Was this meant to be 8 bytes (64 bits)?
SIMD processing of tensors using e.g. AVX, SSE, etc. You want to process e.g. multiple FP or int8 values at once. Some of those instructions have certain requirements on the alignment of data to get the best performance.
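As a small illustration (assuming a POSIX host; nothing here is Glow-specific), honoring that alignment when allocating the memory regions described by BundleConfig could look like this:

#include <stdlib.h>

// Allocate a memory region with the alignment the bundle asks for
// (BundleConfig::alignment, i.e. TensorAlignment == 64 above), so SIMD kernels
// operate on properly aligned tensors. Returns nullptr on failure; free() when done.
static void *allocateAligned(size_t size, size_t alignment) {
  void *ptr = nullptr;
  if (posix_memalign(&ptr, alignment, size) != 0) {
    return nullptr;
  }
  return ptr;
}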
Regarding fixed sizes: yeah, that makes sense. In fact, I think a good fix would be to replace the fixed width types with std::size_t. In that case, in the emitted bundle, the structs (BundleConfig and SymbolTableEntry) will contain the correctly-sized members.
BundleConfig and SymbolTableEntry used to have size_t members. We decided to use fixed width ints later, so that you can read these tables in the same way on all platforms.
@wyessen Have you seen #2847? This makes your use-case work without any CMake hacks.
@opti-mix Yep, that should do it. But I think the issues in BundleSaver.cpp still need to be addressed in a similar way.
@opti-mix Oh, and by the way, this doesn't completely resolve things. libjit still needs to be compiled for the target, in the same way that bundles are compiled. So the -target option is still required for libjit (along with any other options like -mcpu, -mfloat-abi, and so on). The only way to avoid this is to not compile libjit into a library, but instead link with libjit's sources when compiling bundles.
Basically, we cannot cross-link objects compiled for different architectures, even if the sizes of scalar types match.
@wyessen
Yep, that should do it. But I think the issues in BundleSaver.cpp still need to be addressed in a similar way.
I'm not sure I understand the issue with BundleSaver.cpp. What is the exact problem?
@opti-mix
@wyessen
Yep, that should do it. But I think the issues in BundleSaver.cpp still need to be addressed in a similar way.
I'm not sure I understand the issue with BundleSaver.cpp. What is the exact problem?
I mean things like uint64TType in BundleSaver::. Are they guaranteed to collapse to supported types when compiling for a particular architecture? That's the problem that @djordje-dj-s encountered (I haven't had the chance to try it out yet myself).
I mean things like uint64TType in BundleSaver::. Are they guaranteed to collapse to supported types when compiling for a particular architecture? That's the problem that @djordje-dj-s encountered (I haven't had the chance to try it out yet myself).
Yes, uint64TType is defined as a fixed width type:
auto *uint64TType = irgen_->getBuilder().getIntNTy(sizeof(uint64_t) * 8);
It will have the same size on any platform.
@opti-mix Exactly. So that creates an integer 64 bits wide. Now in my bundle .o file, I have BundleConfig containing 64-bit integer fields. Is that correct? If so, how would this be supported on a 32-bit platform?
In particular, when I do inference, I need something like this:
struct BundleConfig
{
uint64_t constantWeightVarsMemSize{};
uint64_t mutableWeightVarsMemSize{};
uint64_t activationsMemSize{};
uint64_t alignment{};
uint64_t numSymbols{};
const SymbolTableEntry *symbolTable{};
};
extern "C" BundleConfig my_bundle_config
So on a 64-bit platform, that is not a problem - I can deserialize my_bundle_config just fine by interpreting it as BundleConfig (as defined above).
Now, what if I'm on a 32bit platform?
Now, what if I'm on a 32bit platform?
My understanding is that on a 32-bit platform uint64_t is defined anyway (probably as an alias for long long unsigned int or something like that). So, there should be no problem for a compiler to work with uint64_t. Or are you telling me uint64_t is not supported on 32-bit platforms?
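A quick way to convince yourself of that (a trivial check, nothing Glow-specific):

#include <cstdint>
#include <cstdio>

int main() {
  // On i386 this prints 8 and 4: uint64_t keeps its width even though
  // pointers (and size_t) shrink to 32 bits.
  std::printf("sizeof(uint64_t) = %zu, sizeof(void *) = %zu\n",
              sizeof(uint64_t), sizeof(void *));
  return 0;
}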
No, forget what I said. I'm overthinking things now. We should be OK.
It seems that the root cause is that I had not kept the structures in my inference code up to date with glow/examples/bundles/lenet_mnist.cpp (and resnet50.cpp). Earlier, I worked around the problem (size_t on the host not being equal to size_t on the target) with changes in BundleSaver.cpp. But the new definition of the SymbolTableEntry and BundleConfig structures includes members that are not size_t anymore but int64_t. This is what I missed updating on my end (on the inference side).
It seems that the root cause is that I had not kept the structures in my inference code up to date with glow/examples/bundles/lenet_mnist.cpp (and resnet50.cpp). Earlier, I worked around the problem (size_t on the host not being equal to size_t on the target) with changes in BundleSaver.cpp. But the new definition of the SymbolTableEntry and BundleConfig structures includes members that are not size_t anymore but int64_t. This is what I missed updating on my end (on the inference side).
Yep. @opti-mix is right, there is no issue now. I wasn't able to reproduce your issue either.
Just checked the tip of master (including the #2847 merge), including execution on the ARMv7 platform, and both AOT examples work OK.
@djordje-dj-s Are you compiling on the target platform, or are you cross-compiling for the target platform?
@wyessen , I was compiling inference example, and linking with network object code, on the target platform.
@djordje-dj-s OK, then this is different from the original problem discussed here. I've been trying to cross-compile the inference program for the target.
Just to be completely clear: the bundle (e.g. resnet50.o, resnet50.weights) was created on the host platform (Ubuntu 18.04, x86_64):

../bin/image-classifier ../tests/images/imagenet/zebra_340.png -model ../resnet50/model.onnx -use-imagenet-normalization -image-mode=0to1 -relocation-model=pic -model-input-name="gpu_0/data_0" -cpu -target armv7l-unknown-linux-gnueabihf -emit-bundle ./bundle_dir/ -network-name=resnet50

The object file (e.g. resnet50.o) was then linked on the target, with the inference application also compiled on the target (ARMv7):

g++ main.cpp resnet50.o -lm -lpng -fpack-struct=8 -fpermissive -o infer_resnet50
@djordje-dj-s Ok, thanks. Yeah, I am trying to compile the inference app on host for the target, and am doing linking on host as well.
I think this ticket can now be closed. @opti-mix ?
@wyessen I'm OK with closing it. But you wanted to add something to the docs to clarify some aspects of building/using Glow for cross-compilation scenarios. Would you still be willing to improve the docs on this?
I have tried emitting bundles for targets other than x86_64 to no avail. Even i386 is failing. It has something to do with tensor types. I suppose different targets expect/support different tensor types; however, I made sure that my network is using Float32. So this shouldn’t be a problem for i386. The network looks like this:
The traceback that I get when trying to build is:
In case there are any questions about x-model-runner: it is a custom runner that I built based on the image-classifier. The runner itself is working as expected for x86_64 -- it emits bundles that I'm able to link to and run inference.