taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

[AOT] "Cannot Open File" when using Vulkan AOT on macOS #4397

Open AmesingFlank opened 2 years ago

AmesingFlank commented 2 years ago

Describe the bug

The AOT module's save method crashes on macOS.

To Reproduce

Run the following code:

import taichi as ti

ti.init(arch=ti.vulkan, print_ir=True)

f = ti.field(float, 100)

@ti.kernel
def k():
    for i in range(10):
        f[i] = ti.random()

m = ti.aot.Module(ti.vulkan)
m.add_field("f", f)
m.add_kernel(k)
m.save(".", "result")

Log/Screenshots

 * Taichi Core - Stack Traceback *                             
==========================================================================================
|                       Module |  Offset | Function                                      |
|----------------------------------------------------------------------------------------|
*               taichi_core.so |     126 | taichi::Logger::error(std::__1::basic_string< |
                                         | char, std::__1::char_traits<char>, std::__1:: |
                                         | allocator<char> > const&, bool)               |
*               taichi_core.so |     851 | taichi::write_data_to_file(std::__1::basic_st |
                                         | ring<char, std::__1::char_traits<char>, std:: |
                                         | __1::allocator<char> > const&, unsigned char* |
                                         | , unsigned long)                              |
*               taichi_core.so |     232 | void taichi::write_to_binary_file<taichi::lan |
                                         | g::vulkan::TaichiAotData>(taichi::lang::vulka |
                                         | n::TaichiAotData const&, std::__1::basic_stri |
                                         | ng<char, std::__1::char_traits<char>, std::__ |
                                         | 1::allocator<char> > const&)                  |
*               taichi_core.so |     487 | taichi::lang::vulkan::AotModuleBuilderImpl::d |
                                         | ump(std::__1::basic_string<char, std::__1::ch |
                                         | ar_traits<char>, std::__1::allocator<char> >  |
                                         | const&, std::__1::basic_string<char, std::__1 |
                                         | ::char_traits<char>, std::__1::allocator<char |
                                         | > > const&) const                             |
*               taichi_core.so |     207 | void pybind11::cpp_function::initialize<pybin |
                                         | d11::cpp_function::cpp_function<void, taichi: |
                                         | :lang::AotModuleBuilder, std::__1::basic_stri |
                                         | ng<char, std::__1::char_traits<char>, std::__ |
                                         | 1::allocator<char> > const&, std::__1::basic_ |
                                         | string<char, std::__1::char_traits<char>, std |
                                         | ::__1::allocator<char> > const&, pybind11::na |
                                         | me, pybind11::is_method, pybind11::sibling>(v |
                                         | oid (taichi::lang::AotModuleBuilder::*)(std:: |
                                         | __1::basic_string<char, std::__1::char_traits |
                                         | <char>, std::__1::allocator<char> > const&, s |
                                         | td::__1::basic_string<char, std::__1::char_tr |
                                         | aits<char>, std::__1::allocator<char> > const |
                                         | &) const, pybind11::name const&, pybind11::is |
                                         | _method const&, pybind11::sibling const&)::'l |
                                         | ambda'(taichi::lang::AotModuleBuilder const*, |
                                         |  std::__1::basic_string<char, std::__1::char_ |
                                         | traits<char>, std::__1::allocator<char> > con |
                                         | st&, std::__1::basic_string<char, std::__1::c |
                                         | har_traits<char>, std::__1::allocator<char> > |
                                         |  const&), void, taichi::lang::AotModuleBuilde |
                                         | r const*, std::__1::basic_string<char, std::_ |
                                         | _1::char_traits<char>, std::__1::allocator<ch |
                                         | ar> > const&, std::__1::basic_string<char, st |
                                         | d::__1::char_traits<char>, std::__1::allocato |
                                         | r<char> > const&, pybind11::name, pybind11::i |
                                         | s_method, pybind11::sibling>(void&&, taichi:: |
                                         | lang::AotModuleBuilder (*)(std::__1::basic_st |
                                         | ring<char, std::__1::char_traits<char>, std:: |
                                         | __1::allocator<char> > const&, std::__1::basi |
                                         | c_string<char, std::__1::char_traits<char>, s |
                                         | td::__1::allocator<char> > const&), pybind11: |
                                         | :name const&, pybind11::is_method const&, pyb |
                                         | ind11::sibling const&)::'lambda'(pybind11::de |
                                         | tail::function_call&)::operator()(pybind11::d |
                                         | etail::function_call&) const                  |
*               taichi_core.so |    4262 | pybind11::cpp_function::dispatcher(_object*,  |
                                         | _object*, _object*)                           |
*                       Python |     544 | (null)                                        |
*                       Python |      44 | (null)                                        |
*                       Python |     746 | (null)                                        |
*                       Python |    6421 | (null)                                        |
*                       Python |     112 | (null)                                        |
*                       Python |     753 | (null)                                        |
*                       Python |    6396 | (null)                                        |
*                       Python |    1870 | (null)                                        |
*                       Python |      51 | (null)                                        |
*                       Python |      54 | (null)                                        |
*                       Python |     163 | (null)                                        |
*                       Python |     263 | (null)                                        |
*                       Python |    5389 | (null)                                        |
*                       Python |      56 | (null)                                        |
*                         dyld |     462 | (null)                                        |
==========================================================================================

Internal error occurred. Check out this page for possible solutions:
https://docs.taichi.graphics/lang/articles/misc/install
Traceback (most recent call last):
  File "test_ir.py", line 15, in <module>
    m.save(".","result")
  File "/Users/dunfanlu/Code/Taichi/taichi/python/taichi/aot/module.py", line 223, in save
    self._aot_builder.dump(filepath, filename)
RuntimeError: [serialization.h:write_data_to_file@258] Cannot open file [./metadata.tcb] for writing. (Does the directory exist?)

Additional comments

This works fine on Windows, though.

bobcao3 commented 2 years ago

Maybe it's a serialization issue. Can Metal AOT save a file?

AmesingFlank commented 2 years ago

> Maybe it's a serialization issue. Can Metal AOT save a file?

Yeah, Metal works fine.

k-ye commented 2 years ago

Hmm, I've been thinking about this. I feel like the final file saving should be moved out of C++ and into Python. AotModuleBuilder should just return a few text/binary files to Python.
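A minimal sketch of what that could look like on the Python side, assuming the builder handed back a mapping of file names to contents. The `write_artifacts` helper and the `metadata.json` name are hypothetical and not part of Taichi's API; only `metadata.tcb` comes from the error message above.

```python
import os
import tempfile


def write_artifacts(artifacts, directory):
    """Write each name -> content entry into `directory`.

    Hypothetical helper: assumes the C++ builder returned `artifacts`
    (a dict of file name to bytes or str) instead of writing files itself.
    """
    directory = os.path.abspath(directory)
    # Create the target directory up front so a missing directory
    # cannot surface later as an opaque "cannot open file" error.
    os.makedirs(directory, exist_ok=True)
    written = []
    for name, content in artifacts.items():
        path = os.path.join(directory, name)
        data = content.encode("utf-8") if isinstance(content, str) else content
        with open(path, "wb") as fh:
            fh.write(data)
        written.append(path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # File names here are placeholders for whatever the builder returns.
        paths = write_artifacts(
            {"metadata.tcb": b"\x00\x01", "metadata.json": "{}"}, tmp
        )
        print(paths)
```

With this split, any failure to write would surface as a regular Python `OSError` carrying the errno and the resolved path, rather than as an internal error raised from C++.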

bobcao3 commented 2 years ago

Huh, Metal works fine? That's weird, since the runtime error comes from code that should have nothing to do with the backend arch.

AmesingFlank commented 2 years ago

  std::FILE *f = fopen(fn.c_str(), "wb");
  if (f == nullptr) {
    TI_ERROR("Cannot open file [{}] for writing. (Does the directory exist?)",
             fn);
    assert(f != nullptr);
  }

I feel like it's probably just some weird thing about macOS file paths.
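One way to narrow this down is to attempt the same kind of binary write from Python and look at the errno, which the C++ error message does not report. The sketch below is only a diagnostic aid: `diagnose_write` is a hypothetical helper, and it says nothing about what Taichi itself does internally.

```python
import errno
import os
import tempfile


def diagnose_write(path):
    """Try to open `path` for binary writing and describe any failure.

    Note: like fopen(..., "wb"), this creates or truncates the file,
    so point it at a scratch file name.
    """
    resolved = os.path.abspath(path)
    parent = os.path.dirname(resolved)
    info = {
        "cwd": os.getcwd(),
        "resolved": resolved,
        "parent_exists": os.path.isdir(parent),
        "parent_writable": os.access(parent, os.W_OK),
        "error": None,
    }
    try:
        with open(resolved, "wb"):
            pass
    except OSError as exc:
        # Report the symbolic errno name (e.g. ENOENT, EACCES) with its message.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        info["error"] = f"{name}: {exc.strerror}"
    return info


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(diagnose_write(os.path.join(tmp, "probe.bin")))
        print(diagnose_write(os.path.join(tmp, "missing", "probe.bin")))
```

Running it with a relative path such as `./probe.bin` from the same working directory as the failing script would show what the path resolves to and whether the directory is writable.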

Another thing I noticed that might be related: if I use #if 1 in SPIR-V codegen, the .spv file won't actually be dumped. It doesn't throw an error, but I can't find the file anywhere.