Closed gilfree closed 1 day ago
Can you give an example with code?
OK, it's a bit more involved: the method you are patching is also used by torch.export, and there the name is based on nn_module_stack.
I am both exporting and compiling the model, and the export also ran under the depyf prepare_debug context, something like the code below, which is probably not what you intended. So if you prefer to close this as not supported, I'm fine with that, but it would be nice if it worked, since it would also allow debugging of exports.
import torch
import depyf


class MyModel(torch.nn.Module):
    def __init__(self):
        super(MyModel, self).__init__()
        self.encoder = torch.nn.TransformerEncoder(
            torch.nn.TransformerEncoderLayer(d_model=8, nhead=2, batch_first=True),
            num_layers=6,
        )

    def forward(self, x):
        return self.encoder(x)


# Each wrapper adds one more level to the nn_module_stack name.
class WrappedModel(torch.nn.Module):
    def __init__(self):
        super(WrappedModel, self).__init__()
        self.model = MyModel()

    def forward(self, x):
        return self.model(x)


class WrappedModel2(torch.nn.Module):
    def __init__(self):
        super(WrappedModel2, self).__init__()
        self.model = WrappedModel()

    def forward(self, x):
        return self.model(x)


model = WrappedModel2()
x = torch.randn(1, 10, 8)

with depyf.prepare_debug('depyf'):
    model2 = torch.compile(model, fullgraph=True)
    model2(x)
    # The export also runs inside the prepare_debug context.
    exported = torch.export.export(model, (x,))
    model = exported.module()
Can you please open a PR to address it?
If the name is too long, I don't know how to truncate it properly. For example, do you want to keep the suffix of the file? Which part should be truncated?
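One possible answer to the truncation question, as a rough sketch rather than what depyf does: keep the extension, cut the stem, and append a short hash of the full name so that two long names sharing a prefix do not collide. The helper name truncate_filename, the 8-character SHA-1 digest, and the default limit of 255 bytes are all assumptions made for illustration.

```python
import hashlib
import os


def truncate_filename(name: str, max_bytes: int = 255) -> str:
    """Shorten `name` to at most `max_bytes` UTF-8 bytes, keeping its extension.

    Hypothetical helper, not part of depyf or PyTorch.
    """
    encoded = name.encode("utf-8")
    if len(encoded) <= max_bytes:
        return name

    stem, ext = os.path.splitext(name)
    # A short digest of the full name keeps distinct long names distinct.
    digest = hashlib.sha1(encoded).hexdigest()[:8]
    tail = f"_{digest}{ext}"

    budget = max_bytes - len(tail.encode("utf-8"))
    if budget < 0:
        raise ValueError("max_bytes is too small for the hash and extension")

    # Cut on a byte budget; errors="ignore" drops a split multi-byte character.
    head = stem.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    return head + tail
```

Truncating the beginning of the stem instead of the end would work just as well; which end carries the more useful part of the nn_module_stack name is a judgment call.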
Your current environment
Collecting environment information...
PyTorch version: 2.5.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.5 LTS
Clang version: Could not collect

Python version: 3.10.6
Python platform: Linux
Is CUDA available: True
CUDA runtime version: 12.4.99
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA A100-PCIE-40GB
GPU 1: NVIDIA A100-PCIE-40GB

Nvidia driver version: 550.120
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
....

Versions of relevant libraries:
[pip3] mypy==1.13.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==2.1.3
[pip3] pytorch-lightning==2.4.0
[pip3] pytorch-triton==3.1.0+cf34004b8a
[pip3] torch==2.5.1
[pip3] torchaudio==2.5.0.dev20241105+cu121
[pip3] torchmetrics==1.5.1
[pip3] torchvision==0.20.1
[pip3] triton==3.1.0
[conda] No relevant packages
🐛 Describe the bug
It seems you are patching the lazy_format_graph_code method and using the name passed to it to compose a file name. Because the name is based on the nn_module_stack of the model, in some cases it leads to very long file names, which raises an OSError. The name is generated by this function: first_call_function_nn_module_stack
This is the problematic line:
https://github.com/thuml/depyf/blob/ee7d231482ff877aa33b02ca2ae7390365572072/depyf/explain/patched_lazy_format_graph_code.py#L39
Truncating the file name to 255 would probably do, or using os.pathconf(filepath, 'PC_NAME_MAX') to set the limit.
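A minimal sketch of looking up that limit, with some assumptions: the helper name name_max is made up, it queries the target directory rather than the file path (the file may not exist yet), and it falls back to 255 on platforms where os.pathconf is unavailable or reports no usable limit.

```python
import os


def name_max(directory: str, default: int = 255) -> int:
    """Return the longest file name (in bytes) allowed inside `directory`.

    Hypothetical helper, not part of depyf.
    """
    try:
        limit = os.pathconf(directory, "PC_NAME_MAX")
    except (AttributeError, ValueError, OSError):
        # os.pathconf is Unix-only, and the lookup can fail for a missing
        # directory or an unsupported configuration name.
        return default
    # A non-positive result means the system reports no definite limit.
    return limit if limit > 0 else default
```

The returned value could then be used as the byte budget when shortening the dump file name before opening it.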