RomeoV opened this issue 1 year ago (status: Open)
RomeoV: Hello, thanks for the great work! I'm currently trying to wrap a PyTorch model in a Flux-based training setup. Training goes fine for a few epochs, but then, seemingly at random, a segmentation fault occurs (see below). I don't have a great MWE yet (I'll still try to make one), but perhaps we can already draw some conclusions from the stacktrace, which in this case appeared after about seven epochs.

Here are the code snippets referenced in the stacktrace: https://github.com/rejuvyesh/PyCallChainRules.jl/blob/1723781d955c2f0df479df1e2f9e983a377865fb/src/pytorch.jl#L56-L64 and https://github.com/pabloferz/DLPack.jl/blob/61f48ee6b5e4f56d9b8525fa6ef9b613242160b8/src/pycall.jl#L98-L116

Reply: Since DLPack and garbage collection are involved, this could very well be related to #24 (the interaction between the Julia GC and the Python GC). What versions of pytorch/functorch are you using? Would it also be possible to check your code against the PyNNTraining implementation? https://github.com/lorenzoh/PyNNTraining.jl/blob/e02bf899ce7228090a60286b8373fb87bfa5b6b1/src/topytorch.jl#L34

RomeoV: Here is a GitHub gist which reproduces the error: https://gist.github.com/RomeoV/ca397a6b883c1cf567f2503d135084d8 The setup is broadly inspired by the VAE tutorial in the FastAI docs.
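If the Julia/Python GC interaction is indeed the culprit, one way to test that hypothesis is to suppress Julia garbage collection during each training step and collect only between epochs, using Julia's built-in `GC.enable`/`GC.gc`. This is a hedged diagnostic sketch, not a fix: `modelwrap`, `loss`, and `dataloader` are hypothetical placeholders standing in for the setup in the gist.

```julia
using Zygote

# Diagnostic sketch (assumption: the segfault is caused by the Julia GC
# freeing DLPack-shared buffers while PyTorch still holds a view of them).
# `modelwrap`, `loss`, and `dataloader` stand in for the gist's actual
# training setup and are hypothetical here.
function train_with_gc_pauses!(modelwrap, loss, dataloader; nepochs=10)
    for epoch in 1:nepochs
        for (x, y) in dataloader
            GC.enable(false)   # defer Julia collections during the step
            try
                grads = Zygote.gradient(m -> loss(m, x, y), modelwrap)
                # ... apply `grads` with your optimizer of choice here ...
            finally
                GC.enable(true)
            end
        end
        GC.gc()                # collect explicitly between epochs
    end
end
```

If the segfault disappears under this scheme, that would point toward a premature free in the DLPack sharing path (i.e. a GC-timing issue as suspected in #24) rather than a bug in the wrapped module itself.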