mikekgfb closed this 4 weeks ago.
Thanks for reporting.
@mikekgfb I tried reproducing this locally but can't so far. Is it consistently reproducible for you, or did it happen randomly?
Consistently reproducible, both in CI and locally.
I wonder if this is caused by the CI flow exporting to the same file name. If multiple threads export to the same `.pte` file, their writes could collide and corrupt it, and running the model from that corrupted file would then segfault.
Do you mind sharing the model artifact that causes the segfault? It would help jump-start debugging.
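One way to rule the collision hypothesis in or out is to make the export write atomic. Readers then see either the previous file or a complete new one, never a partially written or interleaved `.pte`. The following is a minimal sketch that assumes the export step writes the serialized program bytes from Python. `atomic_write_bytes` and `simulate_concurrent_exports` are hypothetical helpers for illustration and are not part of torchchat or ExecuTorch. The sketch writes to a temporary file in the same directory and then renames it over the destination with `os.replace`, which is atomic on POSIX when both paths are on the same filesystem.

```python
import os
import tempfile
import threading


def atomic_write_bytes(path: str, data: bytes) -> None:
    """Hypothetical helper: write data so readers never observe a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # flush bytes to disk before publishing the file
        os.replace(tmp_path, path)  # atomic rename over the destination (POSIX)
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file if anything failed
        raise


def simulate_concurrent_exports(n_writers: int = 8, size: int = 1 << 16) -> bool:
    """Hypothetical check: many threads write distinct payloads to one shared name."""
    out_dir = tempfile.mkdtemp()
    target = os.path.join(out_dir, "model.pte")  # shared output name, as in the CI hypothesis
    payloads = [bytes([i % 256]) * size for i in range(n_writers)]
    threads = [
        threading.Thread(target=atomic_write_bytes, args=(target, p)) for p in payloads
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with open(target, "rb") as f:
        result = f.read()
    leftovers = [name for name in os.listdir(out_dir) if name != "model.pte"]
    # The final file must be exactly one writer's payload, with no stray temp files.
    return result in payloads and not leftovers


print(simulate_concurrent_exports())
```

A simpler alternative under the same assumption is to give each CI job its own output file name, which avoids sharing a path at all.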
I don't think we use multithreading, though? That said, this works now:
https://github.com/pytorch/torchchat/actions/runs/9047866134/job/24860312456?pr=751
This is a launch blocker for torchchat because it causes a failure for users following the example commands in our docs.