microsoft / hummingbird

Hummingbird compiles trained ML models into tensor computation for faster inference.
MIT License

coredumps on Ubuntu pipeline runs #760

Open ksaur opened 7 months ago

ksaur commented 7 months ago

We thought that TVM was causing Ubuntu out-of-memory errors, so we skipped all the TVM tests in Ubuntu (#709).

But now, I am seeing additional coredumps in Ubuntu runs that aren't related to TVM (Example 1, Example 2).

It reaches the end of the Test with Pytest stage successfully, and then dies:

 ========== 598 passed, 66 skipped, 1410 warnings in 214.69s (0:03:34) ========== 
 /home/runner/work/_temp/c325d19a-2f4b-46db-bd6e-51d55c415279.sh: line 1:  2350 Aborted  (core dumped) pytest 
 Error: Process completed with exit code 134.
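For context on the exit code: a POSIX shell reports 128 + N when a child process is killed by signal N. So 134 corresponds to signal 6, which is SIGABRT on Linux and matches the "Aborted (core dumped)" line above. Below is a minimal Python sketch of that decoding. The helper name `describe_exit_code` is hypothetical.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell exit status; values above 128 mean 'killed by signal (code - 128)'."""
    if code > 128:
        try:
            return f"terminated by {signal.Signals(code - 128).name}"
        except ValueError:
            pass  # not a signal number known on this platform
    return f"exited with status {code}"


print(describe_exit_code(134))  # terminated by SIGABRT (on Linux)
print(describe_exit_code(1))    # exited with status 1
```

Python's `subprocess` reports the same situation differently: it sets a negative `returncode` (here `-6`) instead of 134.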

It also appears to be transient. Is this at all related to what you were seeing @mshr-h ? Maybe the problem is bigger than TVM?

mshr-h commented 7 months ago

I was seeing a similar error, where all the tests passed but the stage still failed. Can you try skipping the TVM installation in the pipeline? My guess is that something goes wrong when the Python interpreter imports TVM.
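Skipping the TVM installation only works if the test suite tolerates TVM being absent. One common pattern is to check for the package with `importlib.util.find_spec`, which locates it without importing it, and skip TVM-dependent tests when it is missing. The sketch below assumes a `unittest`-style suite. The class name `TestTVMBackendExample` is hypothetical, and this is not necessarily how Hummingbird's own tests are organized.

```python
import importlib.util
import unittest

# find_spec locates the top-level package without importing it, so TVM's
# native libraries are never loaded just to decide whether to skip.
TVM_INSTALLED = importlib.util.find_spec("tvm") is not None


@unittest.skipIf(not TVM_INSTALLED, reason="TVM is not installed")
class TestTVMBackendExample(unittest.TestCase):  # hypothetical test class
    def test_import(self):
        import tvm  # only executed when TVM is available

        self.assertIsNotNone(tvm)


if __name__ == "__main__":
    unittest.main()
```

With a guard like this, the Ubuntu pipeline could drop the TVM install step without the TVM tests failing. If the coredumps then stop, that would point at the TVM import rather than at the tests themselves.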

ksaur commented 7 months ago

It's been a full week (with many runs) of this and no error. Maaaaaybe the memory issue was transient on the GitHub Actions side? Let's hope 🤞...I will leave this open for a month or so to see if it happens again.