I am trying to fit my own model, and the model selection step (see below) gets stuck at about 60% every time. When this happens, the mhcflurry-class processes use very little CPU and no GPU at all.
When training on a small dataset (10k entries), the stall is tolerable (about 20 min), and the step eventually moves on and finishes. On a 40k-entry dataset, however, it has been stuck for more than 4 hours (still waiting as of now). Any idea what's going on?
My command:
The stdout when it gets stuck: