replicate / cog

Containers for machine learning
https://cog.run
Apache License 2.0

TPU support #545

Open Ontopic opened 2 years ago

Ontopic commented 2 years ago

Awesome initiative, props to the maintainers!

I was wondering whether TPU support is on your radar, or whether contributions in that area are welcome?

bfirsh commented 2 years ago

Yes! Related: #376

Ontopic commented 2 years ago

Not sure whether I should reply on the related issue or here. The dream of cog with TPU support is a big one, though. All the features you advertise are clearly needed on the TPU. Perhaps you could even count on some engineers from Google to assist with proper integration?

Awesome project nonetheless, hoping it will bring the TPU into the fold!

Ontopic commented 2 years ago

I could post my findings on integrating TPU access with Docker and recompiling libtpu for different setups? That might help someone on the Cog team speed things along. But without someone knowledgeable on your end, I'm not sure I could achieve much.

bfirsh commented 2 years ago

Yes, that would be very helpful, thank you! Even if it's just links to information.

devxpy commented 1 year ago

I was trying this out, and the blocker I ran into was that cog can't run docker with --privileged --network=host (as is required by TPUs).

Would be nice to allow custom docker run params when running cog build!
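
Until something like that exists, one possible workaround is to skip cog's own runner: cog build produces an ordinary Docker image, so it can be started with docker run and the extra flags directly. A minimal sketch that only assembles the command; the helper name and the image tag my-tpu-model are hypothetical, and it does not verify that TPU access actually works inside the container:

```python
import shlex

# Flags reported above as required for TPU access from inside a container.
TPU_FLAGS = ["--privileged", "--network=host"]


def docker_run_command(image, extra_flags=TPU_FLAGS):
    """Build the argv for running a Cog-built image directly with Docker.

    This is a hypothetical helper, not part of cog. It returns a list
    suitable for subprocess.run() and does not start anything itself.
    """
    return ["docker", "run", "--rm", *extra_flags, image]


if __name__ == "__main__":
    # "my-tpu-model" is a made-up tag, e.g. from: cog build -t my-tpu-model
    print(shlex.join(docker_run_command("my-tpu-model")))
```

With host networking there is no need to publish a port: the HTTP server inside the image (port 5000 in the Cog docs, as far as I know) should be reachable on the host directly.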