rom1504 / distributed-translator

Translate millions of captions to hundreds of languages efficiently
MIT License

plan #1

Open rom1504 opened 2 years ago

rom1504 commented 2 years ago

in practice, steps:

rom1504 commented 2 years ago

since the data is small and already in parquet, consider just doing it the pure pyspark way: read the parquet files, translate inside the spark job, write parquet back
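
A rough sketch of what the pure pyspark way could look like, not a settled design. Assumptions: the captions live in a column named `caption`, and `translate_batch` is a hypothetical callable that takes a list of strings and returns their translations in the same order (for example a wrapper around a model loaded on the executor). The helper names `translate_partition` and `run` are made up for illustration. Batching per partition means the model sees `batch_size` captions per call instead of one row at a time.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def translate_partition(
    captions: Iterable[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 64,
) -> Iterator[Tuple[str, str]]:
    """Yield (caption, translation) pairs, calling translate_batch on fixed-size batches."""
    batch: List[str] = []
    for caption in captions:
        batch.append(caption)
        if len(batch) == batch_size:
            yield from zip(batch, translate_batch(batch))
            batch = []
    if batch:
        # Flush the last, possibly smaller, batch of the partition.
        yield from zip(batch, translate_batch(batch))


def run(spark, input_path: str, output_path: str, translate_batch) -> None:
    """Read captions from parquet, translate them per partition, write parquet back.

    `spark` is an existing SparkSession; the "caption" column name is an assumption.
    """
    df = spark.read.parquet(input_path)
    pairs = (
        df.select("caption")
        .rdd.map(lambda row: row[0])
        .mapPartitions(lambda it: translate_partition(it, translate_batch))
    )
    out = spark.createDataFrame(pairs, ["caption", "translation"])
    out.write.mode("overwrite").parquet(output_path)
```

`translate_partition` has no spark dependency, so it can be checked locally with a fake translator before running on a cluster. One output parquet per target language (calling `run` once per language) is one option; that choice is not decided here.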

rom1504 commented 2 years ago

https://towardsdatascience.com/high-performance-inferencing-with-large-transformer-models-on-spark-beb82e71ecc9

rom1504 commented 2 years ago

https://docs.databricks.com/_static/notebooks/deep-learning/pytorch-images.html