shangw-nvidia closed 6 months ago
I tried running it on the SparkRunner once, but that runner seems to have issues when run locally. In my experience the DirectRunner works fine, even though it's clearly not memory efficient.
It would be awesome to make it work locally on a SparkRunner, though! Did you manage to get your processing working?
We've deprecated the Beam API in datasets. As part of this, the Beam-based datasets have been converted to non-Beam ones to make them straightforward to use.
Hi,
I'm wondering if https://huggingface.co/docs/datasets/beam_dataset.html has a non-GCP or non-Dataflow example/tutorial? I tried to migrate it to run on DirectRunner and SparkRunner, but I had to fix far too many runtime errors along the way, and even then I couldn't get either runner to produce the desired output correctly.
Thanks! Shang