ymichels closed this issue 1 month ago
Please report issues related to connecting to Hadoop and related storage providers to the corresponding support pages. Your issue is not related to Parquet4s but to the Hadoop connector or how you use it. I do not provide support for them.
Hi, I tried using parquet4s to write a Parquet file to S3. Locally it worked as expected and wrote the file to S3. However, when I ran the same code in a Docker container, it produced the following error:
2024-10-13 18:59:27 Exception in thread "main" org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "file"
2024-10-13 18:59:27 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3575)
2024-10-13 18:59:27 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3598)
2024-10-13 18:59:27 at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:171)
2024-10-13 18:59:27 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3702)
2024-10-13 18:59:27 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3653)
2024-10-13 18:59:27 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:555)
2024-10-13 18:59:27 at org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:508)
2024-10-13 18:59:27 at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.confChanged(LocalDirAllocator.java:319)
2024-10-13 18:59:27 at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:396)
2024-10-13 18:59:27 at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:166)
2024-10-13 18:59:27 at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:147)
2024-10-13 18:59:27 at org.apache.hadoop.fs.s3a.S3AFileSystem.createTmpFileForWrite(S3AFileSystem.java:1538)
2024-10-13 18:59:27 at org.apache.hadoop.fs.s3a.S3ADataBlocks$DiskBlockFactory.create(S3ADataBlocks.java:823)
2024-10-13 18:59:27 at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.createBlockIfNeeded(S3ABlockOutputStream.java:237)
2024-10-13 18:59:27 at org.apache.hadoop.fs.s3a.S3ABlockOutputStream.<init>(S3ABlockOutputStream.java:219)
2024-10-13 18:59:27 at org.apache.hadoop.fs.s3a.S3AFileSystem.innerCreateFile(S3AFileSystem.java:2065)
2024-10-13 18:59:27 at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$create$5(S3AFileSystem.java:1960)
2024-10-13 18:59:27 at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
2024-10-13 18:59:27 at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
2024-10-13 18:59:27 at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449)
2024-10-13 18:59:27 at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2707)
2024-10-13 18:59:27 at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2726)
2024-10-13 18:59:27 at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:1959)
2024-10-13 18:59:27 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1231)
2024-10-13 18:59:27 at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1208)
2024-10-13 18:59:27 at org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:82)
2024-10-13 18:59:27 at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:471)
2024-10-13 18:59:27 at org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:403)
2024-10-13 18:59:27 at org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:395)
2024-10-13 18:59:27 at org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:918)
2024-10-13 18:59:27 at com.github.mjakubowski84.parquet4s.ParquetWriter$.internalWriter(ParquetWriter.scala:192)
2024-10-13 18:59:27 at com.github.mjakubowski84.parquet4s.ParquetWriter$BuilderImpl.build(ParquetWriter.scala:170)
2024-10-13 18:59:27 at com.github.mjakubowski84.parquet4s.ParquetWriter$BuilderImpl.build(ParquetWriter.scala:175)
2024-10-13 18:59:27 at com.github.mjakubowski84.parquet4s.ParquetWriter$BuilderImpl.writeAndClose(ParquetWriter.scala:181)
2024-10-13 18:59:27 at Main$.main(Main.scala:29)
2024-10-13 18:59:27 at Main.main(Main.scala)
No matter how I configure the options, this error persists. Please respond. If you need more information, I'll try to provide it.
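The trace shows Hadoop failing to resolve the local `file` scheme while S3A creates a temporary buffer file. One possible cause, not confirmed for this report, is that the JAR packaged into the Docker image lost the `org.apache.hadoop.fs.LocalFileSystem` entry from `META-INF/services/org.apache.hadoop.fs.FileSystem` when dependency JARs were merged. The following is a minimal diagnostic sketch under that assumption. The object name `FileSystemServiceCheck` and its helpers are hypothetical, it uses only the standard library, and it assumes Scala 2.13 or later.

```scala
import java.net.URL
import scala.io.Source
import scala.jdk.CollectionConverters._

// Hypothetical helper: lists the FileSystem implementations registered
// through service files on the classpath of the running application.
object FileSystemServiceCheck {
  // Service file used for ServiceLoader-based discovery of Hadoop FileSystem classes.
  val ServiceFile = "META-INF/services/org.apache.hadoop.fs.FileSystem"
  val LocalFs = "org.apache.hadoop.fs.LocalFileSystem"

  // Parse one service file: strip '#' comments and blank lines, keep class names.
  def parseProviders(content: String): List[String] =
    content.linesIterator
      .map(_.takeWhile(_ != '#').trim)
      .filter(_.nonEmpty)
      .toList

  // Read every copy of the service file visible to the given class loader.
  def registeredProviders(loader: ClassLoader): List[(URL, List[String])] =
    loader.getResources(ServiceFile).asScala.toList.map { url =>
      val src = Source.fromURL(url, "UTF-8")
      try url -> parseProviders(src.mkString)
      finally src.close()
    }

  def main(args: Array[String]): Unit = {
    val found = registeredProviders(getClass.getClassLoader)
    found.foreach { case (url, providers) =>
      println(s"$url -> ${providers.mkString(", ")}")
    }
    val hasLocal = found.exists(_._2.contains(LocalFs))
    println(s"LocalFileSystem registered: $hasLocal")
  }
}
```

Running this inside the container with the same classpath as `Main` would show whether `LocalFileSystem` is still registered. If it is not, two options that may help are concatenating `META-INF/services` files when building the JAR, or setting `fs.file.impl` to `org.apache.hadoop.fs.LocalFileSystem` on the Hadoop `Configuration` passed to the writer.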