juicedata / juicefs

JuiceFS is a distributed POSIX file system built on top of Redis and S3.
https://juicefs.com
Apache License 2.0

the bug to write data into juicefs with spark-sql #5088

Closed GoodJeek closed 1 week ago

GoodJeek commented 2 months ago

What happened: When I run an insert overwrite of a small amount of data into a table with spark-sql, it works well. But when the data size exceeds a certain value, such as 5000, the data cannot be written into the table completely and some error logs are printed.

The logs are as follows.

Spark driver log:

24/08/15 03:03:53 INFO ShuffleWriteClientImpl: Successfully send heartbeat to Coordinator grpc client ref to 10.39.215.217:19999
24/08/15 03:03:53 INFO ShuffleWriteClientImpl: Successfully send heartbeat to Coordinator grpc client ref to 10.39.215.218:19999
24/08/15 03:03:53 INFO RssShuffleManager: Finish send heartbeat to coordinator and servers
24/08/15 03:03:56 WARN TaskSetManager: Lost task 0.0 in stage 14.0 (TID 131) (10.42.0.245 executor 3): org.apache.spark.SparkException: Task failed while writing rows.
    at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:500)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:321)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$16(FileFormatWriter.scala:229)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:131)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: jfs://hive/warehouse/item_01/.hive-staging_hive_2024-08-15_02-58-50_459_6630558984109284475-2/-ext-10000/_temporary/0/_temporary/attempt_202408150258544842979019484022797_0014_m_000000_131/part-00000-a208cb54-a78d-43ef-81d7-abc0e871fcb3-c000
    at io.juicefs.JuiceFileSystemImpl.error(JuiceFileSystemImpl.java:281)
    at io.juicefs.JuiceFileSystemImpl.access$600(JuiceFileSystemImpl.java:76)
    at io.juicefs.JuiceFileSystemImpl$FSOutputStream.close(JuiceFileSystemImpl.java:1018)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
    at io.juicefs.JuiceFileSystemImpl$BufferedFSOutputStream.close(JuiceFileSystemImpl.java:1139)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:77)
    at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
    at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat$1.close(HiveIgnoreKeyTextOutputFormat.java:99)
    at org.apache.spark.sql.hive.execution.HiveOutputWriter.close(HiveFileFormat.scala:162)
    at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseCurrentWriter(FileFormatDataWriter.scala:64)
    at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.releaseResources(FileFormatDataWriter.scala:75)
    at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:105)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:305)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1525)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:311)
    ... 9 more
24/08/15 03:03:56 INFO TaskSetManager: Starting task 0.1 in stage 14.0 (TID 132) (10.42.3.149, executor 2, partition 0, ANY, 4472 bytes) taskResourceAssignments Map()
24/08/15 03:03:56 INFO BlockManagerInfo: Added broadcast_17_piece0 in memory on 10.42.3.149:38447 (size: 133.4 KiB, free: 3.3 GiB)
24/08/15 03:03:56 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 2 to 10.42.3.149:44030

Spark executor log:

caused by: expected element type but have (after 11 tries) [writer.go:118] 24/08/15 03:20:27 WARN JuiceFileSystemImpl: 2024/08/15 03:20:27.101260 juicefs[14] : Upload chunks/4/4247/4247266_0_2616981: SerializationError: failed to unmarshal error message status code: 417, request id: , host id: caused by: UnmarshalError: failed to unmarshal error message 00000000 3c 21 44 4f 43 54 59 50 45 20 68 74 6d 6c 20 50 |<!DOCTYPE html P| 00000010 55 42 4c 49 43 20 22 2d 2f 2f 57 33 43 2f 2f 44 |UBLIC "-//W3C//D| 00000020 54 44 20 48 54 4d 4c 20 34 2e 30 31 2f 2f 45 4e |TD HTML 4.01//EN| 00000030 22 20 22 68 74 74 70 3a 2f 2f 77 77 77 2e 77 33 |" "http://www.w3| 00000040 2e 6f 72 67 2f 54 52 2f 68 74 6d 6c 34 2f 73 74 |.org/TR/html4/st| 00000050 72 69 63 74 2e 64 74 64 22 3e 0a 3c 68 74 6d 6c |rict.dtd">.<html| 00000060 3e 3c 68 65 61 64 3e 0a 3c 6d 65 74 61 20 68 74 |>.<meta ht| 00000070 74 70 2d 65 71 75 69 76 3d 22 43 6f 6e 74 65 6e |tp-equiv="Conten| 00000080 74 2d 54 79 70 65 22 20 63 6f 6e 74 65 6e 74 3d |t-Type" content=| 00000090 22 74 65 78 74 2f 68 74 6d 6c 3b 20 63 68 61 72 |"text/html; char| 000000a0 73 65 74 3d 75 74 66 2d 38 22 3e 0a 3c 74 69 74 |set=utf-8">.<tit| 000000b0 6c 65 3e 45 52 52 4f 52 3a 20 54 68 65 20 72 65 |le>ERROR: The re| 000000c0 71 75 65 73 74 65 64 20 55 52 4c 20 63 6f 75 6c |quested URL coul| 000000d0 64 20 6e 6f 74 20 62 65 20 72 65 74 72 69 65 76 |d not be retriev| 000000e0 65 64 3c 2f 74 69 74 6c 65 3e 0a 3c 73 74 79 6c |ed.<styl| 000000f0 65 20 74 79 70 65 3d 22 74 65 78 74 2f 63 73 73 |e type="text/css| 00000100 22 3e 3c 21 2d 2d 20 0a 20 2f 2a 0a 20 53 74 79 |"><!-- . /. Sty| 00000110 6c 65 73 68 65 65 74 20 66 6f 72 20 53 71 75 69 |lesheet for Squi| 00000120 64 20 45 72 72 6f 72 20 70 61 67 65 73 0a 20 41 |d Error pages. 
A| 00000130 64 61 70 74 65 64 20 66 72 6f 6d 20 64 65 73 69 |dapted from desi| 00000140 67 6e 20 62 79 20 46 72 65 65 20 43 53 53 20 54 |gn by Free CSS T| 00000150 65 6d 70 6c 61 74 65 73 0a 20 68 74 74 70 3a 2f |emplates. http:/| 00000160 2f 77 77 77 2e 66 72 65 65 63 73 73 74 65 6d 70 |/www.freecsstemp| 00000170 6c 61 74 65 73 2e 6f 72 67 0a 20 52 65 6c 65 61 |lates.org. Relea| 00000180 73 65 64 20 66 6f 72 20 66 72 65 65 20 75 6e 64 |sed for free und| 00000190 65 72 20 61 20 43 72 65 61 74 69 76 65 20 43 6f |er a Creative Co| 000001a0 6d 6d 6f 6e 73 20 41 74 74 72 69 62 75 74 69 6f |mmons Attributio| 000001b0 6e 20 32 2e 35 20 4c 69 63 65 6e 73 65 0a 2a 2f |n 2.5 License./| 000001c0 0a 0a 2f 2a 20 50 61 67 65 20 62 61 73 69 63 73 |../ Page basics| 000001d0 20 2a 2f 0a 2a 20 7b 0a 09 66 6f 6e 74 2d 66 61 | /. {..font-fa| 000001e0 6d 69 6c 79 3a 20 76 65 72 64 61 6e 61 2c 20 73 |mily: verdana, s| 000001f0 61 6e 73 2d 73 65 72 69 66 3b 0a 7d 0a 0a 68 74 |ans-serif;.}..ht| 00000200 6d 6c 20 62 6f 64 79 20 7b 0a 09 6d 61 72 67 69 |ml body {..margi| 00000210 6e 3a 20 30 3b 0a 09 70 61 64 64 69 6e 67 3a 20 |n: 0;..padding: | 00000220 30 3b 0a 09 62 61 63 6b 67 72 6f 75 6e 64 3a 20 |0;..background: | 00000230 23 65 66 65 66 65 66 3b 0a 09 66 6f 6e 74 2d 73 |#efefef;..font-s| 00000240 69 7a 65 3a 20 31 32 70 78 3b 0a 09 63 6f 6c 6f |ize: 12px;..colo| 00000250 72 3a 20 23 31 65 31 65 31 65 3b 0a 7d 0a 0a 2f |r: #1e1e1e;.}../| 00000260 2a 20 50 61 67 65 20 64 69 73 70 6c 61 79 65 64 | Page displayed| 00000270 20 74 69 74 6c 65 20 61 72 65 61 20 2a 2f 0a 23 | title area /.#| 00000280 74 69 74 6c 65 73 20 7b 0a 09 6d 61 72 67 69 6e |titles {..margin| 00000290 2d 6c 65 66 74 3a 20 31 35 70 78 3b 0a 09 70 61 |-left: 15px;..pa| 000002a0 64 64 69 6e 67 3a 20 31 30 70 78 3b 0a 09 70 61 |dding: 10px;..pa| 000002b0 64 64 69 6e 67 2d 6c 65 66 74 3a 20 31 30 30 70 |dding-left: 100p| 000002c0 78 3b 0a 09 62 61 63 6b 67 72 6f 75 6e 64 3a 20 |x;..background: | 000002d0 75 72 6c 28 27 68 
74 74 70 3a 2f 2f 77 77 77 2e |url('http://www.| 000002e0 73 71 75 69 64 2d 63 61 63 68 65 2e 6f 72 67 2f |squid-cache.org/| 000002f0 41 72 74 77 6f 72 6b 2f 53 4e 2e 70 6e 67 27 29 |Artwork/SN.png')| 00000300 20 6e 6f 2d 72 65 70 65 61 74 20 6c 65 66 74 3b | no-repeat left;| 00000310 0a 7d 0a 0a 2f 2a 20 69 6e 69 74 69 61 6c 20 74 |.}../ initial t| 00000320 69 74 6c 65 20 2a 2f 0a 23 74 69 74 6c 65 73 20 |itle /.#titles | 00000330 68 31 20 7b 0a 09 63 6f 6c 6f 72 3a 20 23 30 30 |h1 {..color: #00| 00000340 30 30 30 30 3b 0a 7d 0a 23 74 69 74 6c 65 73 20 |0000;.}.#titles | 00000350 68 32 20 7b 0a 09 63 6f 6c 6f 72 3a 20 23 30 30 |h2 {..color: #00| 00000360 30 30 30 30 3b 0a 7d 0a 0a 2f 2a 20 73 70 65 63 |0000;.}../ spec| 00000370 69 61 6c 20 65 76 65 6e 74 3a 20 46 54 50 20 73 |ial event: FTP s| 00000380 75 63 63 65 73 73 20 70 61 67 65 20 74 69 74 6c |uccess page titl| 00000390 65 73 20 2a 2f 0a 23 74 69 74 6c 65 73 20 66 74 |es /.#titles ft| 000003a0 70 73 75 63 63 65 73 73 20 7b 0a 09 62 61 63 6b |psuccess {..back| 000003b0 67 72 6f 75 6e 64 2d 63 6f 6c 6f 72 3a 23 30 30 |ground-color:#00| 000003c0 66 66 30 30 3b 0a 09 77 69 64 74 68 3a 31 30 30 |ff00;..width:100| 000003d0 25 3b 0a 7d 0a 0a 2f 2a 20 50 61 67 65 20 64 69 |%;.}../ Page di| 000003e0 73 70 6c 61 79 65 64 20 62 6f 64 79 20 63 6f 6e |splayed body con| 000003f0 74 65 6e 74 20 61 72 65 61 20 2a 2f 0a 23 63 6f |tent area /.#co| 00000400 6e 74 65 6e 74 20 7b 0a 09 70 61 64 64 69 6e 67 |ntent {..padding| 00000410 3a 20 31 30 70 78 3b 0a 09 62 61 63 6b 67 72 6f |: 10px;..backgro| 00000420 75 6e 64 3a 20 23 66 66 66 66 66 66 3b 0a 7d 0a |und: #ffffff;.}.| 00000430 0a 2f 2a 20 47 65 6e 65 72 61 6c 20 74 65 78 74 |./ General text| 00000440 20 2a 2f 0a 70 20 7b 0a 7d 0a 0a 2f 2a 20 65 72 | /.p {.}../ er| 00000450 72 6f 72 20 62 72 69 65 66 20 64 65 73 63 72 69 |ror brief descri| 00000460 70 74 69 6f 6e 20 2a 2f 0a 23 65 72 72 6f 72 20 |ption /.#error | 00000470 70 20 7b 0a 7d 0a 0a 2f 2a 20 73 6f 6d 65 20 64 |p 
{.}../ some d| 00000480 61 74 61 20 77 68 69 63 68 20 6d 61 79 20 68 61 |ata which may ha| 00000490 76 65 20 63 61 75 73 65 64 20 74 68 65 20 70 72 |ve caused the pr| 000004a0 6f 62 6c 65 6d 20 2a 2f 0a 23 64 61 74 61 20 7b |oblem /.#data {| 000004b0 0a 7d 0a 0a 2f 2a 20 74 68 65 20 65 72 72 6f 72 |.}../ the error| 000004c0 20 6d 65 73 73 61 67 65 20 72 65 63 65 69 76 65 | message receive| 000004d0 64 20 66 72 6f 6d 20 74 68 65 20 73 79 73 74 65 |d from the syste| 000004e0 6d 20 6f 72 20 6f 74 68 65 72 20 73 6f 66 74 77 |m or other softw| 000004f0 61 72 65 20 2a 2f 0a 23 73 79 73 6d 73 67 20 7b |are /.#sysmsg {| 00000500 0a 7d 0a 0a 70 72 65 20 7b 0a 20 20 20 20 66 6f |.}..pre {. fo| 00000510 6e 74 2d 66 61 6d 69 6c 79 3a 73 61 6e 73 2d 73 |nt-family:sans-s| 00000520 65 72 69 66 3b 0a 7d 0a 0a 2f 2a 20 73 70 65 63 |erif;.}../ spec| 00000530 69 61 6c 20 65 76 65 6e 74 3a 20 46 54 50 20 2f |ial event: FTP /| 00000540 20 47 6f 70 68 65 72 20 64 69 72 65 63 74 6f 72 | Gopher director| 00000550 79 20 6c 69 73 74 69 6e 67 20 2a 2f 0a 23 64 69 |y listing /.#di| 00000560 72 6d 73 67 20 7b 0a 20 20 20 20 66 6f 6e 74 2d |rmsg {. font-| 00000570 66 61 6d 69 6c 79 3a 20 63 6f 75 72 69 65 72 3b |family: courier;| 00000580 0a 20 20 20 20 63 6f 6c 6f 72 3a 20 62 6c 61 63 |. color: blac| 00000590 6b 3b 0a 20 20 20 20 66 6f 6e 74 2d 73 69 7a 65 |k;. font-size| 000005a0 3a 20 31 30 70 74 3b 0a 7d 0a 23 64 69 72 6c 69 |: 10pt;.}.#dirli| 000005b0 73 74 69 6e 67 20 7b 0a 20 20 20 20 6d 61 72 67 |sting {. marg| 000005c0 69 6e 2d 6c 65 66 74 3a 20 32 25 3b 0a 20 20 20 |in-left: 2%;. 
| 000005d0 20 6d 61 72 67 69 6e 2d 72 69 67 68 74 3a 20 32 | margin-right: 2| 000005e0 25 3b 0a 7d 0a 23 64 69 72 6c 69 73 74 69 6e 67 |%;.}.#dirlisting| 000005f0 20 74 72 2e 65 6e 74 72 79 20 74 64 2e 69 63 6f | tr.entry td.ico| 00000600 6e 2c 74 64 2e 66 69 6c 65 6e 61 6d 65 2c 74 64 |n,td.filename,td| 00000610 2e 73 69 7a 65 2c 74 64 2e 64 61 74 65 20 7b 0a |.size,td.date {.| 00000620 20 20 20 20 62 6f 72 64 65 72 2d 62 6f 74 74 6f | border-botto| 00000630 6d 3a 20 67 72 6f 6f 76 65 3b 0a 7d 0a 23 64 69 |m: groove;.}.#di| 00000640 72 6c 69 73 74 69 6e 67 20 74 64 2e 73 69 7a 65 |rlisting td.size| 00000650 20 7b 0a 20 20 20 20 77 69 64 74 68 3a 20 35 30 | {. width: 50| 00000660 70 78 3b 0a 20 20 20 20 74 65 78 74 2d 61 6c 69 |px;. text-ali| 00000670 67 6e 3a 20 72 69 67 68 74 3b 0a 20 20 20 20 70 |gn: right;. p| 00000680 61 64 64 69 6e 67 2d 72 69 67 68 74 3a 20 35 70 |adding-right: 5p| 00000690 78 3b 0a 7d 0a 0a 2f 2a 20 68 6f 72 69 7a 6f 6e |x;.}../ horizon| 000006a0 74 61 6c 20 6c 69 6e 65 73 20 2a 2f 0a 68 72 20 |tal lines /.hr | 000006b0 7b 0a 09 6d 61 72 67 69 6e 3a 20 30 3b 0a 7d 0a |{..margin: 0;.}.| 000006c0 0a 2f 2a 20 70 61 67 65 20 64 69 73 70 6c 61 79 |./ page display| 000006d0 65 64 20 66 6f 6f 74 65 72 20 61 72 65 61 20 2a |ed footer area *| 000006e0 2f 0a 23 66 6f 6f 74 65 72 20 7b 0a 09 66 6f 6e |/.#footer {..fon| 000006f0 74 2d 73 69 7a 65 3a 20 39 70 78 3b 0a 09 70 61 |t-size: 9px;..pa| 00000700 64 64 69 6e 67 2d 6c 65 66 74 3a 20 31 30 70 78 |dding-left: 10px| 00000710 3b 0a 7d 0a 0a 0a 62 6f 64 79 0a 3a 6c 61 6e 67 |;.}...body.:lang| 00000720 28 66 61 29 20 7b 20 64 69 72 65 63 74 69 6f 6e |(fa) { direction| 00000730 3a 20 72 74 6c 3b 20 66 6f 6e 74 2d 73 69 7a 65 |: rtl; font-size| 00000740 3a 20 31 30 30 25 3b 20 66 6f 6e 74 2d 66 61 6d |: 100%; font-fam| 00000750 69 6c 79 3a 20 54 61 68 6f 6d 61 2c 20 52 6f 79 |ily: Tahoma, Roy| 00000760 61 2c 20 73 61 6e 73 2d 73 65 72 69 66 3b 20 66 |a, sans-serif; f| 00000770 6c 6f 61 74 3a 20 72 69 67 
68 74 3b 20 7d 0a 3a |loat: right; }.:| 00000780 6c 61 6e 67 28 68 65 29 20 7b 20 64 69 72 65 63 |lang(he) { direc| 00000790 74 69 6f 6e 3a 20 72 74 6c 3b 20 7d 0a 20 2d 2d |tion: rtl; }. --| 000007a0 3e 3c 2f 73 74 79 6c 65 3e 0a 3c 2f 68 65 61 64 |>.</head| 000007b0 3e 3c 62 6f 64 79 20 69 64 3d 45 52 52 5f 49 4e |><body id=ERR_IN| 000007c0 56 41 4c 49 44 5f 52 45 51 3e 0a 3c 64 69 76 20 |VALIDREQ>.<div | 000007d0 69 64 3d 22 74 69 74 6c 65 73 22 3e 0a 3c 68 31 |id="titles">.<h1| 00000850 65 73 74 3c 2f 62 3e 20 65 72 72 6f 72 20 77 61 |est error wa| 00000860 73 20 65 6e 63 6f 75 6e 74 65 72 65 64 20 77 68 |s encountered wh| 00000930 37 3b 20 6c 69 6e 75 78 3b 20 61 6d 64 36 34 29 |7; linux; amd64)| 00000940 0d 0a 43 6f 6e 74 65 6e 74 2d 4c 65 6e 67 74 68 |..Content-Length| 00000950 3a 20 32 36 31 36 39 38 31 0d 0a 41 75 74 68 6f |: 2616981..Autho| 00000960 72 69 7a 61 74 69 6f 6e 3a 20 2a 2a 20 4e 4f 54 |rization: NOT| 00000970 20 44 49 53 50 4c 41 59 45 44 20 2a 2a 0d 0a 43 | DISPLAYED ..C| 00000980 6f 6e 74 65 6e 74 2d 4d 44 35 3a 20 45 65 34 2f |ontent-MD5: Ee4/| 00000990 41 7a 72 5a 4a 6e 4d 4d 53 7a 34 31 31 71 2f 76 |AzrZJnMMSz411q/v| 00000a70 70 72 6f 62 6c 65 6d 73 20 61 72 65 3a 3c 2f 70 |problems are:</p| 00000fe0 4d 53 7a 34 31 31 71 25 32 46 76 59 77 25 33 44 |MSz411q%2FvYw%3D| 00000ff0 25 33 44 25 30 44 25 30 41 43 6f 6e 74 65 6e 74 |%3D%0D%0AContent| > caused by: expected element type but have (try 11) [cached_store.go:390] 24/08/15 03:20:27 ERROR JuiceFileSystemImpl: 2024/08/15 03:20:27.101709 juicefs[14] : upload chunk 4247266 (length: 2616981) fail: (max tries) upload block chunks/4/4247/4247266_02616981: SerializationError: failed to unmarshal error message status code: 417, request id: , host id: caused by: UnmarshalError: failed to unmarshal error message

What you expected to happen: Large amounts of data should be written into the table correctly and in a timely manner with spark-sql, with the data stored in JuiceFS/MinIO and the metadata stored in MySQL.

Environment:

zhijian-pro commented 2 months ago

Which object storage are you using?

GoodJeek commented 2 months ago

minio

zhijian-pro commented 2 months ago

Is this error unrelated to the content of the written data? Does this error necessarily occur whenever the amount of data being written exceeds a certain size?

Is there any other network middleware between juicefs and minio that causes the returned data to be truncated, resulting in formatting errors that cannot be parsed?
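One way to look for such middleware is to inspect what the failed response body actually contains. The executor log in this report shows a body that begins with `<!DOCTYPE html` and mentions "Stylesheet for Squid Error pages", with status code 417, rather than the XML error document an S3-compatible client expects. Below is a minimal Python sketch of that check. The function name and the category labels are hypothetical, chosen for illustration only; they are not part of JuiceFS, MinIO, or any SDK.

```python
def classify_error_body(body: bytes) -> str:
    """Roughly classify the body of a failed object-storage response.

    Hypothetical helper for illustration; the labels are not from any library.
    """
    # Only the beginning of the body is needed to tell the formats apart.
    head = body.lstrip()[:4096].lower()

    # S3-compatible services normally answer errors with an XML document.
    if head.startswith(b"<?xml") or head.startswith(b"<error"):
        return "s3-xml"

    # An HTML page usually means something other than the object store
    # answered the request, for example a proxy in between.
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        if b"squid" in head:
            return "html-squid"
        return "html"

    return "unknown"


if __name__ == "__main__":
    sample = b'<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"> Stylesheet for Squid Error pages'
    print(classify_error_body(sample))
```

If the result is an HTML category, the reply most likely came from something in the network path and not from the object storage itself, which is worth ruling out before looking at the written data.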

zhijian-pro commented 1 week ago

Resolved. The problem was determined to be caused by a proxy configured in the user's network.
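For anyone hitting a similar error, a quick first check is whether proxy environment variables would apply to the object storage endpoint. The Python sketch below is a simplified illustration under stated assumptions: it assumes the client reads the conventional `HTTP_PROXY`, `HTTPS_PROXY`, and `NO_PROXY` variables, the function name `proxy_for` and the host names are hypothetical, and it matches `NO_PROXY` entries by host name only (no ports or CIDR ranges). A proxy enforced at the network level would not show up in this check.

```python
import os
from urllib.parse import urlparse


def proxy_for(endpoint: str, environ=None):
    """Return the proxy URL the environment would select for endpoint, or None.

    Simplified, hypothetical helper: real HTTP clients apply more rules.
    """
    env = os.environ if environ is None else environ
    url = urlparse(endpoint)
    host = (url.hostname or "").lower()

    # Hosts listed in NO_PROXY bypass the proxy.
    no_proxy = env.get("NO_PROXY") or env.get("no_proxy") or ""
    for entry in no_proxy.split(","):
        entry = entry.strip().lower()
        if not entry:
            continue
        if entry == "*":
            return None
        entry = entry.lstrip(".")
        if host == entry or host.endswith("." + entry):
            return None

    # Otherwise pick the variable that matches the URL scheme.
    key = url.scheme.lower() + "_proxy"
    return env.get(key.upper()) or env.get(key) or None


if __name__ == "__main__":
    # Hypothetical endpoint; replace with the real object storage address.
    print(proxy_for("http://minio.internal:9000"))
```

If this prints a proxy URL for the object storage endpoint, adding that host to `NO_PROXY` (or removing the proxy setting) in the environment of the Spark executors is one thing to try.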