Open · xkrogen opened this issue 1 week ago
cc @electrum @weijiii @findepi
cc @pajaks
Bump @findepi @pajaks @electrum
If it's helpful to guide the discussion, I can put together a PR for this following the proposed approach, but I would love feedback first to avoid duplicate work in the wrong direction 😄
When using a data lake connector (Hive/Iceberg/etc.) to write data to HDFS using Trino, we may see a `QuotaExceededException` (e.g. namespace quota or disk space quota exceeded). This is a user-side issue, but currently we categorize it as an `EXTERNAL` error.

I would like to work on a fix for this, but have a couple of things I'd like to discuss before moving forward:
1. My initial approach would be to modify `HdfsOutputFile` to add a new catch block for `QuotaExceededException` here, which would throw a `TrinoException` with type `USER_ERROR` (a rough sketch is below the list): https://github.com/trinodb/trino/blob/6697fe24481a30d37eb91efd62666165acf379c2/lib/trino-hdfs/src/main/java/io/trino/filesystem/hdfs/HdfsOutputFile.java#L102-L105 (I think we would need to modify `HdfsOutputStream#write()` as well to catch disk space quota issues specifically, but I need to double-check.) However, I noticed that there is only one place in all of the various `trino-filesystem-*`/`trino-hdfs` modules where we throw a `TrinoException`, so I am wondering if there is a different best practice for surfacing this kind of issue from the FS layer?
2. Are there analogous concepts on other blob stores (S3/GCS/Azure) that we should handle similarly?
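To make item 1 concrete, here is a rough sketch of the kind of catch block I have in mind. This is not the actual code at the linked lines: the method shape is simplified, `QuotaAwareCreateSketch` is a hypothetical stand-in for the file-creation path in `HdfsOutputFile`, and the error code (`GENERIC_USER_ERROR` below; a dedicated code may be better) is exactly the kind of thing I'd like feedback on.

```java
import java.io.IOException;
import java.io.OutputStream;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.protocol.QuotaExceededException;

import io.trino.spi.TrinoException;

import static io.trino.spi.StandardErrorCode.GENERIC_USER_ERROR;

final class QuotaAwareCreateSketch
{
    private QuotaAwareCreateSketch() {}

    // Simplified stand-in for the file-creation path in HdfsOutputFile: catch the HDFS
    // quota exception before the generic IOException handling so it surfaces as a
    // USER_ERROR instead of an EXTERNAL error.
    static OutputStream create(FileSystem fileSystem, Path path, boolean overwrite)
            throws IOException
    {
        try {
            return fileSystem.create(path, overwrite);
        }
        catch (QuotaExceededException e) {
            // NSQuotaExceededException and DSQuotaExceededException both extend
            // QuotaExceededException; disk-space violations may only show up later on
            // write()/close(), which is the HdfsOutputStream#write() question above.
            throw new TrinoException(GENERIC_USER_ERROR, "HDFS quota exceeded while creating " + path, e);
        }
        // other IOExceptions would keep the existing generic handling
    }
}
```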
As one example of the current behavior, when writing ORC data from Hive or Iceberg using `OrcFileWriterFactory` (handled here) or `IcebergFileWriterFactory` (handled here), you get an error with type `(HIVE|ICEBERG)_WRITER_OPEN_ERROR` and a message of simply "Error creating ORC file", which makes it challenging for the end user to understand that there is something on their end to correct (a quota issue).
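For concreteness, the wrapping in those factories looks roughly like the sketch below (illustrative only, not the exact code; `Files.newOutputStream` stands in for the real file-system call and `WriterOpenWrappingSketch` is a made-up class). The quota exception is still attached as the cause, but the error code and message the user sees stay generic.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

import io.trino.spi.TrinoException;

import static io.trino.plugin.hive.HiveErrorCode.HIVE_WRITER_OPEN_ERROR;

final class WriterOpenWrappingSketch
{
    private WriterOpenWrappingSketch() {}

    // Illustration of the wrapping described above: whatever IOException the file system
    // throws (including an HDFS quota violation) ends up only in the cause chain, while
    // the surfaced error keeps the generic code and "Error creating ORC file" message.
    static OutputStream openOrcOutput(Path target)
    {
        try {
            return Files.newOutputStream(target);
        }
        catch (IOException e) {
            throw new TrinoException(HIVE_WRITER_OPEN_ERROR, "Error creating ORC file", e);
        }
    }
}
```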