simlaudato / asterixdb

Automatically exported from code.google.com/p/asterixdb

Loading a delimited-text file containing a record longer than 8192 from HDFS fails #901

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Put the attached file lineitem.tbl onto HDFS
2. Run the attached query hdfs_long.1.ddl.aql

What is the expected output? What do you see instead?
It should succeed. Instead, the following exception is thrown:

Caused by: edu.uci.ics.hyracks.api.exceptions.HyracksDataException: java.io.IOException: Underlying input stream returned zero bytes
    at edu.uci.ics.asterix.runtime.operators.file.AbstractTupleParser.parse(AbstractTupleParser.java:84)
    at edu.uci.ics.asterix.external.dataset.adapter.FileSystemBasedAdapter.start(FileSystemBasedAdapter.java:54)
    at edu.uci.ics.asterix.metadata.feeds.ExternalDataScanOperatorDescriptor$1.initialize(ExternalDataScanOperatorDescriptor.java:57)
    ... 5 more
Caused by: java.io.IOException: Underlying input stream returned zero bytes
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:287)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at edu.uci.ics.hyracks.dataflow.std.file.FieldCursorForDelimitedDataParser.readMore(FieldCursorForDelimitedDataParser.java:332)
    at edu.uci.ics.hyracks.dataflow.std.file.FieldCursorForDelimitedDataParser.nextRecord(FieldCursorForDelimitedDataParser.java:83)
    at edu.uci.ics.asterix.runtime.operators.file.DelimitedDataParser.parse(DelimitedDataParser.java:112)
    at edu.uci.ics.asterix.runtime.operators.file.AbstractTupleParser.parse(AbstractTupleParser.java:72)
    ... 7 more

Please use labels and text to provide additional information.

It only happens when loading from HDFS; if I switch to localfs, everything runs
successfully.

Original issue reported on code.google.com by jianfeng...@gmail.com on 17 Jun 2015 at 7:35

GoogleCodeExporter commented 8 years ago

Original comment by jianfeng...@gmail.com on 17 Jun 2015 at 5:47

GoogleCodeExporter commented 8 years ago
I looked at this and it only happens with the delimited data parser.
When reading from a local file, the reader returns the bytes as they are read,
whereas when reading from HDFS, the reader returns 0 if the buffer is not large
enough. In the case of the ADM parser, the parser doubles the buffer size and
calls the read function again, while in the case of the delimited data parser, an
exception is thrown.

It is easy to fix, and I will fix it very soon.

Original comment by bamou...@gmail.com on 19 Jun 2015 at 7:03

GoogleCodeExporter commented 8 years ago
Correction:
It fails for both ADM and delimited files, since both use InputStreamReader,
which throws an IOException when the underlying input stream returns zero bytes.

Original comment by bamou...@gmail.com on 19 Jun 2015 at 7:17
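
For illustration, the failure mode described in the correction above can be reproduced without HDFS. The following is a minimal sketch, not AsterixDB code: `ZeroReadDemo`, `ZeroOnceStream`, and `readAll` are hypothetical names, and `ZeroOnceStream` is only an assumed stand-in for a stream whose read call returns 0, as the HDFS reader reportedly does when its buffer is too small. It shows that `java.io.InputStreamReader` raises the same `IOException` seen in the stack trace as soon as the wrapped stream returns zero bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ZeroReadDemo {

    /** Hypothetical stand-in for the HDFS stream: its first bulk read returns 0 bytes. */
    public static class ZeroOnceStream extends FilterInputStream {
        private boolean returnedZero = false;

        public ZeroOnceStream(byte[] data) {
            super(new ByteArrayInputStream(data));
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (!returnedZero) {
                returnedZero = true;
                return 0; // mimics the zero-byte read described in the comments above
            }
            return super.read(b, off, len);
        }
    }

    /** Reads the whole stream through an InputStreamReader, as both parsers reportedly do. */
    public static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[64];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "a|b|c\n".getBytes(StandardCharsets.UTF_8);

        // Control case: a stream that never returns 0 is read without error.
        System.out.print("plain stream: " + readAll(new ByteArrayInputStream(data)));

        // Failing case: the zero-byte read surfaces as an IOException from the reader.
        try {
            readAll(new ZeroOnceStream(data));
        } catch (IOException e) {
            System.out.println("zero-returning stream: " + e.getMessage());
        }
    }
}
```

This only demonstrates why a zero-byte read is fatal under `InputStreamReader`; it does not show the actual fix. Simply retrying the read would presumably not help if the source keeps returning 0 because its buffer is too small for the record.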