google / riegeli

Riegeli/records is a file format for storing a sequence of string records, typically serialized protocol buffers.
Apache License 2.0
418 stars 53 forks source link

Implementing a custom Writer / Reader #5

Closed CodeArno closed 5 years ago

CodeArno commented 5 years ago

Hi,

I wanted to implement a streaming writer/reader that can directly interface with Amazon S3. Looking through the code, it seemed like I should just be able to extend BufferedWriter and implement some of its methods.

I created an implementation and overwrote the pure virtual Flush and WriteInternal methods, but it seems that other than the constructor, these methods never get called. When calling Done on the RecordWriter, I get a segfault:

*** Segmentation fault (@0xc0) received by PID 22734 (TID 0x7f3eeee419c0) from PID 192; stack trace: ***
    @     0x7f3efbf76e11 common_internal::signalHandler()
    @     0x7f3ef3c47390 (unknown)
    @     0x7f3ef53b12e7 riegeli::RecordWriterBase::Done()
    @     0x5595de55274f riegeli::RecordWriter<>::Done()
    @     0x5595de551899 run()
    @     0x5595de55077b main
    @     0x7f3ef29c9830 __libc_start_main
    @     0x5595de550909 _start

I tried to understand how the FdWriter does the implementation, but couldn't figure it out. Can you provide some pointers what I need to implement in my custom Writer to make it work?

Thanks! Arno

QrczakMK commented 5 years ago

The error indicates dereferencing a null pointer and accessing its field at offset 0xc0.

Depending on how RecordWriter template is instantiated and which constructor of RecordWriter is called, you might need move constructor and move assignment. Compiler-generated ones might be correct or not, depending on member variables of your class.

You likely need to also override Done (and call BufferedWriter::Done() in it), assuming that the stream needs to be closed somehow and that closing it might fail.

FdWriter might be hard to follow because it derives from FdWriterBase (for functionality common between FdWriter template instantiations) and FdWriterCommon (for functionality common between FdWriterBase and FdStreamWriterBase). If you do not need a configurable policy regarding owning the underlying Amazon S3 stream, your class does not have to be a template, so it would look more like FdWriter + FdWriterBase + FdWriterCommon flattened. Other aspects of FdWriter (like dest_fd, dest, Reset) are provided for completeness or efficiency and do not need to be emulated.

If you can show the code, I could try to find the error.

CodeArno commented 5 years ago

Thank you for your quick response! My code is here: https://gist.github.com/CodeArno/d922d50de4bd3624b8750fc2b717fc9d Other than the line "Initialize for..." from initialize() nothing gets logged, so it doesn't seem like any of my other methods ever get called.

QrczakMK commented 5 years ago

The appropriate base class constructor must be called:

RiegeliS3Writer::RiegeliS3Writer(
    Aws::S3::S3Client* s3_client, const std::string& bucket_name, const std::string& key_path)
    : riegeli::BufferedWriter(riegeli::kDefaultBufferSize),
      s3_client_(s3_client), bucket_name_(bucket_name), key_path_(key_path) {
  initialize();
}

The default constructor constructs a dummy closed object.

I will think what to do to make it more clear or to let it fail in a more obvious way.

CodeArno commented 5 years ago

That did it - everything works now. Thank you so much!

QrczakMK commented 5 years ago

https://github.com/google/riegeli/commit/6d4d59c6e448a293f7cb3ae9fc3bcdac158f7e75 should make this a tiny bit more helpful.