take-cheeze / mruby-marshal

mruby implementation of cruby marshaling.

Replace IO#seek with IO#ungetc #31

Closed krissi closed 4 years ago

krissi commented 4 years ago

It would be great if you could replace the seek in restore_byte with ungetc. That would make it possible to read directly from STDIN, sockets, and other non-seekable streams (STDIN.seek(-1) #=> RuntimeError). It would also make it possible to reuse the same IO for multiple objects.
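A minimal sketch of the difference, using MRI and an in-process pipe as a stand-in for STDIN or a socket. The variable names are only for illustration, and the exact exception raised by seek depends on the platform and on the Ruby implementation (MRI on Linux typically raises Errno::ESPIPE, while the mruby case above reports RuntimeError):

```ruby
# A pipe is not seekable, just like STDIN when it is fed from another process.
reader, writer = IO.pipe
writer.write("foo\n")
writer.close

# Seeking backwards is expected to fail on a pipe.
seek_failed = begin
  reader.seek(-1, IO::SEEK_CUR)
  false
rescue SystemCallError
  true
end

# Pushing the character back works, because ungetc only touches the read buffer.
first = reader.getc    # consume one character
reader.ungetc(first)   # put it back
rest = reader.read     # the pushed-back character is returned again

puts "seek failed: #{seek_failed}"
puts "first: #{first.inspect}, rest: #{rest.inspect}"
```

Since ungetc never asks the operating system to reposition the stream, it does not matter whether the underlying file descriptor is a file, a pipe, or a socket.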

For example, with MRI:

ruby -e 'Marshal.dump("foo", STDOUT); Marshal.dump("bar", STDOUT)' | ruby -e 'loop { p Marshal.load(STDIN) }'
"foo"
"bar"
Traceback (most recent call last):
    3: from -e:1:in `<main>'
    2: from -e:1:in `loop'
    1: from -e:1:in `block in <main>'
-e:1:in `load': end of file reached (EOFError)

With the changes below, this would be possible with mruby as well:

mruby/bin/mruby -e 'Marshal.dump("foo", STDOUT); Marshal.dump("bar", STDOUT)' | mruby/bin/mruby -e 'loop { p Marshal.load(STDIN) }'      
"foo"
"bar"
trace (most recent call last):
    [1] -e:1
-e:1: invalid marshal version: 0.0 (expected: 4.8) (TypeError)

I am not sure why the two differ at the end of the input: MRI raises an EOFError, which is what I expected, but mruby continues parsing marshal data and fails with a TypeError instead.
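For reference, this is roughly how a caller can rely on the MRI behaviour to read every object from one stream. It is a sketch that assumes MRI and uses a pipe in place of STDIN; with the patched mruby build shown above, the terminating exception would be the TypeError rather than EOFError:

```ruby
# Write two marshaled objects back to back into the same stream.
reader, writer = IO.pipe
Marshal.dump("foo", writer)
Marshal.dump("bar", writer)
writer.close

# Keep loading until MRI signals the end of the stream with EOFError.
objects = []
begin
  loop { objects << Marshal.load(reader) }
rescue EOFError
  # No more marshaled data: this is the normal way out of the loop.
end

p objects
```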

My changes (sorry, I don't really know any C++):

diff --git a/src/marshal.cpp b/src/marshal.cpp
index 6024f97..3edd146 100644
--- a/src/marshal.cpp
+++ b/src/marshal.cpp
@@ -480,7 +480,7 @@ mrb_value read_context<In>::marshal() {

     case ':': // symbol
     case ';': // symbol link
-      in_.restore_byte(); // restore tag
+      in_.restore_byte(tag); // restore tag
       return mrb_symbol_value(symbol());

     case 'I': { // instance variable
@@ -662,7 +662,7 @@ struct string_in {
     return *(current++);
   }

-  void restore_byte() { --current; }
+  void restore_byte(uint8_t byte) { --current; }

   mrb_value byte_array(size_t len) {
     if((current + len) > end) {
@@ -687,10 +687,12 @@ struct io_in {
     return RSTRING_PTR(buf)[0];
   }

-  void restore_byte() {
-    mrb_funcall(M, io, "seek", 2, mrb_fixnum_value(-1),
-                mrb_const_get(M, mrb_obj_value(mrb_class_get(M, "IO")),
-                              mrb_intern_lit(M, "SEEK_CUR")));
+  void restore_byte(uint8_t byte) {
+    if(byte == ';') {
+      mrb_funcall(M, io, "ungetc", 1, mrb_str_new(M, ";", 1));
+    } else {
+      mrb_funcall(M, io, "ungetc", 1, mrb_str_new(M, ":", 1));
+    }
   }

   mrb_value byte_array(size_t len) {
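Because restore_byte now receives the byte that was consumed, the branch on ';' versus ':' could probably be avoided by pushing back whatever byte was passed in. The Ruby model below illustrates that idea only; PushbackReader and its methods are hypothetical names, not part of mruby-marshal, and it uses MRI's IO#ungetbyte. In the C++ patch the equivalent would presumably be a one-byte string built from the byte argument and handed to the ungetc call.

```ruby
# Hypothetical model of a reader that can give back the last byte it consumed.
class PushbackReader
  def initialize(io)
    @io = io
  end

  # Returns the next byte as an Integer, or raises EOFError at end of stream.
  def read_byte
    byte = @io.getbyte
    raise EOFError, "end of file reached" if byte.nil?
    byte
  end

  # Pushes back exactly the byte that was read, whatever its value.
  def restore_byte(byte)
    @io.ungetbyte(byte)
  end
end

reader_io, writer_io = IO.pipe
writer_io.write(":;") # the two tags handled in the patch: symbol and symbol link
writer_io.close

reader = PushbackReader.new(reader_io)
seen = []
2.times do
  tag = reader.read_byte    # peek at the tag
  reader.restore_byte(tag)  # restore it so the symbol parser can read it again
  seen << reader.read_byte.chr
end

p seen
```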