splunk / splunk-sdk-ruby

Splunk Software Development Kit for Ruby
http://dev.splunk.com
Apache License 2.0
36 stars 21 forks source link

gba's Code Review #2

Closed ampledata closed 12 years ago

ampledata commented 12 years ago

Mostly style and layout fixes gleaned from https://github.com/bbatsov/ruby-style-guide

PLEASEREVIEW: @itay @rdas

rdas commented 12 years ago

gba - Did you do the layout stuff with some automated tool or manually? Why did it bother you so much the way it was? Thanks for the clean up, but just wondering.

ampledata commented 12 years ago

Rob, per http://eswiki.splunk.com/Style we've been trying to follow the Ruby Community Style Guide at https://github.com/bbatsov/ruby-style-guide

For the most part the style guide is a Rubified mirror of everything in Python's PEP8 and the Google's Python Style Guide: http://google-styleguide.googlecode.com/svn/trunk/pyguide.html

Many of the problems were of inconsistent style, spacing, block usage, and line length. I ran all tests before and after, all of which passed.

ampledata commented 12 years ago

I wanted to, but couldn't:

  1. Our stack is still on 1.8.7. I was going to test the SDK on 1.8.7, but I couldn't even get the tests to run, so I postponed.
  2. I needed input support, which it seems like is still missing?

In the interim I threw together a quick input Ruby implementation: https://github.com/ampledata/splunk-input-ruby

Also, lest we forget the Storm implementation: https://github.com/splunk/storm-examples/tree/master/ruby

rdas commented 12 years ago

gba - here is the reason it needs 1.9.2 (from the author of the JSON pull parser that I use). I will investigate fixing it:

Hi Rob! Splunk is a neat product, so it's exciting to hear from you. I'm so glad my little gem was useful to you guys.

The gem needs Ruby 1.9.2 because it uses the new Encoding methods to validate that a sequence of bytes is valid UTF-8. Here's the single line that needs 1.9.2:

https://github.com/dgraham/json-stream/blob/master/lib/json/stream/buffer.rb#L50

The JSON::Stream::Buffer class knows how to correctly identify and parse partial multi-byte UTF-8 byte sequences. Those sequences may be truncated due to EventMachine handing it a chunk of JSON that may end abruptly in the middle of a UTF-8 character.

But correctly identifying multi-byte UTF-8 characters does not mean that the byte sequence is valid. Unicode has some complex rules to determine if the sequence is valid and I didn't trust that I fully understood all those scenarios. So, I'm using String#force_encoding and String#valid_encoding? to do that work for me.

Here's a test case for just one of those possible invalid sequences (called an "overlong form"): https://github.com/dgraham/json-stream/blob/master/test/buffer_test.rb#L84

I'm not sure how to do this validation in Ruby 1.8.7. Maybe there's a gem or C library that implements those Unicode rules?

I hope that helps!

David

itay commented 12 years ago

Is there a reason the Ruby SDK uses JSON? JSON support is still missing from most endpoints, so we should default to XML whenever we can.

Itay

rdas commented 12 years ago

I haven't looked at it for over 2 months, but I believe it only uses JSON for parsing search results. I chose to do this instead of parsing XML as a stream because I wanted to get it up and running quicker and after talking to the core folks, the only reason to use XML for search results was to get hold of the timeline data.

By the way, I believe that I can simply remove the offending line that pins it to ruby 1.9 because we have well known JSON replies that don't contain bogus UTF-8 data.

Rob


From: Itay Neeman [reply@reply.github.com] Sent: Tuesday, April 03, 2012 10:15 AM To: Rob Das Subject: Re: [splunk-sdk-ruby] gba's Code Review (#2)

Is there a reason the Ruby SDK uses JSON? JSON support is still missing from most endpoints, so we should default to XML whenever we can.

Itay


Reply to this email directly or view it on GitHub: https://github.com/splunk/splunk-sdk-ruby/pull/2#issuecomment-4913520