twitter-archive / snowflake

Snowflake is a network service for generating unique ID numbers at high scale with some simple guarantees.
http://twitter.com/

Snowflake server crashes if a client sends a malformed request #4

Closed LNGi closed 12 years ago

LNGi commented 13 years ago

I found that if I telnet to the snowflake server and send some malformed data, the server crashes. That is unacceptable to me. Consider this: if one client misbehaves and sends an invalid request for an unknown reason, the server crashes and all the other clients stop working. That would be a disaster.

Sending a malformed request:

[liang@api: snowflake]$ telnet localhost 7609
Trying ::1...
Connected to localhost.
Escape character is '^]'.
this is a bad request

Connection closed by foreign host.

This is the crash message:

2011-05-10T20:29:11.224+0900: 63.620: [GC 63.620: [ParNew: 86017K->908K(716800K), 0.0053158 secs]63.625: [CMS: 0K->899K(102400K), 0.0401349 secs] 86017K->899K(819200K), [CMS Perm : 11118K->11094K(16384K)], 0.0455748 secs] [Times: user=0.05 sys=0.00, real=0.05 secs] 
2011-05-10T20:29:11.270+0900: 63.665: [Full GC 63.665: [CMS[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor1]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor3]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor2]
: 899K->765K(102400K), 0.0365369 secs] 899K->765K(819200K), [CMS Perm : 11094K->11072K(18492K)], 0.0365862 secs] [Times: user=0.04 sys=0.00, real=0.03 secs] 
ERROR [Thread-1] (TNonblockingServer.java312) - run() exiting due to uncaught error
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
    at org.apache.thrift.server.TNonblockingServer$FrameBuffer.read(TNonblockingServer.java:556)
    at org.apache.thrift.server.TNonblockingServer$SelectThread.handleRead(TNonblockingServer.java:424)
    at org.apache.thrift.server.TNonblockingServer$SelectThread.select(TNonblockingServer.java:369)
    at org.apache.thrift.server.TNonblockingServer$SelectThread.run(TNonblockingServer.java:308)
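
A possible reading of this trace, offered as a sketch rather than a confirmed diagnosis: the allocation happens inside `FrameBuffer.read`, and Thrift's framed transport prefixes each message with a 4-byte big-endian length. If the server takes the first four bytes of the telnet input as that length, the text "this" becomes a request for a buffer of nearly 2 GB. The Python below only does that arithmetic. The function name `claimed_frame_size` is made up for illustration, and the heap figure assumes the `-Xmx700m` setting from the startup command given later in this thread.

```python
import struct

# Heap limit assumed from the -Xmx700m flag in the startup command in this thread.
HEAP_BYTES = 700 * 1024 * 1024


def claimed_frame_size(payload: bytes) -> int:
    """Read the first four bytes as a big-endian signed 32-bit integer.

    This mirrors how a framed transport would interpret a length prefix.
    It is an illustration only, not Thrift's actual code.
    """
    if len(payload) < 4:
        raise ValueError("need at least four bytes to form a length prefix")
    (size,) = struct.unpack(">i", payload[:4])
    return size


def exceeds_heap(payload: bytes, heap_bytes: int = HEAP_BYTES) -> bool:
    """True if the length claimed by the payload is larger than the heap."""
    return claimed_frame_size(payload) > heap_bytes


if __name__ == "__main__":
    line = b"this is a bad request\r\n"  # what telnet would send for the line above
    print(claimed_frame_size(line), exceeds_heap(line))
```

Under these assumptions the claimed size is 0x74686973 bytes, well above the configured heap, which would be consistent with the `OutOfMemoryError` raised from `ByteBuffer.allocate`.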

My machine:

[liang@api: snowflake]$ thrift -version
Thrift version 0.5.0
[liang@api: snowflake]$ scala -version
Scala code runner version 2.8.1.final -- Copyright 2002-2010, LAMP/EPFL
[liang@api: snowflake]$ uname -a
Darwin localhost 10.0.0 Darwin Kernel Version 10.0.0: Mon Oct 12 04:06:05 AST 2009; anappirtrvh:xnu-1456.1.26/BUILD/obj/RELEASE_I386 i386
[liang@api: snowflake]$ java -version
java version "1.6.0_15"
Java(TM) SE Runtime Environment (build 1.6.0_15-b03-219)
Java HotSpot(TM) Client VM (build 14.1-b02-90, mixed mode)

ryanking commented 13 years ago

I can't reproduce this, but it seems like this is a bug in thrift, not snowflake.

stevej commented 13 years ago

This is a known bug in thrift.

stevej commented 13 years ago

It looks like this issue with thrift has been fixed: https://issues.apache.org/jira/browse/THRIFT-601
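
For readers unfamiliar with this class of bug, the usual defence is to validate a length prefix before allocating a buffer for it. The sketch below is a generic illustration in Python, not the actual change made for THRIFT-601. The names `read_frame_length`, `FrameTooLarge`, and the 16 MB cap are all hypothetical.

```python
import struct

# Hypothetical upper bound on a single frame; a real server would make this configurable.
MAX_FRAME_BYTES = 16 * 1024 * 1024


class FrameTooLarge(Exception):
    """Raised when a length prefix is non-positive or exceeds the configured cap."""


def read_frame_length(header: bytes, max_bytes: int = MAX_FRAME_BYTES) -> int:
    """Decode a 4-byte big-endian length prefix and reject implausible values.

    The check runs before any buffer is allocated, so garbage input costs
    the server four bytes of reading instead of a huge allocation.
    """
    if len(header) != 4:
        raise ValueError("length prefix must be exactly four bytes")
    (size,) = struct.unpack(">i", header)
    if size <= 0 or size > max_bytes:
        raise FrameTooLarge(f"refusing frame of {size} bytes (limit {max_bytes})")
    return size
```

With a guard like this, a server could close the offending connection and keep serving other clients instead of exhausting its heap.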

ryanking commented 13 years ago

How did he reproduce it then?

Related: I found another problem related to this that affects HsHaServer. I will follow up with the thrift devs.

stevej commented 13 years ago

Can you build snowflake with thrift 0.2.0? That was before the fix (0.3 according to the bug).

ryanking commented 13 years ago

I doubt you can. It requires 0.5.0 as a dependency.

LNGi commented 13 years ago

Too bad I'm not good at Java, so I may not be able to provide more details about this error...

But I think I can describe how I got snowflake running and how I hit this error. I hope this helps.

  1. clone snowflake master from github
  2. install sbt 0.7.4 via Homebrew (a package manager for Mac OS X)
  3. download thrift 0.5.0 from apache.org and install it from source:

    ./configure --with-erlang=no  --with-python=no  --with-haskell=no
  4. run sbt

    [liang@api: snowflake]$ sbt
    > debug
    > update
    > compile
    > package
  5. install and start up zookeeper
  6. start up snowflake manually:

    java -server -XX:+UseConcMarkSweepGC -verbosegc \
    -XX:+PrintGCDetails \
    -XX:+PrintGCTimeStamps \
    -XX:+PrintGCDateStamps \
    -XX:+UseParNewGC \
    -Xloggc:/var/log/snowflake/gc.log \
    -Dcom.sun.management.jmxremote \
    -Dcom.sun.management.jmxremote.port=9998 \
    -Dcom.sun.management.jmxremote.authenticate=false \
    -Dcom.sun.management.jmxremote.ssl=false \
    -Xmx700m -Xms700m -Xmn500m \
    -XX:ErrorFile=/var/log/snowflake/java_error%p.log \
    -cp /Users/liang/Projects/snowflake/build/snowflake-1.0.jar:./project/boot/scala-2.8.1/lib/scala-library.jar:./lib_managed/compile/configgy-2.0.1.jar:./lib_managed/compile/zookeeper-client-2.0.0.jar:./libs/zookeeper-3.3.1.jar:./lib_managed/compile/libthrift-0.5.0.jar:./lib_managed/compile/ostrich-4.0.1.jar::./lib_managed/compile/avalon-framework-4.1.3.jar:./lib_managed/compile/commons-codec-1.4.jar:./lib_managed/compile/commons-lang-2.2.jar:./lib_managed/compile/commons-logging-1.1.jar:./lib_managed/compile/commons-pool-1.5.4.jar:./lib_managed/compile/configgy-2.0.1.jar:./lib_managed/compile/json_2.8.0-2.1.4.jar:./lib_managed/compile/json_2.8.1-2.1.6.jar:./lib_managed/compile/libthrift-0.5.0.jar:./lib_managed/compile/log4j-1.2.14.jar:./lib_managed/compile/logkit-1.0.1.jar:./lib_managed/compile/netty-3.2.3.Final.jar:./lib_managed/compile/ostrich-4.0.1.jar:./lib_managed/compile/scala-compiler-2.8.1.jar:./lib_managed/compile/scala-library-2.8.1.jar:./lib_managed/compile/servlet-api-2.3.jar:./lib_managed/compile/slf4j-api-1.5.8.jar:./lib_managed/compile/slf4j-log4j12-1.5.8.jar:./lib_managed/compile/slf4j-nop-1.5.8.jar:./lib_managed/compile/specs_2.8.0-1.6.5.jar:./lib_managed/compile/util-core-1.8.1.jar:./lib_managed/compile/util-eval-1.8.1.jar:./lib_managed/compile/util-logging-1.8.1.jar:./lib_managed/compile/zookeeper-client-2.0.0.jar:./config/ com.twitter.service.snowflake.SnowflakeServer
  7. run the test script

    [liang@api: snowflake]$ RUBYLIB=./target/gen-rb ./src/scripts/client_test.rb 10 "localhost:7609" test
    "localhost"
    "7609"
    68160518203899904 test 0
    68160518203899905 test 0
    68160518208094208 test 0
    68160518208094209 test 0
    68160518212288512 test 0
    68160518212288513 test 0
    68160518216482816 test 0
    68160518216482817 test 0
    68160518216482818 test 0
    68160518220677120 test 0

So far so good. But if I now run "src/scripts/json_stats_fetcher.rb", the server crashes.

[liang@api: snowflake]$ RUBYLIB=./target/gen-rb ./src/scripts/json_stats_fetcher.rb
...
...seems the script hangs

The server outputs an OOM error and just hangs there:

ERROR [Thread-1] (TNonblockingServer.java312) - run() exiting due to uncaught error
java.lang.OutOfMemoryError: Java heap space
...
...

Soon I found that if I send any malformed data to the server, it also crashes the server.

[liang@api: snowflake]$ echo "hello"|nc localhost 7609
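
For anyone trying to reproduce this without `nc` or telnet, the same bytes can be sent from a short script. This is a minimal sketch using only Python's standard `socket` module. The helper name `send_raw` is hypothetical, and the host and port are the ones used in the command above.

```python
import socket


def send_raw(host: str, port: int, payload: bytes, timeout: float = 5.0) -> bytes:
    """Send payload unmodified, then return whatever the peer sends back.

    No Thrift framing is applied, so the server sees exactly these bytes.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        # Tell the peer that no more data will follow.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            try:
                data = sock.recv(4096)
            except socket.timeout:
                break  # the peer stayed silent; give up waiting
            if not data:
                break  # the peer closed the connection
            chunks.append(data)
    return b"".join(chunks)
```

Calling `send_raw("localhost", 7609, b"hello\n")` sends the same six bytes as the `echo "hello"|nc localhost 7609` command. The reply is expected to be empty if the server simply drops the connection.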

One more detail: after I compiled thrift 0.5.0, I ran "make install" to install it, but I got this error:

Making install in test
make[4]: Nothing to be done for `install-exec-am'.
make[4]: Nothing to be done for `install-data-am'.
Making install in java
/usr/bin/ant 
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/tools/ant/launch/Launcher
Caused by: java.lang.ClassNotFoundException: org.apache.tools.ant.launch.Launcher
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:319)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:330)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:254)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:399)
make[2]: *** [all-local] Error 1
make[1]: *** [install-recursive] Error 1
make: *** [install-recursive] Error 1

So I changed to the java directory and ran "ant install". This time everything went well:

[liang@api: thrift-0.5.0]$ cd lib/java/
[liang@api: java]$ ant install
Buildfile: /Users/liang/Downloads/thrift-0.5.0/lib/java/build.xml

init:

ivy-init-dirs:

...
...

  [javadoc] Generating /Users/liang/Downloads/thrift-0.5.0/lib/java/build/javadoc/stylesheet.css...
  [javadoc] 10 warnings

install:
     [copy] Copying 124 files to /Users/liang/Downloads/thrift-0.5.0/lib/java/${install.javadoc.path}

BUILD SUCCESSFUL
Total time: 4 seconds

LNGi commented 13 years ago

Sorry about closing this issue by accident...

ryanking commented 12 years ago

json_stats_fetcher.rb isn't working because it's trying to use the wrong port. We no longer have a use for that script, so I'm going to get rid of it.

As for building thrift, I can't really help you with that. Ask the thrift developers for help.

And about the crash from the garbage data, I still can't reproduce that.