graylog-labs / graylog2-web-interface

[DEPRECATED]
https://www.graylog.org/

CSV export is truncated #1478

Closed · pcmodem closed 9 years ago

pcmodem commented 9 years ago

Graylog server and web interface 1.1.0

When I attempt to export the search results from a stream to a .csv file, the resulting .csv contains only this error: {"type":"ApiError","message":"Not allowed to search with filter: [streams:]"}

Additionally, when a search returns a large result set, the exported .csv file is incomplete. I was able to get a 134k-line export to work, but a 435k-line result produced a .csv with only 220k lines.

No errors are reported in graylog-server.log.

kroepke commented 9 years ago

The empty streams filter sounds like a bug, we'll investigate.

I have heard about problems downloading large datasets before, but have not been able to analyze the issue yet. How many Elasticsearch nodes do you have in your cluster?

pcmodem commented 9 years ago

We have one server running a single Elasticsearch node, as far as I know. In the previous version (1.0.0) we couldn't get even a decent-sized CSV export; in this version I can get it to dump a large number of records, just not the full amount returned by the query.

Thank you!


kroepke commented 9 years ago

1.0 was very limited in the web UI because we didn't use chunked encoding all the way to the browser. 1.1 fixed this, but apparently there is still a problem with larger datasets.

I suspect a timeout might be kicking in, but I could not reproduce it yet.
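
For context, here is a minimal sketch of what "chunked encoding all the way to the browser" can look like in a Play 2.x controller (the 1.x web interface is a Play application, as the stack trace later in this thread shows). This is an illustration of the mechanism, not the actual Graylog web interface code; csvFromServer() is a hypothetical stand-in for the stream coming back from graylog-server.

    import java.io.{ByteArrayInputStream, InputStream}

    import play.api.libs.concurrent.Execution.Implicits.defaultContext
    import play.api.libs.iteratee.Enumerator
    import play.api.mvc._

    object ExportController extends Controller {

      // Stand-in for the InputStream backed by the graylog-server REST response.
      def csvFromServer(): InputStream =
        new ByteArrayInputStream("timestamp,source,message\n".getBytes("UTF-8"))

      def exportCsv = Action {
        // Enumerator.fromStream reads the InputStream chunk by chunk;
        // Ok.chunked forwards each chunk to the browser as it arrives
        // (Transfer-Encoding: chunked), so the full result set is never
        // buffered in memory before the download starts.
        Ok.chunked(Enumerator.fromStream(csvFromServer())).as("text/csv")
      }
    }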

pcmodem commented 9 years ago

I did not observe any errors in the Elasticsearch log while exporting the CSV either. If there are any specific tests I can run to help narrow it down for you, please let me know.

pcmodem commented 9 years ago

A little more data: I can consistently export about 140k lines, regardless of how many columns. Anything over that looks like the timeout-related problem you describe. I can get as many as 270k lines to export, but it may take multiple attempts: sometimes I'll get half, sometimes 90%, and sometimes all of the requested lines.

joschi commented 9 years ago

The permission issue has been fixed for Graylog 1.1.3 and 1.2.0. The issue with the timeout(?) is still open.

ck011028 commented 9 years ago

Hi, I was able to reproduce this CSV truncation with large datasets on the latest 1.1.6. It looks like there is a default timeout of 61 seconds when retrieving data from the Graylog REST service; after 61 seconds the CSV gets truncated. I was hoping the timeout.DEFAULT config would control this, but it doesn't appear to. Is there possibly another config option we could use to increase the timeout?

java.io.IOException: java.util.concurrent.TimeoutException: Request timed out to /127.0.0.1:12900 of 61000 ms
        at org.graylog2.restclient.lib.AsyncByteBufferInputStream.read(AsyncByteBufferInputStream.java:61)
        at java.io.InputStream.read(InputStream.java:170)
        at java.io.InputStream.read(InputStream.java:101)
        at play.api.libs.iteratee.Enumerator$$anonfun$fromStream$2$$anonfun$1.apply$mcI$sp(Enumerator.scala:560)
        at play.api.libs.iteratee.Enumerator$$anonfun$fromStream$2$$anonfun$1.apply(Enumerator.scala:560)
        at play.api.libs.iteratee.Enumerator$$anonfun$fromStream$2$$anonfun$1.apply(Enumerator.scala:560)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.package$.blocking(package.scala:50)
        at play.api.libs.iteratee.Enumerator$$anonfun$fromStream$2.apply(Enumerator.scala:560)
        at play.api.libs.iteratee.Enumerator$$anonfun$fromStream$2.apply(Enumerator.scala:558)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
        at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
        at scala.concurrent.forkjoin.ForkJoinTask$AdaptedRunnableAction.exec(ForkJoinTask.java:1361)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

joschi commented 9 years ago

The timeout of 61 seconds for requests from the Graylog web interface is hard-coded in Graylog 1.1.6 and earlier. If you want to export a very large dataset that takes more than 61 seconds to process, I recommend using the Graylog REST API directly, which does not have such a timeout.
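
For anyone landing here with the same problem, below is a minimal sketch of such a direct export. Assumptions: a Graylog 1.x REST API listening on port 12900, and the search/universal/relative resource returning CSV when Accept: text/csv is sent (as far as I know the fields parameter is required for CSV responses); host, credentials, time range, and field list are placeholders.

    import java.io.{BufferedInputStream, FileOutputStream}
    import java.net.{HttpURLConnection, URL}
    import java.util.Base64

    object DirectCsvExport {
      def main(args: Array[String]): Unit = {
        // Placeholder host, credentials, and query; adjust to your setup.
        val url = new URL(
          "http://graylog.example.com:12900/search/universal/relative" +
            "?query=*&range=86400&fields=timestamp,source,message")
        val conn = url.openConnection().asInstanceOf[HttpURLConnection]
        val token = Base64.getEncoder.encodeToString("admin:password".getBytes("UTF-8"))
        conn.setRequestProperty("Authorization", s"Basic $token")
        conn.setRequestProperty("Accept", "text/csv") // ask the API for CSV instead of JSON
        conn.setReadTimeout(0) // 0 = no read timeout, unlike the web interface's 61 s

        val in = new BufferedInputStream(conn.getInputStream)
        val out = new FileOutputStream("export.csv")
        val buf = new Array[Byte](8192)
        // Stream the response to disk as it arrives so memory use stays flat.
        Iterator.continually(in.read(buf)).takeWhile(_ != -1).foreach(n => out.write(buf, 0, n))
        out.close()
        in.close()
      }
    }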

kroepke commented 9 years ago

I cannot reproduce the problem with the CSV export in streams; it works for me. Please file another bug report in case the problem persists in 1.2.0-rc.1.

Thank you.