thinkaurelius / titan

Distributed Graph Database
http://titandb.io
Apache License 2.0
5.24k stars 1.01k forks source link

Ellipse shows in Console for 15 items #924

Closed spmallette closed 9 years ago

spmallette commented 9 years ago

I saw this in 0.5.2, but @dalaro confirmed that it was in 0.5.3:

gremlin> [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
==>1
==>2
==>3
==>4
==>5
==>6
==>7
==>8
==>9
==>10
==>11
==>12
==>13
==>14
==>15
==>...
gremlin> [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16]
==>1
==>2
==>3
==>4
==>5
==>6
==>7
==>8
==>9
==>10
==>11
==>12
==>13
==>14
==>15
==>16
gremlin> [1,2,3,4,5,6,7,8,9,10,11,12,13,14]
==>1
==>2
==>3
==>4
==>5
==>6
==>7
==>8
==>9
==>10
==>11
==>12
==>13
==>14

Unsure if there are other list sizes that pop in the ellipse or not.

dalaro commented 9 years ago

645b406468f06c043b810375d6203618b505b241 fixes the fencepost bug and makes the limit apply to all collections, so that [1,...16] only prints up to 15 and then ....

Since limiting non-titan-hadoop collection previews is new behavior, I made the default 15-element limit into a CLI option. gremlin.sh -w <number> controls the number of elements printed in a result collection. The hope is that people who are annoyed by this limit can still increase it with a concise argument, as opposed to having to go through the hassle of tweaking a compile-time constant and rebuilding. I have this commit locally at the moment; don't want to paste a hash yet in case I rebase.

The -w option should be considered subject to change or removal. It's not in use by the TP3 gremlin.sh, but it's still not clear whether it's worth keeping yet.

spmallette commented 9 years ago

so all results whether they come from titan-hadoop or just (0..100) stop at 15 by default...is that right?

dalaro commented 9 years ago

That's the current behavior.

The commit mentioned in my last comment without a hash is now pushed as 5298f5d923fd2fbf276b40d7c20d4d56fd2f0621. Thinking some more about this, a gremlin.sh option is just the wrong way to do this. It works, but it's not very usable. Maybe a set/:set preference is the way to go. I think I see how to hook into that on 1.8.9 using org.codehaus.groovy.tools.shell.util.Preferences.addChangeListener. I don't see existing uses of that in TP2 or TP3 though, so I wonder if I'm making a mistake.

dalaro commented 9 years ago

Also: that default value (15) is an old constant that's been in Console forever. I'm not attached to it. Suggestions welcome.

spmallette commented 9 years ago

yeah - i was waiting for your response to be clear before i mentioned that i kinda didn't like the gremlin.sh way.

we haven't used the "preference" system heavily in TP2/3, but i don't think that's a bad piece of the infrastructure to leverage.

I'd prefer a larger default too - 15 is fine for a pure "faunus" job, but less awesome when you're doing other stuff. I think it's more of a burden to non-faunus stuff to have the small size than the other way around. I guess the main downside is that the hadoop output gets shoved off the screen if you make that too big a number. Maybe 50 wouldn't be too crazy a default for both? @dkuppitz any thoughts?

dalaro commented 9 years ago

In titan05, it would be possible to have two distinct limits, one for MapReduce result file head'ing and another for iterating anything else, but that's sort of a consequence of an abstraction leak. Does TP3 have the same thing? It looks like only one limit would make sense for Console.handleResultIterate https://github.com/tinkerpop/tinkerpop3/blob/master/gremlin-console/src/main/groovy/com/tinkerpop/gremlin/console/Console.groovy#L122-L183. If that's true then I'll stick to one limit in titan05.

50 doesn't sound crazy. I would also immediately accept any value proposed by @dkuppitz.

spmallette commented 9 years ago

We don't have any notion of limits in TP3 and at the moment we don't have such distinction for different options for limits. i'll create an issue for discussion over there.

dkuppitz commented 9 years ago

If we can distinguish between OLAP and OLTP results, then I would prefer to have a very high limit (1000+) or no limit at all for OLTP. Maybe a high limit is the better choice for those who are crazy enough to submit g.V().valueMap() on a huge graph. But usually I don't want to see my OLTP results truncated.

However, no matter if we can distinguish between those result types or not, I like the idea to use :set to set a custom limit.

As a default limit for all kinds of results I would use even more than 50. Maybe 512? That's the default number of scrollback lines in a terminal window.

dalaro commented 9 years ago

Given your feedback, I'll separate them. What should the separate defaults be for OLAP and OLTP? For OLTP, I do like the idea of a very high default as opposed to infinity for the same reason you mentioned.

dkuppitz commented 9 years ago

I think 50 is absolutely okay for OLAP. In most cases you'll have your OLAP result stored in HDFS (the truncated data is not lost), thus you don't have to execute the traversal again just to see more data.

dalaro commented 9 years ago

50 for OLAP is fine with me. 512 for OLTP then?

dkuppitz commented 9 years ago

Yep, that should work in almost all cases.

okram commented 9 years ago

Huh. This solves the CNTRL-C problem.. eh?

https://github.com/tinkerpop/tinkerpop3/issues/11

dkuppitz commented 9 years ago

I don't think so. The console will probable never get a response from Titan / Cassandra if you send g.V().valueMap() to a 1B+ vertices graph.

dkuppitz commented 9 years ago

Thought about it a bit longer. Sometimes the first entry is the most important one, even in large result sets (for example when you want to check the entry order). That said, it's probably better to make the default limit 500. This way you can still scroll up and see your query + the first result entries (with default terminal settings).

dalaro commented 9 years ago

@okram tinkerpop/tinkerpop3#11 is still useful. Aside from iterators that precompute or for which building each element is unexpectedly costly, some stuff just blocks inconveniently for a long time (e.g. network unreachable). The latter is harder to deal with though, since it implies that the console would be multithreaded with the main thread cautiously avoiding blocking on anything except stdin, so that it can promptly process the breakout key sequence and immediately interrupt the background worker. Signals seem like they would help do this without multithreading, but IIRC, there's no tidy and standard way to handle a ctrl-c SIGINT at the app layer aside from a shutdown hook run right before the JVM dies, which is point-defeating. I mean, there might be something in com.sun, but yuck.

@dkuppitz I didn't mean to limit you to a high value. I did write "I do like the idea of a very high default" above, but only as an alternative to infinity.

dkuppitz commented 9 years ago

Yea, so we both like the idea of a high default value :). 500 is cool.

:set result.rows.limit -1 => infinity

spmallette commented 9 years ago

I'm still not so sure ctrl-c is even possible given our dependence on groovysh. i've been through that code a few times now and i've never seen where we can hook into something to interrupt an evaluation.

dalaro commented 9 years ago

71bd1d7 implements what we just discussed.

The only bit I dislike is that, insofar as we attach to set, we seem to be stuck in a flat namespace shared with general Groovysh preferences. The current implementation will end up writing a file like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE map SYSTEM "http://java.sun.com/dtd/preferences.dtd">
<map MAP_XML_VERSION="1.0">
  <entry key="tp-olap-result-peek" value="13"/>
  <entry key="tp-oltp-result-peek" value="-1"/>
</map>

with a path like ~/.java/.userPrefs/org/codehaus/groovy/tools/shell/prefs.xml (on Linux at least).

Preferences is inherently hierarchical. It's appealing to instead have our own namespace with a separate prefs.xml file, so that we could rule out key collisions. I couldn't figure out how to do this with the set feature though. It appears to just wire up to a Preferences.put call.

I tried putting slashes in the put key, wondering if that would be interpreted as Preference node hierarchy, but that doesn't appear to be the case. Keys with slashes are passed through without any apparent interpretation. The result is a file like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE map SYSTEM "http://java.sun.com/dtd/preferences.dtd">
<map MAP_XML_VERSION="1.0">
  <entry key="tinkerpop/olap-result-peek" value="13"/>
  <entry key="tinkerpop/oltp-result-peek" value="-1"/>
</map>

at the same .../shell/prefs.xml path mentioned above. I think we would need to call something like <Preferences parent object>.node("tinkerpop").put("olap-result-peek", "13"), but I don't see how to do that within the limitations of set. Maybe we could register a command -- perhaps one that overrides the default set behavior -- but maybe that just leads to crazytown.

spmallette commented 9 years ago

Want to hear some real crazytown? I secretly often feel like writing our own console :smiling_imp: Anyway, I would say that what you have is probably good enough to cope with the scope of this issue. We have this issue referenced in TP3 at this point so hopefully we can look at what you have here and determine the direction to go for the future.

dalaro commented 9 years ago

A rewrite is tempting... I'm not volunteering though :chicken:

Should I reimplement this on TP3 as a PR for tinkerpop/tinkerpop3#556, or would you rather handle that separately?

spmallette commented 9 years ago

Been thinking about this one - let's wait on an PR for this - I've referenced this issue over in TP3 so we can get back to this issue. This might yet be how we want to do it, but I'd like to take more time to make that decision. Thanks.

dalaro commented 9 years ago

Thanks for following up. This feature shipped in Titan 0.5.4. Since the linked TP3 issue is pending consideration and since there are no next steps here, I'm going to close.