Closed spinningarrow closed 9 years ago
Is it a private repo? We need to figure out how to reproduce this.
I believe it’s a jgit problem. I’ve seen people from JetBrains report something similar, and they had to switch to calling regular git in an external process. I think we should do the same, especially because we also need shallow clones, which are not supported by jgit.
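As a rough illustration of the "regular git in an external process" idea, here is a minimal Java sketch that builds a shallow-clone command line and can run it via `ProcessBuilder`. The class name `ShallowClone`, the method names, and the target directory are hypothetical and not part of acha; only the `git clone --depth` flag and the repository URL (taken from the log further down) are real.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ShallowClone {
    // Builds the argument list for a shallow clone using the regular git CLI.
    static List<String> cloneCommand(String url, String targetDir, int depth) {
        if (depth < 1) {
            throw new IllegalArgumentException("depth must be >= 1");
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("git");
        cmd.add("clone");
        cmd.add("--depth");
        cmd.add(Integer.toString(depth));
        cmd.add(url);
        cmd.add(targetDir);
        return cmd;
    }

    // Runs the command in a child process and returns its exit code.
    // Requires git on the PATH; output is forwarded to this process's console.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = cloneCommand("https://github.com/bobrik/pupergrep", "pupergrep-shallow", 1);
        System.out.println(String.join(" ", cmd));
        // Uncomment to actually clone (needs git and network access):
        // System.exit(run(cmd));
    }
}
```

Passing the arguments as a list rather than one shell string avoids quoting problems with unusual URLs or paths.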
According to the issue reporter, he hit the OOM during the scanning phase. In that phase we load the commit diff between text files into memory to analyse it. That can definitely cause an OOM when the diff is really big.
The OOM happens when we build diffs to analyse them. We store them in memory. We need to think about how to work around this.
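One possible direction, sketched here under assumptions and not taken from acha's code: instead of holding a whole diff as a string, read it line by line and keep only counters. The class `StreamingDiffStats` and its method are hypothetical, and the sketch assumes git-style unified diff text (with `diff --git` and `@@` lines).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class StreamingDiffStats {
    // Counts added and removed lines in git-style unified diff text.
    // Only one line is held in memory at a time. Returns {added, removed}.
    static int[] countChanges(Reader diff) throws IOException {
        int added = 0;
        int removed = 0;
        boolean inHunk = false; // true after an "@@" header, false after "diff "
        try (BufferedReader in = new BufferedReader(diff)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith("diff ")) {
                    inHunk = false;      // new file section; "---"/"+++" headers follow
                } else if (line.startsWith("@@")) {
                    inHunk = true;       // hunk body starts on the next line
                } else if (inHunk && line.startsWith("+")) {
                    added++;
                } else if (inHunk && line.startsWith("-")) {
                    removed++;
                }
            }
        }
        return new int[] {added, removed};
    }

    public static void main(String[] args) throws IOException {
        String sample =
            "diff --git a/a.txt b/a.txt\n" +
            "--- a/a.txt\n" +
            "+++ b/a.txt\n" +
            "@@ -1,2 +1,3 @@\n" +
            " keep\n" +
            "-old\n" +
            "+new\n" +
            "+extra\n";
        int[] stats = countChanges(new StringReader(sample));
        System.out.println("added=" + stats[0] + " removed=" + stats[1]); // prints added=2 removed=1
    }
}
```

In a real setup the `Reader` would wrap a stream (for example the output of an external `git diff`), so memory use stays bounded by the longest line, not by the size of the diff.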
22:14:44.329 [ INFO ] worker#1 : a.dispatcher - Fetching/cloning repo https://github.com/bobrik/pupergrep
Exception in thread "worker#3" java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
at java.nio.CharBuffer.allocate(CharBuffer.java:331)
at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:777)
at org.eclipse.jgit.util.RawParseUtils.decode(RawParseUtils.java:1009)
at org.eclipse.jgit.util.RawParseUtils.decodeNoFallback(RawParseUtils.java:950)
at org.eclipse.jgit.util.RawParseUtils.decode(RawParseUtils.java:909)
at org.eclipse.jgit.util.RawParseUtils.decode(RawParseUtils.java:868)
at org.eclipse.jgit.diff.RawText.decode(RawText.java:207)
at org.eclipse.jgit.diff.RawText.getString(RawText.java:190)
at org.eclipse.jgit.diff.RawText.getString(RawText.java:166)
at acha.git_parser$parse_edit_list$fn__6314$fn__6317.invoke(git_parser.clj:96)
at clojure.core$mapv$fn__6689.invoke(core.clj:6611)
at clojure.lang.ArrayChunk.reduce(ArrayChunk.java:58)
at clojure.core.protocols$fn__6465.invoke(protocols.clj:103)
at clojure.core.protocols$fn__6427$G__6422__6436.invoke(protocols.clj:19)
at clojure.core.protocols$seq_reduce.invoke(protocols.clj:31)
at clojure.core.protocols$fn__6448.invoke(protocols.clj:59)
at clojure.core.protocols$fn__6401$G__6396__6414.invoke(protocols.clj:13)
at clojure.core$reduce.invoke(core.clj:6514)
at clojure.core$mapv.invoke(core.clj:6611)
at acha.git_parser$parse_edit_list$fn__6314.invoke(git_parser.clj:96)
at clojure.core$mapv$fn__6689.invoke(core.clj:6611)
at clojure.core.protocols$fn__6444.invoke(protocols.clj:84)
at clojure.core.protocols$fn__6401$G__6396__6414.invoke(protocols.clj:13)
at clojure.core$reduce.invoke(core.clj:6514)
at clojure.core$mapv.invoke(core.clj:6611)
at acha.git_parser$parse_edit_list.invoke(git_parser.clj:94)
at acha.git_parser$parse_diff_changes.invoke(git_parser.clj:153)
at acha.git_parser$parse_diff_entry.invoke(git_parser.clj:158)
at clojure.core$partial$fn__4508.invoke(core.clj:2489)
at clojure.core$mapv$fn__6689.invoke(core.clj:6611)
at clojure.core.protocols$fn__6444.invoke(protocols.clj:84)
@avasenin Yes, it is indeed a private repo, I'm afraid.
RawText stores content in memory. We extract some pieces from this content when analysing diffs. This is the main reason for the OOMs. The proper solution is to get rid of jgit and switch to regular git. As a short-term solution I propose excluding big files from diffs.
@avasenin I think we just need to check file size before doing the comparison. Regular git doesn’t do diffs on big files anyway
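A minimal sketch of the size check being proposed, in plain Java. The class `DiffSizeGuard`, the method `shouldDiff`, and the 1 MiB threshold are all hypothetical; the thread does not name a limit. With jgit, the sizes could presumably be read from the object loader (`ObjectLoader.getSize()`) before any content is decoded, but that wiring is an assumption and is not shown.

```java
public class DiffSizeGuard {
    // Hypothetical threshold; the thread does not specify what counts as "big".
    static final long MAX_DIFF_BYTES = 1L * 1024 * 1024;

    // Returns true only when both sides of the change are small enough
    // to be decoded into memory for analysis.
    static boolean shouldDiff(long oldSizeBytes, long newSizeBytes, long maxBytes) {
        return oldSizeBytes <= maxBytes && newSizeBytes <= maxBytes;
    }

    public static void main(String[] args) {
        System.out.println(shouldDiff(2_048, 4_096, MAX_DIFF_BYTES));             // prints true
        System.out.println(shouldDiff(2_048, 50L * 1024 * 1024, MAX_DIFF_BYTES)); // prints false
    }
}
```

Checking both sides matters because a deletion of a huge file has a small "new" side but still a huge "old" side to decode.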
Just out of curiosity - what do you mean regular git doesn't do diffs on big files (I haven't seen this before)? What constitutes a 'big' file in this case?
(Btw, nice work on this app - I really like it!)
Sorry, I’ve probably confused GitHub limits with git limits. I’ve seen big-file diffs suppressed in a couple of tools, including GitHub and SourceTree, but it looks like git itself does not impose this limitation.
Ah okay, that makes sense. Thanks!
Using the app from scratch on a private repo:
bash-3.2$ curl -L -O https://github.com/someteam/acha/releases/download/0.2.4/acha-uber-0.2.4.jar
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 399 0 399 0 0 254 0 --:--:-- 0:00:01 --:--:-- 254
100 29.9M 100 29.9M 0 0 2191k 0 0:00:14 0:00:14 --:--:-- 2903k
bash-3.2$ java -jar acha-uber-0.2.4.jar
15:45:26.213 [ INFO ] main : a.server - Working dir .acha
15:45:26.216 [ INFO ] main : acha.db - Initialize db
15:45:26.216 [ INFO ] main : acha.db - Creating DB .acha/db.sqlite
14-Mar-2015 15:45:26 com.mchange.v2.log.MLog <clinit>
INFO: MLog clients using java 1.4+ standard logging.
14-Mar-2015 15:45:27 com.mchange.v2.c3p0.C3P0Registry banner
INFO: Initializing c3p0-0.9.2.1 [built 20-March-2013 10:47:27 +0000; debug? true; trace: 10]
14-Mar-2015 15:45:27 com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource getPoolManager
INFO: Initializing c3p0 pool... com.mchange.v2.c3p0.ComboPooledDataSource [ acquireIncrement -> 3, acquireRetryAttempts -> 30, acquireRetryDelay -> 1000, autoCommitOnClose -> false, automaticTestTable -> null, breakAfterAcquireFailure -> false, checkoutTimeout -> 0, connectionCustomizerClassName -> null, connectionTesterClassName -> com.mchange.v2.c3p0.impl.DefaultConnectionTester, dataSourceName -> 1hge0w3986ibvdx1qj3yxe|4990d0d2, debugUnreturnedConnectionStackTraces -> false, description -> null, driverClass -> org.sqlite.JDBC, factoryClassLocation -> null, forceIgnoreUnresolvedTransactions -> false, identityToken -> 1hge0w3986ibvdx1qj3yxe|4990d0d2, idleConnectionTestPeriod -> 0, initialPoolSize -> 1, jdbcUrl -> jdbc:sqlite:.acha/db.sqlite, maxAdministrativeTaskTime -> 0, maxConnectionAge -> 0, maxIdleTime -> 10800, maxIdleTimeExcessConnections -> 1800, maxPoolSize -> 1, maxStatements -> 0, maxStatementsPerConnection -> 0, minPoolSize -> 1, numHelperThreads -> 3, preferredTestQuery -> null, properties -> {}, propertyCycle -> 0, statementCacheNumDeferredCloseThreads -> 0, testConnectionOnCheckin -> false, testConnectionOnCheckout -> false, unreturnedConnectionTimeout -> 0, userOverrides -> {}, usesTraditionalReflectiveProxies -> false ]
14-Mar-2015 15:45:28 com.mchange.v2.c3p0.impl.AbstractPoolBackedDataSource getPoolManager
INFO: Initializing c3p0 pool... com.mchange.v2.c3p0.ComboPooledDataSource [ acquireIncrement -> 3, acquireRetryAttempts -> 30, acquireRetryDelay -> 1000, autoCommitOnClose -> false, automaticTestTable -> null, breakAfterAcquireFailure -> false, checkoutTimeout -> 0, connectionCustomizerClassName -> null, connectionTesterClassName -> com.mchange.v2.c3p0.impl.DefaultConnectionTester, dataSourceName -> 1hge0w3986ibvdx1qj3yxe|4990d0d2, debugUnreturnedConnectionStackTraces -> false, description -> null, driverClass -> org.sqlite.JDBC, factoryClassLocation -> null, forceIgnoreUnresolvedTransactions -> false, identityToken -> 1hge0w3986ibvdx1qj3yxe|4990d0d2, idleConnectionTestPeriod -> 0, initialPoolSize -> 1, jdbcUrl -> jdbc:sqlite:.acha/db.sqlite, maxAdministrativeTaskTime -> 0, maxConnectionAge -> 0, maxIdleTime -> 10800, maxIdleTimeExcessConnections -> 1800, maxPoolSize -> 1, maxStatements -> 0, maxStatementsPerConnection -> 0, minPoolSize -> 1, numHelperThreads -> 3, preferredTestQuery -> null, properties -> {}, propertyCycle -> 0, statementCacheNumDeferredCloseThreads -> 0, testConnectionOnCheckin -> false, testConnectionOnCheckout -> false, unreturnedConnectionTimeout -> 0, userOverrides -> {}, usesTraditionalReflectiveProxies -> false ]
15:45:28.064 [ INFO ] worker#0 : a.dispatcher - Worker is ready
15:45:28.065 [ INFO ] worker#1 : a.dispatcher - Worker is ready
15:45:28.065 [ INFO ] worker#2 : a.dispatcher - Worker is ready
15:45:28.065 [ INFO ] worker#3 : a.dispatcher - Worker is ready
15:45:28.124 [ INFO ] main : a.server - Server ready at 0.0.0.0:8080
15:49:58.048 [ INFO ] worker-2 : a.server - Added repo: {:timestamp 0, :reason nil, :state waiting, :url https://<username>:<password>@bitbucket.org/<path>.git, :id 2000001}
15:49:58.292 [ INFO ] worker#2 : a.dispatcher - Worker has started processing {:timestamp 0, :reason nil, :state waiting, :url https://<username>:<password>@bitbucket.org/<path>.git, :id 2000001}
15:49:58.294 [ INFO ] worker#2 : a.dispatcher - Fetching/cloning repo https://<username>:<password>@bitbucket.org/<path>.git
15:50:43.990 [ INFO ] worker#2 : a.dispatcher - Scanning new commits for identity achievements https://<username>:<password>@bitbucket.org/<path>.git
Exception in thread "worker#3" java.lang.OutOfMemoryError: Java heap space
at clojure.lang.PersistentHashMap$BitmapIndexedNode.assoc(PersistentHashMap.java:630)
at clojure.lang.PersistentHashMap.assoc(PersistentHashMap.java:142)
at clojure.lang.PersistentHashSet.cons(PersistentHashSet.java:99)
at clojure.lang.PersistentHashSet.cons(PersistentHashSet.java:17)
at clojure.lang.RT.conj(RT.java:608)
at clojure.core$conj__4079.invoke(core.clj:85)
at clojure.core$distinct_QMARK_.doInvoke(core.clj:5407)
at clojure.lang.RestFn.applyTo(RestFn.java:142)
at clojure.core$apply.invoke(core.clj:626)
at clojure.java.jdbc$make_cols_unique.invoke(jdbc.clj:306)
at clojure.java.jdbc$result_set_seq.doInvoke(jdbc.clj:372)
at clojure.lang.RestFn.invoke(RestFn.java:486)
at clojure.java.jdbc$query$fn__7400.invoke(jdbc.clj:835)
at clojure.java.jdbc$db_query_with_resultset$run_query_with_params__7393.invoke(jdbc.clj:787)
at clojure.java.jdbc$db_query_with_resultset.invoke(jdbc.clj:796)
at clojure.java.jdbc$query.doInvoke(jdbc.clj:828)
at clojure.lang.RestFn.invoke(RestFn.java:425)
at acha.db$get_next_repo.invoke(db.clj:135)
at acha.db$get_next_repo_to_process.invoke(db.clj:142)
at acha.dispatcher$worker$fn__7659.invoke(dispatcher.clj:116)
at acha.dispatcher$worker.invoke(dispatcher.clj:115)
at acha.dispatcher$run_workers$fn__7677.invoke(dispatcher.clj:132)
at clojure.lang.AFn.run(AFn.java:22)
at java.lang.Thread.run(Thread.java:695)
Exception in thread "worker#2" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:215)
at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
at java.nio.CharBuffer.toString(CharBuffer.java:1157)
at org.eclipse.jgit.util.RawParseUtils.decode(RawParseUtils.java:1009)
at org.eclipse.jgit.util.RawParseUtils.decodeNoFallback(RawParseUtils.java:950)
at org.eclipse.jgit.util.RawParseUtils.decode(RawParseUtils.java:909)
at org.eclipse.jgit.util.RawParseUtils.decode(RawParseUtils.java:868)
at org.eclipse.jgit.diff.RawText.decode(RawText.java:207)
at org.eclipse.jgit.diff.RawText.getString(RawText.java:190)
at org.eclipse.jgit.diff.RawText.getString(RawText.java:166)
at acha.git_parser$parse_edit_list$fn__6496$fn__6499.invoke(git_parser.clj:97)
at clojure.core$mapv$fn__6657.invoke(core.clj:6558)
at clojure.lang.ArrayChunk.reduce(ArrayChunk.java:63)
at clojure.core.protocols$fn__6433.invoke(protocols.clj:103)
at clojure.core.protocols$fn__6395$G__6390__6404.invoke(protocols.clj:19)
at clojure.core.protocols$seq_reduce.invoke(protocols.clj:31)
at clojure.core.protocols$fn__6416.invoke(protocols.clj:59)
at clojure.core.protocols$fn__6369$G__6364__6382.invoke(protocols.clj:13)
at clojure.core$reduce.invoke(core.clj:6461)
at clojure.core$mapv.invoke(core.clj:6558)
at acha.git_parser$parse_edit_list$fn__6496.invoke(git_parser.clj:97)
at clojure.core$mapv$fn__6657.invoke(core.clj:6558)
at clojure.core.protocols$fn__6412.invoke(protocols.clj:84)
at clojure.core.protocols$fn__6369$G__6364__6382.invoke(protocols.clj:13)
at clojure.core$reduce.invoke(core.clj:6461)
at clojure.core$mapv.invoke(core.clj:6558)
at acha.git_parser$parse_edit_list.invoke(git_parser.clj:95)
at acha.git_parser$parse_diff_changes.invoke(git_parser.clj:154)
at acha.git_parser$parse_diff_entry.invoke(git_parser.clj:159)
at clojure.core$partial$fn__4490.invoke(core.clj:2489)
at clojure.core$mapv$fn__6657.invoke(core.clj:6558)
I'm working on it right now. I'm reorganising diff-based and loc-based achievements. It will significantly reduce memory consumption during the scanning phase.
@alekseysotnikov please check on current master. I've pushed code with memory optimisations for diff achievements. It should help with OOM exceptions at the parsing stage (I've checked with significantly reduced heap space).
@avasenin Unfortunately, I couldn't clone the private repo using a URL with embedded credentials. It looks like that feature has been removed:
bash-3.2$ curl -L -O https://github.com/someteam/acha/releases/download/0.2.5/acha-uber-0.2.5.jar
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 397 0 397 0 0 264 0 --:--:-- 0:00:01 --:--:-- 264
100 33.1M 100 33.1M 0 0 323k 0 0:01:45 0:01:45 --:--:-- 346k
bash-3.2$ java -jar acha-uber-0.2.5.jar
00:34:00.140 [ INFO ] main : a.server - Working dir .acha
00:34:00.143 [ INFO ] main : acha.db - Initialize db
00:34:00.143 [ INFO ] main : acha.db - Creating DB .acha/db.sqlite
00:34:01.114 [ INFO ] worker#1 : a.dispatcher - Worker is ready
00:34:01.114 [ INFO ] worker#0 : a.dispatcher - Worker is ready
00:34:01.114 [ INFO ] worker#2 : a.dispatcher - Worker is ready
00:34:01.114 [ INFO ] worker#3 : a.dispatcher - Worker is ready
00:34:01.142 [ INFO ] main : a.server - Server ready at 0.0.0.0:8080
00:37:53.803 [ INFO ] worker#0 : a.dispatcher - Worker has started processing {:timestamp 0, :snapshot nil, :reason nil, :state waiting, :url https://<username>:<password>@bitbucket.org/<path>.git, :id 2000001}
00:37:53.803 [ INFO ] worker-3 : a.server - Added repo: {:timestamp 0, :snapshot nil, :reason nil, :state waiting, :url https://<username>:<password>@bitbucket.org/<path>.git, :id 2000001}
00:37:53.808 [ INFO ] worker#0 : a.dispatcher - Fetching/cloning repo https://<username>:<password>@bitbucket.org/<path>.git
00:37:54.703 [ ERROR ] worker#0 : a.dispatcher - Repo analysis failed
org.eclipse.jgit.api.errors.TransportException: https://<username>@bitbucket.org/<path>.git: Authentication is required but no CredentialsProvider has been registered
at org.eclipse.jgit.api.FetchCommand.call(FetchCommand.java:139) ~[acha-uber-0.2.5.jar:na]
at org.eclipse.jgit.api.CloneCommand.fetch(CloneCommand.java:178) ~[acha-uber-0.2.5.jar:na]
at org.eclipse.jgit.api.CloneCommand.call(CloneCommand.java:125) ~[acha-uber-0.2.5.jar:na]
at acha.git_parser$clone.invoke(git_parser.clj:43) ~[acha-uber-0.2.5.jar:na]
at acha.git_parser$load_repo.invoke(git_parser.clj:56) ~[acha-uber-0.2.5.jar:na]
at acha.dispatcher$analyze.invoke(dispatcher.clj:115) ~[acha-uber-0.2.5.jar:na]
at acha.dispatcher$worker$fn__7644$fn__7645.invoke(dispatcher.clj:130) ~[acha-uber-0.2.5.jar:na]
at acha.dispatcher$worker$fn__7644.invoke(dispatcher.clj:128) [acha-uber-0.2.5.jar:na]
at acha.dispatcher$worker.invoke(dispatcher.clj:126) [acha-uber-0.2.5.jar:na]
at acha.dispatcher$run_workers$fn__7662.invoke(dispatcher.clj:143) [acha-uber-0.2.5.jar:na]
at clojure.lang.AFn.run(AFn.java:22) [acha-uber-0.2.5.jar:na]
at java.lang.Thread.run(Thread.java:695) [na:1.6.0_65]
Caused by: org.eclipse.jgit.errors.TransportException: https://<username>@bitbucket.org/<path>.git: Authentication is required but no CredentialsProvider has been registered
at org.eclipse.jgit.transport.TransportHttp.connect(TransportHttp.java:498) ~[acha-uber-0.2.5.jar:na]
at org.eclipse.jgit.transport.TransportHttp.openFetch(TransportHttp.java:309) ~[acha-uber-0.2.5.jar:na]
at org.eclipse.jgit.transport.FetchProcess.executeImp(FetchProcess.java:136) ~[acha-uber-0.2.5.jar:na]
at org.eclipse.jgit.transport.FetchProcess.execute(FetchProcess.java:122) ~[acha-uber-0.2.5.jar:na]
at org.eclipse.jgit.transport.Transport.fetch(Transport.java:1115) ~[acha-uber-0.2.5.jar:na]
at org.eclipse.jgit.api.FetchCommand.call(FetchCommand.java:130) ~[acha-uber-0.2.5.jar:na]
... 11 common frames omitted
We assume that HTTP URLs are used for public repos. For private repos you can use SSH authentication, e.g. git@bitbucket.org:accountname/reponame.git. By default it uses your default private key (located in ~/.ssh/).
Furthermore, you can use a custom private key (see the --private-key option). Unfortunately, jgit doesn't know how to work with ssh-agent, and we don't support passphrase-protected private keys.
@avasenin Unfortunately, an OOM error again on the same repo:
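To make the URL change concrete, here is a small Java sketch that rewrites an HTTPS URL with embedded credentials into the scp-like SSH form shown above. The class `RepoUrl` and its methods are hypothetical helpers, not part of acha, and the fixed `git` SSH user is an assumption based on the example URL in this thread.

```java
import java.net.URI;

public class RepoUrl {
    // True when an http(s) URL carries credentials in its user-info part.
    static boolean hasEmbeddedCredentials(String url) {
        if (!url.startsWith("http://") && !url.startsWith("https://")) {
            return false; // scp-like SSH URLs are not valid java.net.URI input
        }
        return URI.create(url).getUserInfo() != null;
    }

    // Rewrites http(s)://[user[:pass]@]host/path into git@host:path.
    // Assumes the hosting service uses "git" as the SSH user.
    static String toSshUrl(String httpUrl) {
        URI u = URI.create(httpUrl);
        String host = u.getHost();
        String path = u.getPath();
        if (host == null || path == null || path.length() < 2) {
            throw new IllegalArgumentException("not a repository URL: " + httpUrl);
        }
        return "git@" + host + ":" + path.substring(1); // drop the leading '/'
    }

    public static void main(String[] args) {
        String url = "https://user:secret@bitbucket.org/accountname/reponame.git";
        if (hasEmbeddedCredentials(url)) {
            System.out.println(toSshUrl(url)); // prints git@bitbucket.org:accountname/reponame.git
        }
    }
}
```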
01:12:34.460 [ INFO ] worker-3 : a.server - Added repo: {:timestamp 0, :snapshot nil, :reason nil, :state waiting, :url git@bitbucket.org:accountname/reponame.git, :id 2000005}
01:12:34.535 [ INFO ] worker#2 : a.dispatcher - Worker has started processing {:timestamp 0, :snapshot nil, :reason nil, :state waiting, :url git@bitbucket.org:accountname/reponame.git, :id 2000005}
01:12:34.542 [ INFO ] worker#2 : a.dispatcher - Fetching/cloning repo git@bitbucket.org:accountname/reponame.git
01:42:02.205 [ INFO ] worker#2 : a.dispatcher - Scanning new commits for identity achievements git@bitbucket.org:accountname/reponame.git
Exception in thread "worker#2" java.lang.OutOfMemoryError: Java heap space
at java.lang.Long.valueOf(Long.java:557)
at clojure.lang.Numbers.num(Numbers.java:1738)
at clojure.lang.Numbers$LongOps.add(Numbers.java:455)
at clojure.lang.Numbers.add(Numbers.java:128)
at clojure.core$range$fn__4622.invoke(core.clj:2900)
at clojure.lang.LazySeq.sval(LazySeq.java:40)
at clojure.lang.LazySeq.seq(LazySeq.java:49)
at clojure.lang.ChunkedCons.chunkedNext(ChunkedCons.java:59)
at clojure.core$chunk_next.invoke(core.clj:669)
at clojure.core.protocols$fn__6465.invoke(protocols.clj:106)
at clojure.core.protocols$fn__6427$G__6422__6436.invoke(protocols.clj:19)
at clojure.core.protocols$seq_reduce.invoke(protocols.clj:31)
at clojure.core.protocols$fn__6448.invoke(protocols.clj:59)
at clojure.core.protocols$fn__6401$G__6396__6414.invoke(protocols.clj:13)
at clojure.core$reduce.invoke(core.clj:6514)
at clojure.core$mapv.invoke(core.clj:6611)
at acha.git_parser$parse_edit_list$fn__6322.invoke(git_parser.clj:104)
at clojure.core$mapv$fn__6689.invoke(core.clj:6611)
at clojure.core.protocols$fn__6444.invoke(protocols.clj:84)
at clojure.core.protocols$fn__6401$G__6396__6414.invoke(protocols.clj:13)
at clojure.core$reduce.invoke(core.clj:6514)
at clojure.core$mapv.invoke(core.clj:6611)
at acha.git_parser$parse_edit_list.invoke(git_parser.clj:102)
at acha.git_parser$parse_diff_changes.invoke(git_parser.clj:154)
at acha.git_parser$parse_diff.invoke(git_parser.clj:161)
at acha.dispatcher$find_diff_achievements$fn__7585.invoke(dispatcher.clj:39)
at clojure.lang.PersistentVector.reduce(PersistentVector.java:332)
at clojure.core$reduce.invoke(core.clj:6513)
at acha.dispatcher$find_diff_achievements.invoke(dispatcher.clj:46)
at acha.dispatcher$analyze_commit.invoke(dispatcher.clj:52)
at acha.dispatcher$find_commit_achievements$fn__7599.invoke(dispatcher.clj:66)
at clojure.core$map$fn__4525$fn__4526.invoke(core.clj:2601)
Works fine by using java -Xmx2048m -jar acha-uber-0.2.5.jar. Thanks!
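For anyone trying the same workaround, a tiny Java sketch to confirm which heap limit the JVM actually picked up. The class `HeapInfo` is hypothetical; `Runtime.maxMemory()` is the standard API, and the value it reports can be slightly below the `-Xmx` setting depending on the garbage collector.

```java
public class HeapInfo {
    // Converts a byte count to whole mebibytes (rounding down).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Upper bound on the heap the JVM will attempt to use.
        long max = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + toMiB(max) + " MiB");
    }
}
```

Running it as `java -Xmx2048m HeapInfo` and again without the flag shows the difference between the default and the raised limit.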
When I try to import a large-ish repo, I keep getting OutOfMemoryErrors. The first time the message said java.lang.OutOfMemoryError: Java heap space, so I ran the jar again using java -Xmx2048m -jar acha-uber-0.2.4.jar, and this time the message said java.lang.OutOfMemoryError: GC overhead limit exceeded. This always fails at the repository scanning stage, so I don't get to see anything. Is there any way to fix this? I don't think the repo I'm trying to import is that large: only 1800+ commits on the master branch (3000+ overall).