Closed GoogleCodeExporter closed 9 years ago
Also consider upgrading to Axis 2.
Currently the HttpClient v3 library has been modified to support the NTLM v2
and Kerberos authentication schemes.
Original comment by rakeshs101981@gmail.com
on 29 Jan 2010 at 4:22
The correct issue is Connector Manager Issue 212
Original comment by rakeshs101981@gmail.com
on 29 Jan 2010 at 4:24
As described here: http://hc.apache.org/httpclient-3.x/performance.html
"HttpClient is capable of efficient request/response body streaming. Large
entities may be submitted or received without being buffered in memory. This is
especially critical if multiple HTTP methods may be executed concurrently.
While there are convenience methods to deal with entities such as strings or
byte arrays, their use is discouraged. Unless used carefully they can easily
lead to out of memory conditions, since they imply buffering of the complete
entity in memory."
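To illustrate the streaming approach the quote recommends, here is a minimal sketch that copies a body in fixed-size chunks instead of buffering it whole. It uses only `java.io`; the `StreamingCopy` class, its `copy` helper, and the 8192-byte buffer are hypothetical names and values chosen for illustration, and the `ByteArrayInputStream` is a stand-in for a real response stream (in HttpClient 3.x that would presumably be the stream returned by `getResponseBodyAsStream()`). It is not the connector's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamingCopy {
    // Copies in fixed-size chunks, so memory use is bounded by the buffer
    // size rather than by the size of the entity being transferred.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a response body (assumption: a real caller would
        // pass the HTTP response stream and a file or feed output stream).
        byte[] body = new byte[100_000];
        try (InputStream in = new ByteArrayInputStream(body);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            long copied = copy(in, out, 8192);
            System.out.println("copied " + copied + " bytes");
        }
    }
}
```

The point of the sketch is only that the reader holds one buffer at a time; it says nothing about why the connection in this issue is closed early.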
Hence the issue seems to be related more to the environment, where the
connection is closed prematurely for various reasons known to the HttpClient
library. The problem does not occur for content crawled from all SharePoint
servers, but only for one SharePoint server, which appears to be serving
connections slowly.
Original comment by rakeshs101981@gmail.com
on 11 Feb 2010 at 8:36
Works fine with another SharePoint installation with more than 160k docs.
One observation is that this exception occurs whenever the
"java.net.SocketException: No buffer space available (maximum connections
reached?): JVM_Bind" exception occurs, as reported in Issue 59.
So the exception might be triggered by the following sequence:
1. The SharePoint server is slow in serving content.
2. The connector soon exhausts the maximum number of sockets available for
establishing connections, resulting in the JVM_Bind exception.
3. This might trigger the HttpClient library to close existing connections.
4. A read on any such closed connection will throw an IOException.
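Step 4 of the hypothesis can be sketched in isolation: once a stream has been closed underneath its reader, the next read fails with an `IOException`. The sketch below uses `java.io.BufferedInputStream` as a stand-in, since it rejects reads after `close()`; whether HttpClient's own stream wrapper behaves the same way in this scenario is an assumption, and `ClosedStreamRead` and `readFails` are hypothetical names.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class ClosedStreamRead {
    // Returns true if a single read on the stream fails with an IOException.
    static boolean readFails(InputStream in) {
        try {
            in.read();
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream content =
                new BufferedInputStream(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        System.out.println("before close, read fails: " + readFails(content));
        // Simulates the library closing the connection while the
        // consumer still holds a reference to the stream.
        content.close();
        System.out.println("after close, read fails: " + readFails(content));
    }
}
```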
Original comment by rakeshs101981@gmail.com
on 11 Feb 2010 at 8:58
I am trying to understand how the Connector is running out of sockets.
Are you maintaining open InputStreams for all items in the returned
DocumentList (2000 open InputStreams)?
Are you opening a new InputStream for a Document's content upon a call to
nextDocument (or when fetching the content Property for that Document)?
Does it seem that the Document's content InputStream.close() method doesn't
ever get called (except by AutoClose)?
Are many Traversal batches timing out, getting cancelled, then leaking an open
connection to the SharePoint server?
Are you running a large number (> 10) of concurrent Connector instances in the
same Connector Manager?
Is it possible that it is the SharePoint server that has run out of sockets
(rather than the Connector client)?
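One way to answer the questions about open and unclosed InputStreams is to wrap each content stream in a counter and log the number still open. The following is a diagnostic sketch only: `TrackedInputStream` and `openCount` are hypothetical names, not part of the connector or the Connector Manager, and the wrapper assumes every content stream is created through it.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class TrackedInputStream extends FilterInputStream {
    // Number of wrapped streams that have been opened but not yet closed.
    private static final AtomicInteger OPEN = new AtomicInteger();
    private final AtomicBoolean closed = new AtomicBoolean(false);

    TrackedInputStream(InputStream in) {
        super(in);
        OPEN.incrementAndGet();
    }

    static int openCount() {
        return OPEN.get();
    }

    @Override
    public void close() throws IOException {
        // Decrement only once, even if close() is called repeatedly.
        if (closed.compareAndSet(false, true)) {
            OPEN.decrementAndGet();
        }
        super.close();
    }

    public static void main(String[] args) throws IOException {
        InputStream leaked = new TrackedInputStream(new ByteArrayInputStream(new byte[8]));
        try (InputStream closedProperly =
                     new TrackedInputStream(new ByteArrayInputStream(new byte[8]))) {
            closedProperly.read();
        }
        // The stream that was never closed is still counted as open.
        System.out.println("streams still open: " + openCount());
        leaked.close();
    }
}
```

A count that keeps growing across traversal batches would point to leaked streams on the connector side; a stable count would make the server-side explanation more plausible.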
Original comment by Brett.Mi...@gmail.com
on 11 Feb 2010 at 7:09
The connector just hands over an InputStream to the CM. It is opened only when
the CM calls findProperty("google:content") or findProperty("google:mimetype").
The exception occurs because AutoCloseInputStream.close() was called. This can
happen when the HTTPConnection itself is closed.
No batches are timing out or getting cancelled.
The one consistent pattern that I have been able to ascertain from the logs of
every run is that JVM_Bind is usually present. The error is reproduced with
only one connector instance. The same connector works fine if configured for
some other SP server.
The problem does not occur for all documents, only for a few, and after some
time documents are fed successfully.
The above hypothesis is based on all these observations.
Original comment by rakeshs101981@gmail.com
on 12 Feb 2010 at 1:43
Original comment by rakeshs101981@gmail.com
on 20 May 2010 at 6:20
Original comment by rakeshs101981@gmail.com
on 6 Oct 2010 at 7:50
This issue is filed as Google issue #6513766
Original comment by tdnguyen@google.com
on 18 May 2012 at 12:12
Original issue reported on code.google.com by
rakeshs101981@gmail.com
on 29 Jan 2010 at 4:19