Open MrCreosote opened 2 years ago
The unserialized memory hit could be mostly avoided by pulling the provenance data as BSON (assuming that's possible) and then, one document at a time, converting it to an in-memory object, making any necessary changes, serializing it to JSON, and embedding it in a JsonTokenStream and UObject.
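A minimal sketch of the one-at-a-time conversion described above, using only the JDK. `SerialConverter`, `convertSerially`, and the three function parameters are hypothetical names for illustration, not workspace or MongoDB driver APIs; the real decode and serialize steps would come from the BSON codec and the JSON serializer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of the workspace codebase.
final class SerialConverter {

    /**
     * Converts serialized documents to JSON strings one at a time, so that
     * at most one fully deserialized object is reachable at any moment.
     *
     * decode - turns raw bytes into an in-memory object (assumed supplied by the caller)
     * update - applies any necessary changes to that object
     * toJson - serializes the updated object to a JSON string
     */
    static <T> List<String> convertSerially(
            final Iterable<byte[]> rawDocs,
            final Function<byte[], T> decode,
            final UnaryOperator<T> update,
            final Function<T, String> toJson) {
        final List<String> ret = new ArrayList<>();
        for (final byte[] raw: rawDocs) {
            // only this one document is held in deserialized form
            final T obj = decode.apply(raw);
            final T updated = update.apply(obj);
            // retain only the compact serialized form
            ret.add(toJson.apply(updated));
            // obj and updated are unreachable after this iteration
        }
        return ret;
    }
}
```

The point of the shape is that the returned list holds only serialized strings, so peak memory is roughly the serialized total plus one deserialized document rather than all of them at once.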
You can theoretically get raw BSON like this in MongoWorkspaceDB:

    private Map<ObjectId, Provenance> getProvenance(
            final Map<ResolvedObjectID, Map<String, Object>> vers)
            throws WorkspaceCommunicationException {
        final Map<ObjectId, Map<String, Object>> provIDs = new HashMap<>();
        for (final ResolvedObjectID id: vers.keySet()) {
            provIDs.put((ObjectId) vers.get(id).get(Fields.VER_PROV), vers.get(id));
        }
        final Map<ObjectId, Provenance> ret = new HashMap<>();
        final Document query = new Document(Fields.MONGO_ID,
                new Document("$in", provIDs.keySet()));
        try {
            // TODO MEM does this reduce memory usage if we store the provenance as a string?
            //          should only be deserializing BSON one object at a time vs. all of them
            final MongoCollection<RawBsonDocument> col = wsmongo.getCollection(
                    COL_PROVENANCE, RawBsonDocument.class);
            for (final RawBsonDocument rbd: col.find(query)) {
                // final BsonDocument bdoc = rbd.toBsonDocument(BsonDocument.class, null);
                final Document dbo = wsmongo.getCodecRegistry().get(Document.class)
                        .decode(rbd.asBsonReader(), DecoderContext.builder().build());
                final ObjectId oid = dbo.getObjectId(Fields.MONGO_ID);
                // rest of the method is the same

To return JTS-wrapped JSON strings rather than provenance objects we'd have to ignore the SDK-compiled return classes and change the return type in WorkspaceServer to the new class type or Object (which is what JSONServerServlet expects anyway). Every time the server was recompiled the return types would be overwritten, and so that'd have to be fixed on recompile.
get_objects2 can return up to 10K objects, each with their own provenance, and provenance can be up to 1MB serialized. That's 10GB serialized, or 5-20x that unserialized.

Save the size of the provenance in the provenance mongo doc. Before pulling the provenance, check the total size and throw an error if it's over some reasonable amount (100MB?).

This is pretty unlikely to ever cause a problem - most provenance is a few KB.
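A minimal sketch of the proposed size check, assuming the per-document sizes have already been read from a stored size field (the field and how it is fetched are not shown). `ProvenanceSizeCheck`, `checkTotalSize`, and `MAX_TOTAL_BYTES` are hypothetical names, and the 100MB limit is only the tentative figure floated above.

```java
import java.util.Collection;

// Hypothetical helper, not part of the workspace codebase.
final class ProvenanceSizeCheck {

    // Tentative limit from the suggestion above, taken as 100 * 10^6 bytes.
    static final long MAX_TOTAL_BYTES = 100_000_000L;

    /**
     * Sums the stored serialized sizes of the requested provenance documents
     * and fails fast, before any provenance is pulled, once the running
     * total exceeds maxBytes. Returns the total when it is within the limit.
     */
    static long checkTotalSize(final Collection<Long> sizes, final long maxBytes) {
        long total = 0;
        for (final long size: sizes) {
            if (size < 0) {
                throw new IllegalArgumentException("negative size: " + size);
            }
            // addExact throws ArithmeticException rather than silently overflowing
            total = Math.addExact(total, size);
            if (total > maxBytes) {
                throw new IllegalStateException(String.format(
                        "Total provenance size exceeds the limit of %d bytes", maxBytes));
            }
        }
        return total;
    }
}
```

With the worst case above (10K objects at 1MB each), the check would throw after summing the first 101 sizes, without loading any provenance data.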