kbase / workspace_deluxe

The Workspace Service (WSS) is primarily a language independent remote storage and retrieval system for KBase typed objects (TO) defined with the KBase Interface Description Language (KIDL).
MIT License

Provenance could potentially OOM the workspace #576

Open MrCreosote opened 2 years ago

MrCreosote commented 2 years ago

get_objects2 can return up to 10K objects, each with its own provenance, and provenance can be up to 1MB serialized. That's up to 10GB serialized, or 5-20x that (50-200GB) once deserialized.

Save the size of the provenance in the provenance mongo doc. Before pulling the provenance, check the total size and throw an error if it's over some reasonable amount (100MB?).

This is pretty unlikely to ever cause a problem - most provenance is a few KB.
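
A minimal sketch of the size check proposed above, assuming each provenance document stores its serialized size in bytes and that those sizes can be read before the provenance itself is fetched. The class name `ProvenanceSizeGuard`, its methods, and the exception type are hypothetical and not part of the workspace codebase. The 100MB limit is only the tentative figure floated above.

```java
import java.util.Collection;

/** Hypothetical guard for illustration; not part of the workspace codebase. */
final class ProvenanceSizeGuard {

    /** Tentative limit from the discussion above (100MB); a placeholder, not a decided value. */
    static final long MAX_TOTAL_BYTES = 100L * 1024 * 1024;

    private ProvenanceSizeGuard() {}

    /**
     * Sums the stored sizes of the requested provenance documents.
     * Assumes each size was saved alongside its provenance document at write time.
     */
    static long totalBytes(final Collection<Long> storedSizes) {
        long total = 0;
        for (final long size : storedSizes) {
            if (size < 0) {
                throw new IllegalArgumentException("negative size: " + size);
            }
            // addExact throws ArithmeticException rather than silently overflowing
            total = Math.addExact(total, size);
        }
        return total;
    }

    /** Throws if the combined size exceeds maxBytes; intended to run before any provenance is pulled. */
    static void checkTotal(final Collection<Long> storedSizes, final long maxBytes) {
        final long total = totalBytes(storedSizes);
        if (total > maxBytes) {
            throw new IllegalArgumentException(String.format(
                    "Requested provenance totals %d bytes, exceeding the limit of %d bytes",
                    total, maxBytes));
        }
    }
}
```

With this shape, a request for a handful of few-KB provenance records passes, while the worst case of 10K records at about 1MB each is rejected before anything is loaded into memory.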

MrCreosote commented 2 years ago

The deserialized memory hit could be mostly avoided by pulling the provenance data as BSON (assuming that's possible) and then processing the documents one at a time: convert each to an in-memory object, make any necessary changes, serialize to JSON, and embed the result in a JsonTokenStream and UObject.

MrCreosote commented 2 years ago

You can theoretically get raw BSON like this in MongoWorkspaceDB:

private Map<ObjectId, Provenance> getProvenance(
        final Map<ResolvedObjectID, Map<String, Object>> vers)
        throws WorkspaceCommunicationException {
    final Map<ObjectId, Map<String, Object>> provIDs = new HashMap<>();
    for (final ResolvedObjectID id: vers.keySet()) {
        provIDs.put((ObjectId) vers.get(id).get(Fields.VER_PROV), vers.get(id));
    }
    final Map<ObjectId, Provenance> ret = new HashMap<>();
    final Document query = new Document(Fields.MONGO_ID,
            new Document("$in", provIDs.keySet()));
    try {
        // TODO MEM does this reduce memory usage if we store the provenance as a string?
        // should only be deserializing BSON one object at a time vs. all of them
        final MongoCollection<RawBsonDocument> col = wsmongo.getCollection(
                COL_PROVENANCE, RawBsonDocument.class);
        for (final RawBsonDocument rbd: col.find(query)) {
            // final BsonDocument bdoc = rbd.toBsonDocument(BsonDocument.class, null);
            final Document dbo = wsmongo.getCodecRegistry().get(Document.class)
                    .decode(rbd.asBsonReader(), DecoderContext.builder().build());
            final ObjectId oid = dbo.getObjectId(Fields.MONGO_ID);
            // rest of the method is the same

To return JTS-wrapped JSON strings rather than provenance objects, we'd have to ignore the SDK compiled return classes and change the return type in WorkspaceServer to the new class type or Object (which is what JSONServerServlet expects anyway). Every time the server is recompiled, the return types would be overwritten, so the change would have to be reapplied after each recompile.