Closed 0x80 closed 3 years ago
@schmidt-sebastian anything jumping out at you as weird here? I don't see anything client side that should necessarily be eating more memory as the upstream database grows.
I know very little about BigQuery's network stack, but it looks like they only use the low-level Firestore API which was never as prone to memory leaks as the main SDK. I would expect that the leak is somewhere in BigQuery's network stack, but I don't have any insights.
Since I created the issue I have tried to run the insert jobs sequentially (waiting for the insert call, but not waiting for the created job to finish) instead of in parallel. It made no difference. I think the problem is not a memory leak, but the fact that the client somehow reads / validates all of the data being passed into the job.
This seems inefficient to me, so I would still consider this a bug. Importing managed exports into BigQuery by pointing jobs at a bucket should not put this amount of load on the client inserting the jobs, IMO. But I think it's possible that this is by design and not considered a bug.
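For reference, the sequential pattern described above can be sketched roughly like this, with the BigQuery client stubbed behind a minimal interface. All names and the export path layout here are illustrative assumptions, not the original code; a real script would call `table.load()` from `@google-cloud/bigquery`:

```typescript
// Minimal stand-in for the slice of the BigQuery client surface the loop
// needs; hypothetical, for illustration only.
interface LoadClient {
  load(sourceUri: string, metadata: { sourceFormat: string }): Promise<void>;
}

// Inject one load job per exported collection, awaiting each insert call
// (but not the job's completion) before starting the next one.
export async function injectLoadJobs(
  client: LoadClient,
  exportPrefix: string, // e.g. "gs://my-bucket/2020-01-01T00:00:00_1234" (assumed)
  collectionIds: string[]
): Promise<string[]> {
  const submitted: string[] = [];
  for (const id of collectionIds) {
    // Managed Firestore exports are loaded with the DATASTORE_BACKUP format;
    // the exact path layout below is an assumption for the sketch.
    const uri = `${exportPrefix}/all_namespaces/kind_${id}/all_namespaces_kind_${id}.export_metadata`;
    await client.load(uri, { sourceFormat: "DATASTORE_BACKUP" });
    submitted.push(uri);
  }
  return submitted;
}
```

The point of the loop is only ordering: each insert request is fully dispatched before the next begins, which bounds how much request state is in flight at once.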
Greetings @0x80! By chance, have you done a heap snapshot analysis? It would be good to know where the objects in your program are coming from. Without seeing the snapshot, it's impossible to tell what comes from this module and what comes from other factors in your app.
One thing specifically that jumps out to me is this:
const collectionIds = await getExportCollectionList();
Any ideas on how big that list could get?
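For the heap snapshot analysis mentioned above, Node's built-in v8 module (Node 11.13+) can capture one without extra dependencies; this is just a minimal sketch:

```typescript
import * as v8 from "v8";

// Log the current heap usage; useful to call before and after each batch of
// insert jobs to see whether memory grows monotonically.
export function logHeapUsage(label: string): number {
  const { heapUsed, heapTotal } = process.memoryUsage();
  console.log(
    `${label}: ${(heapUsed / 1048576).toFixed(1)} / ${(heapTotal / 1048576).toFixed(1)} MB`
  );
  return heapUsed;
}

// Write a .heapsnapshot file to the working directory; load it in Chrome
// DevTools (Memory tab) to see which objects retain the memory.
export function captureHeapSnapshot(): string {
  return v8.writeHeapSnapshot(); // returns the generated file name
}
```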
I had the same issue and realized that this only happens if I use ES6 import statements, and doesn't happen if I use require. Is there a cyclic dependency somewhere that causes this OOM issue?
@JustinBeckwith No, I haven't done an analysis like that. The collections list is never that big. We have about 20 root collections in total so it won't go over that.
@JustinBeckwith @bcoe do you think that a cyclic dependency might be the cause of this, as mentioned by @arorajatin?
I'm not quite sure what magic makes this work:
import {
GoogleAuth,
OAuth2Client
} from "google-auth-library";
☝️ I don't believe import statements were introduced until Node 12. @0x80 are you running a build step, and writing your application using babel or TypeScript?
Another thing worth digging into: how large a set is returned by getExportCollectionList()? If this has grown into millions of entries, that seems like the most likely place for a memory leak.
@bcoe My scripts are written in TypeScript and executed by ts-node.
The getExportCollectionList function just returns a list of collection names that need to be exported, based on a blacklist. The returned value is a list of around 20 strings, so that can't be it.
Here's the implementation:
export function getExportCollectionList() {
const collectionsToExclude = ["__system", "emails"];
return db.listCollections().then((collectionRefs) => {
return collectionRefs
.map((ref) => ref.id)
.filter((id) => !collectionsToExclude.includes(id));
});
}
@0x80 just to eliminate something from the equation, could you use tsc and compile the script to JavaScript?
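For anyone following along, that might look something like this; the file names and the heap-size flag are assumptions, not from the original report:

```shell
# Compile ahead of time instead of running under ts-node, to rule out
# ts-node's in-memory compilation as a factor.
npx tsc export-script.ts --outDir dist

# Run the compiled output; optionally raise V8's old-space limit (in MB)
# while investigating where the heap growth comes from.
node --max-old-space-size=4096 dist/export-script.js
```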
Closing due to lack of activity but feel free to reopen if the above steps don't help with more repro information. Thank you!
Facing an "Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory" error when googleapis is instantiated in code.
In the code below I inject some jobs to import data into BigQuery from a Firestore export file. A few days ago my database reached a point where this creates a heap out-of-memory error in my deployed cloud function: 2 GB of memory is no longer enough to execute this code.
I assume there is a memory leak somewhere. I cannot think of a reason why injecting a job that points to a file in a storage bucket would have to use so much memory on the client side.
I guess I could enhance the code by waiting for each job to finish before injecting a new one, but that would be silly because it would likely cause the cloud function to time out and clearly doesn't scale.
Environment details
googleapis version: 47.0.0

Steps to reproduce