Run the upsert queries each time a batch reaches the requisite size while reading the JSON files, rather than all at the end (which consumes significant memory when there is a lot of opinion data)
Download CL tar files to disk rather than holding them in memory, to reduce memory consumption
Parallelize downloads but not tar extraction (faster downloads without a spike in RAM usage)
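
A rough sketch of how the three ideas above might fit together, using only the Python standard library. All function names here (`download_to_disk`, `download_all`, `extract_sequentially`, `upsert_in_batches`) are hypothetical, the `upsert` callback is a stand-in for whatever database call the project actually uses, and the sketch assumes one JSON document per file and trusted archives.

```python
import json
import shutil
import tarfile
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable


def download_to_disk(url: str, dest_dir: Path) -> Path:
    """Stream one archive to disk in chunks instead of buffering it in memory."""
    dest = dest_dir / url.rsplit("/", 1)[-1]
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # copies chunk by chunk
    return dest


def download_all(urls: list[str], dest_dir: Path, workers: int = 4) -> list[Path]:
    """Run the downloads concurrently; nothing is extracted here."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda u: download_to_disk(u, dest_dir), urls))


def extract_sequentially(archives: Iterable[Path], out_dir: Path) -> None:
    """Extract one archive at a time so only one is being unpacked at once."""
    for archive in archives:
        target = out_dir / archive.name.split(".", 1)[0]
        target.mkdir(parents=True, exist_ok=True)
        with tarfile.open(archive) as tar:
            tar.extractall(target)  # assumes the archives are trusted


def upsert_in_batches(
    json_paths: Iterable[Path],
    upsert: Callable[[list[dict]], None],  # hypothetical database callback
    batch_size: int = 1000,
) -> int:
    """Flush each batch as soon as it is full rather than collecting everything."""
    batch: list[dict] = []
    total = 0
    for path in json_paths:
        with open(path, encoding="utf-8") as f:
            batch.append(json.load(f))
        if len(batch) >= batch_size:
            upsert(batch)
            total += len(batch)
            batch = []  # drop the reference so the flushed rows can be freed
    if batch:  # flush the final partial batch
        upsert(batch)
        total += len(batch)
    return total
```

With this shape, peak memory is bounded by roughly one batch of parsed JSON plus the download buffers, rather than by the full set of archives and opinions.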
A few things we can do