Should we be using MongoDB at the end of our pipeline? @tcrick has some ideas about using it to store pipeline runs. I can see the value of this for sure, and while using `parquet` is awesome, the whole "experiment" / analysis setup I've got in place right now with the `curry` package is probably going to be inefficient when scaled up to the full Reddit dataset. @tcrick, can you discuss this with Romina while I'm on leave? We can then chat about whether, and if so where, to use MongoDB.