To complete the bulk extraction on older datasets, we need to build, for each extractor, a list of UUIDs for the datasets that have not yet been processed. These lists are generated directly from the Mongo database and written to files.
Example Mongo query (demosaic extractor): find all datasets that have no metadata created by the demosaic extractor.
```javascript
// For each stereoTop dataset, print its _id unless a metadata record
// created by the demosaic extractor is attached to it.
db.datasets.find({"name": /stereoTop.*/}).forEach(function(doc) {
    var found = false;
    var extractorName = "terra.demosaic";
    db.metadata.find({"attachedTo._id": doc._id, "creator.name": extractorName}).forEach(function(subdoc) {
        found = true;
    });
    if (!found) {
        print(doc._id);  // note: the field is _id; doc.id is undefined in the mongo shell
    }
});
```
Extractors with lists to be generated:

- [x] demosaic
- [x] flirIr
- [x] plantCV
- [x] hyperspectral
- [x] ply2las
- [x] netCDF
- [x] canopy cover
- [x] sensorposition (geospatial)
- [x] environmentlogger
Once these lists are generated, we will iterate through each list and submit the dataset/file to Clowder for extraction with a call like:
```
POST http://terraref.ncsa.illinois.edu/clowder/api/datasets/<UUID>/extractions?key=<SECRET>
data='{"extractor":"terra.environmentlogger"}'
```
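The iteration step could be sketched roughly as below. This is a non-authoritative sketch, assuming a plain-text list file with one UUID per line and the API key available to the caller; the endpoint URL and JSON payload are taken from the call shown above, and the function and file names are hypothetical.

```python
import json
import urllib.request

# Assumption: base URL as shown in the POST example above.
CLOWDER_BASE = "http://terraref.ncsa.illinois.edu/clowder/api"

def build_extraction_request(dataset_uuid, extractor, key):
    """Build the POST request that asks Clowder to queue `extractor`
    on the given dataset (endpoint/payload as in the notes above)."""
    url = "{}/datasets/{}/extractions?key={}".format(CLOWDER_BASE, dataset_uuid, key)
    body = json.dumps({"extractor": extractor}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def submit_list(list_path, extractor, key):
    """Iterate a UUID list file and submit each dataset for extraction."""
    with open(list_path) as fh:
        for line in fh:
            uuid = line.strip()
            if not uuid:
                continue
            req = build_extraction_request(uuid, extractor, key)
            with urllib.request.urlopen(req) as resp:
                print(uuid, resp.status)

# Example invocation (hypothetical list file name and key source):
#   submit_list("environmentlogger_uuids.txt", "terra.environmentlogger", API_KEY)
```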
I'm generating these into /home/mburnet2/extractor_batch on the terra production VM and moving them locally for now.