microbiomedata / nmdc_automation

Prototype automation

update update_omics_output_data_object function to update was_generated_by #158

Closed aclum closed 3 months ago

aclum commented 4 months ago

was_generated_by for data_object_set documents that are has_output of "Organic Matter Characterization" still has a legacy omics ID. The function needs to be updated to populate was_generated_by with the napa omics_processing_set id.

I found these with

db.getCollection('data_object_set').aggregate(
  [
    { $match: { was_generated_by: { $exists: true } } },
    { $match: { was_generated_by: { $not: { $regex: RegExp('nmdc:') } } } },
    {
      $lookup: {
        from: 'omics_processing_set',
        localField: 'id',
        foreignField: 'has_output',
        as: 'omics_processing_set'
      }
    },
    { $group: { _id: '$omics_processing_set.omics_type.has_raw_value', count: { $sum: 1 } } }
  ],
  { maxTimeMS: 60000, allowDiskUse: true }
);
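For illustration, here is a rough Python sketch of the kind of fix being asked for, working on plain dicts rather than the database. It is not the actual update_omics_output_data_object code: the function names, the sample IDs in the usage note, and the assumption that each data object appears in the has_output of exactly one omics_processing_set record are all hypothetical. It also only treats string values as legacy IDs, which is narrower than the query above.

```python
import re

# Same unanchored pattern the query above uses to recognise napa IDs.
NMDC_ID = re.compile(r"nmdc:")


def has_legacy_was_generated_by(data_object: dict) -> bool:
    """Return True when was_generated_by is a string without 'nmdc:' in it."""
    value = data_object.get("was_generated_by")
    return isinstance(value, str) and NMDC_ID.search(value) is None


def repoint_was_generated_by(data_objects: list, omics_records: list) -> list:
    """Return copies of data_objects with legacy was_generated_by values replaced.

    Hypothetical helper: a legacy value is replaced by the id of the
    omics_processing_set record that lists the data object in has_output.
    Records with no matching omics record are returned unchanged.
    """
    # Index omics records by each data object id they produced.
    producer_by_output = {}
    for record in omics_records:
        for output_id in record.get("has_output", []):
            producer_by_output[output_id] = record["id"]

    updated = []
    for data_object in data_objects:
        new_id = producer_by_output.get(data_object.get("id"))
        if has_legacy_was_generated_by(data_object) and new_id is not None:
            updated.append({**data_object, "was_generated_by": new_id})
        else:
            updated.append(data_object)
    return updated
```

With made-up records, a data object whose was_generated_by is "emsl:123" and whose id is listed in the has_output of "nmdc:omprc-1" would come back pointing at "nmdc:omprc-1"; anything already carrying an nmdc: value, or lacking the field, is left alone.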

aclum commented 4 months ago

All of these records are from nmdc:sty-11-33fbta56, so that is the only project for which you'd have to rerun update-study.