microbiomedata / issues

public repo for issues related to NMDC work

reprocess datasets through MAG workflow #619

Open aclum opened 4 months ago

aclum commented 4 months ago

RACI

Tag people in their roles

Describe the task

Criteria for completion

Estimate people time

Completion Date (Goal)

Target Sprint Start & End Dates

Tag Blocker/Contingent upon issues

aclum commented 4 months ago

related to https://github.com/microbiomedata/metaMAGs/issues/19

Michal-Babins commented 4 months ago

I am gathering the necessary information to reprocess these and will have @scanon review it. Once the MAGs changes are ready, we can resubmit TRiP and NEON soil for MAGs only with the bumped version. We can start with TRiP since it's smaller, then move to NEON.

Michal-Babins commented 4 months ago

I am running a few more tests to make sure the runtime attributes are set correctly with the updates. The changes @chienchi made look good, but I need to confirm that the runtime requirements are stable. One project crashed with an out-of-memory (OOM) error because the requested memory was too low.

aclum commented 3 months ago

We need to add the new file types, including capturing the low-quality bins; see https://github.com/microbiomedata/nmdc-schema/pull/1791. I'm working on getting runtime dev updated.

ssarrafan commented 3 months ago

@Michal-Babins @aclum what is the status of this issue?

aclum commented 3 months ago

Michal was testing the runtime requirements, and we need an updated runtime prod environment to handle the new file type enumerations. This first requires an update to runtime dev. We updated runtime dev to 10.1, but per Patrick C. that release is incomplete, so runtime dev needs to be updated to 10.1.1 and tested, and then runtime prod needs to be updated to 10.1.1. cc @kaijli

ssarrafan commented 3 months ago

Thanks Alicia. Should this be moved to the next sprint?

ssarrafan commented 3 months ago

Removing from sprint. @aclum @kaijli let me know if this is still active.

aclum commented 3 months ago

We need to resolve the blockers first; adding the backlog label so we don't lose this.

aclum commented 3 months ago

@aclum to add study IDs.

aclum commented 3 months ago

Provided, via Slack, a list of 1970 MAGs activities that need to be reprocessed. The search by version is below; it returned the same number of records as a search by date ({'started_at_time': {$gte: "2023-07-01"}}).

// MAGs activities produced by workflow version v1.0.6
db.getCollection('mags_activity_set').find({
  version: 'v1.0.6'
});

Studies impacted: TRiP (nmdc:sty-11-hdd4bf83), NEON soil (nmdc:sty-11-34xj1150), and NEON benthic (nmdc:sty-11-pzmd0x14).
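The version filter and the date filter are two ways of selecting the records to reprocess. Matching counts are a good sign that they agree, and comparing the selected IDs is a stricter check. The sketch below is a hypothetical, in-memory illustration in JavaScript (the language of the mongo shell query above). The helper names byVersion, byStartDate and sameSelection and the sample records are invented, and it assumes started_at_time holds ISO 8601 strings, so plain string comparison orders them chronologically.

```javascript
// Hypothetical sketch: check that two selection criteria for
// mags_activity_set records pick out the same records.
// The sample records below are invented for illustration only.

// Criterion 1: records produced by a given MAGs workflow version.
function byVersion(records, version) {
  return records.filter((r) => r.version === version);
}

// Criterion 2: records started on or after a cutoff date.
// Assumes started_at_time is an ISO 8601 string: such strings sort
// lexicographically in chronological order, so >= works on them directly.
function byStartDate(records, cutoff) {
  return records.filter((r) => r.started_at_time >= cutoff);
}

// Returns true when both selections contain exactly the same record ids.
function sameSelection(a, b) {
  const idsA = new Set(a.map((r) => r.id));
  const idsB = new Set(b.map((r) => r.id));
  if (idsA.size !== idsB.size) return false;
  for (const id of idsA) {
    if (!idsB.has(id)) return false;
  }
  return true;
}

// Invented example records (not real NMDC data).
const sample = [
  { id: "act-1", version: "v1.0.5", started_at_time: "2023-05-12T10:00:00Z" },
  { id: "act-2", version: "v1.0.6", started_at_time: "2023-07-03T08:30:00Z" },
  { id: "act-3", version: "v1.0.6", started_at_time: "2023-08-21T14:15:00Z" },
];

const v = byVersion(sample, "v1.0.6");
const d = byStartDate(sample, "2023-07-01");
console.log(v.length, d.length, sameSelection(v, d)); // prints "2 2 true"
```

Comparing ID sets rather than counts guards against the case where two filters happen to return the same number of records but different ones.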

aclum commented 3 months ago

nmdc.mags_activity_set.csv

aclum commented 3 months ago

Reattaching the list I provided to Shane and Michal in Slack on July 28, 2023. These are projects where we know the JGI workflow produced bins with a mapping to NMDC's corresponding omics ID.

IMG (15).xlsx

cc @scanon

aclum commented 3 months ago

Moving this to blocked; it is on hold until we get the EukCC component code from Neha.