Metatranscriptome samples fail processing

ssarrafan commented 3 months ago

Workflow Name: metatranscriptome

Project URL:

Additional Info: I recently started using your metatranscriptomics pipeline to process my paired-end samples, and it has been working great for most of them. However, three of my samples failed to finish processing. My first thought was that they were too big (~3 GB), but they do not exceed the NMDC file limit of 10 GB. I would appreciate your help troubleshooting. I attached a file with the log output.

It was nice meeting you today. Thank you so much for all your help. My ORCID ID is 0000-0003-2065-3300 and the sample file names that I need to process in the metaT pipeline are these:

60_S9_L002_R1_001.fastq.gz 60_S9_L002_R2_001.fastq.gz

69_S14_L002_R1_001.fastq.gz 69_S14_L002_R2_001.fastq.gz

72_S15_L002_R1_001.fastq.gz 72_S15_L002_R2_001.fastq.gz

ssarrafan commented 3 months ago

We met with Alma last week and Mark Flynn said he would re-run the samples for her over the weekend after they put in the fix.

ssarrafan commented 3 months ago

This morning Alma is reporting that the processing for the 3 samples is still failing. Mark F is looking into this.

kaijli commented 3 months ago

Updates: bumped the memory parameter for the megahit task and added a variable for it; enabled retries through Cromwell that bump memory automatically, so future cases won't need manual increases; one sample ran successfully; now working on the website output error.
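For readers unfamiliar with the retry-with-more-memory idea mentioned above, here is a minimal Python sketch of the pattern. It is not Cromwell code and not the actual change made to the workflow: `run_with_memory_retries`, `OutOfMemory`, `fake_megahit`, the multiplier, and the 100 GB threshold are all hypothetical names and numbers chosen for illustration.

```python
from typing import Callable, Tuple


class OutOfMemory(Exception):
    """Raised by the (hypothetical) task when its memory budget is too small."""


def run_with_memory_retries(
    task: Callable[[float], str],
    start_gb: float,
    multiplier: float = 2.0,
    max_retries: int = 3,
) -> Tuple[str, float]:
    """Run task(memory_gb), growing the budget after each out-of-memory failure."""
    memory_gb = start_gb
    for attempt in range(max_retries + 1):
        try:
            return task(memory_gb), memory_gb
        except OutOfMemory:
            if attempt == max_retries:
                raise  # retries exhausted; surface the failure
            memory_gb *= multiplier  # ask for more memory on the next attempt
    raise AssertionError("unreachable")


def fake_megahit(memory_gb: float) -> str:
    """Stand-in for an assembly step; the 100 GB requirement is a made-up number."""
    if memory_gb < 100:
        raise OutOfMemory(f"{memory_gb} GB was not enough")
    return "assembly finished"


result, used_gb = run_with_memory_retries(fake_megahit, start_gb=30)
print(result, used_gb)
```

The point of the pattern is that the starting request can stay modest for typical samples, while unusually demanding samples get larger requests without anyone editing the parameter by hand.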

ssarrafan commented 3 months ago

2 of the 3 samples ran successfully. @mflynn-lanl is following up on why the 3rd sample failed.

mflynn-lanl commented 3 months ago

We are still trying to figure out why the workflow failed. There are no errors reported other than in the workflow metadata, which has:

Job f_annotate.smart:NA:1 exited with return code -1 which has not been declared as a valid return code

Not very informative! I'll check with the NMDC folks to see if someone there has any ideas.
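When the only clue sits in the workflow metadata, it can help to list every failed call attempt in one place. The sketch below assumes the metadata has been loaded as a dictionary shaped like Cromwell's workflow metadata JSON, with a top-level `calls` map whose entries carry `executionStatus`, `shardIndex`, `attempt`, `returnCode`, and `stderr`. The helper name `failed_calls` and the sample values (including the stderr path) are hypothetical; only the call name and return code come from the message above.

```python
from typing import Any, Dict, List


def failed_calls(metadata: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one summary per call attempt whose executionStatus is 'Failed'."""
    failures = []
    for call_name, attempts in metadata.get("calls", {}).items():
        for attempt in attempts:
            if attempt.get("executionStatus") == "Failed":
                failures.append({
                    "call": call_name,
                    "shard": attempt.get("shardIndex"),
                    "attempt": attempt.get("attempt"),
                    "returnCode": attempt.get("returnCode"),
                    "stderr": attempt.get("stderr"),
                })
    return failures


# Hypothetical sample; a shardIndex of -1 is assumed to mean "not scattered".
sample_metadata = {
    "calls": {
        "f_annotate.smart": [
            {"executionStatus": "Failed", "shardIndex": -1, "attempt": 1,
             "returnCode": -1, "stderr": "/hypothetical/path/stderr"},
        ],
        "f_annotate.other_task": [
            {"executionStatus": "Done", "shardIndex": -1, "attempt": 1,
             "returnCode": 0, "stderr": "/hypothetical/path/stderr"},
        ],
    }
}

summary = failed_calls(sample_metadata)
for item in summary:
    print(item["call"], item["returnCode"], item["stderr"])
```

The `stderr` path in each summary is usually the next place to look when the return code alone says nothing useful.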

From: ssarrafan. Sent: Thursday, August 1, 2024 11:21 AM. To: microbiomedata/nmdc-edge. Cc: Flynn, Mark C; Mention. Subject: [EXTERNAL] Re: [microbiomedata/nmdc-edge] Metatranscriptome samples fail processing (Issue #268)


kaijli commented 3 months ago

Update: rerunning the sample after requesting more resources. I noticed a missing shard in the failed run above, so I'm hoping it was just a blip and that the rerun will either resolve it or give more clues.
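One way to spot a missing shard is to compare the shard indexes that finished against the number of shards expected. This is a rough sketch under stated assumptions: the metadata is a dictionary shaped like Cromwell's workflow metadata JSON (`calls`, `shardIndex`, `executionStatus`), the expected shard count is known in advance, and the call name `annotation.split_task` is a hypothetical placeholder rather than a real task in this workflow.

```python
from typing import Any, Dict, List


def incomplete_shards(metadata: Dict[str, Any], call_name: str, expected: int) -> List[int]:
    """Return shard indexes in 0..expected-1 that have no attempt marked 'Done'."""
    attempts = metadata.get("calls", {}).get(call_name, [])
    done = {a.get("shardIndex") for a in attempts
            if a.get("executionStatus") == "Done"}
    return [i for i in range(expected) if i not in done]


# Hypothetical run: shards 0 and 1 finished, shard 2 failed, shard 3 never appeared.
example = {
    "calls": {
        "annotation.split_task": [
            {"shardIndex": 0, "executionStatus": "Done"},
            {"shardIndex": 1, "executionStatus": "Done"},
            {"shardIndex": 2, "executionStatus": "Failed"},
        ]
    }
}

gaps = incomplete_shards(example, "annotation.split_task", expected=4)
print(gaps)
```

A shard that is absent altogether (rather than present and failed) points more toward a scheduling or system hiccup than toward a bug in the task itself, which is the distinction the rerun is meant to test.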

kaijli commented 3 months ago

Update: still running on Expanse.

ssarrafan commented 3 months ago

@mflynn-lanl @kaijli what's the status of this?

ssarrafan commented 3 months ago

Since this is the last issue for the old sprint I'll move this over to the new sprint to hopefully be able to close. FYI @mflynn-lanl @kaijli

ssarrafan commented 3 months ago

Per @kaijli the last run just completed and @mflynn-lanl is sending it over to Alma.
@mflynn-lanl @kaijli if you can add some notes here on how you resolved the issue and then close this, that would be great.

kaijli commented 3 months ago

Issue: The large file size caused the assembly step (megahit) to fail, so we requested more resources and reran all three samples. The third sample then failed due to a missing shard in annotation, so we reran it to see whether it was a random system error or a code error. On the rerun, an sbatch error occurred ("unexpected message received"), but there were no other indicators of a problem. I manually submitted the sbatch job to continue the workflow, and the workflow completed after a long run time.

Take-home lessons (at least on my part): The file size limit on user submissions is only an estimate of how much memory will be used, and memory use can balloon considerably depending on the type of sample. I'm not sure how to make the size estimates more strategic.

Early on, I was able to run this sample on the new metaT without issues so hopefully these issues will no longer be relevant with the update.

ssarrafan commented 3 months ago

Thanks for your help with this issue. I confirmed with Alma that she has what she needs. I'm closing this issue.
