Failed to complete command task: 'rm_individual_seg_files' launched from master workflow,

yu052 commented 3 years ago

Dear author,

I recently built the Singularity environment from the Docker image. When I run it, I get the error in the title. The detailed error output is as follows:

    [2020-11-17T17:07:15.734560] [node127.cm.cluster] [68840_1] [WorkflowRunner] [ERROR] Worklow terminated due to the following task errors:
    [2020-11-17T17:07:15.735768] [node127.cm.cluster] [68840_1] [WorkflowRunner] [ERROR] Failed to complete command task: 'rm_individual_seg_files' launched from master workflow, error code: 1, command: 'rm'
    [2020-11-17T17:07:15.736237] [node127.cm.cluster] [68840_1] [WorkflowRunner] [ERROR] [rm_individual_seg_files] Error Message:
    [2020-11-17T17:07:15.736848] [node127.cm.cluster] [68840_1] [WorkflowRunner] [ERROR] [rm_individual_seg_files] Last 2 stderr lines from task (of 2 total lines):
    [2020-11-17T17:07:15.736848] [node127.cm.cluster] [68840_1] [WorkflowRunner] [ERROR] [2020-11-17T15:50:23.975587] [node127.cm.cluster] [68840_1] [rm_individual_seg_files] rm: missing operand
    [2020-11-17T17:07:15.736848] [node127.cm.cluster] [68840_1] [WorkflowRunner] [ERROR] [2020-11-17T15:50:23.975877] [node127.cm.cluster] [68840_1] [rm_individual_seg_files] Try 'rm --help' for more information.

Do you have any clue why this error happened? Can you please help me to solve it?

Additional information: I was running the program on a pair of canine tumour and normal WGS samples. All the required reference files were built properly except snp_sites.gz. To work around that, I removed the --callRegions option from the Strelka SNP-calling step in main.py, so the program can still run without the snp_sites file.

Regards, Yun

polyactis commented 3 years ago

Interesting to hear that someone is trying Accucopy on a dog tumor! Finally! That's why we put effort into making it possible for users to build their own reference genomes.

We have never seen this error before. It looks like Accucopy failed while deleting the intermediate segmentation files (one file per chromosome), because the "rm" command was called with no arguments ("rm: missing operand"). These files are deleted at this point because a prior step should already have combined them into one file.
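The failure described above is what a cleanup step produces when it builds an `rm` command from a list of per-chromosome segmentation files and that list is empty. The sketch below is a hypothetical guard, not Accucopy's actual code (its workflow is driven by pyflow); the function name `rm_command` and the `.seg` file names are made up for illustration.

```rust
use std::process::Command;

/// Hypothetical cleanup helper (not Accucopy's actual code): build an `rm`
/// command for the per-chromosome segmentation files that were actually
/// produced. Returns `None` for an empty list, because running `rm` with
/// no operands fails with "rm: missing operand" and exit code 1.
fn rm_command(seg_files: &[String]) -> Option<Command> {
    if seg_files.is_empty() {
        // An empty list most likely means an upstream step produced no
        // per-chromosome files; that earlier step is the one to investigate.
        return None;
    }
    let mut cmd = Command::new("rm");
    cmd.args(seg_files);
    Some(cmd)
}
```

The guard only avoids the confusing `rm` error; an empty list is itself a symptom that the segmentation step upstream did not produce anything.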

I am not sure what will happen if snp_sites.gz is missing. Is there a high-quality SNP-sites file for dogs? As I recall, low-quality SNPs cause Accucopy to misbehave, which is why we use this file to exclude them.

Did you modify other parts of our workflow/code? Can you provide us with the log folder (containing the pyflow logs and the stdout/stderr of the other programs)?

-- Yu Huang, Professor, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, CAS, http://www.yfish.org/, https://sites.google.com/site/polyactis/


yu052 commented 3 years ago

pyflow_log.zip

Thanks for your response! I attach the pyflow log files.

I do have a SNP-sites file for dogs, but I disabled --callRegions because that file produced a strange error: chr33 in snp_sites.gz was not found in the reference genome. Maybe disabling that option was a bad idea. Do you have any idea why chr33 in particular cannot be found in the reference genome? I checked the reference genome file, and I think chr33 is there.

Thanks for your help in advance!

Regards, Yun

polyactis commented 3 years ago

Did you check whether chr33 is in your genome index files (e.g. genome.dict)?
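One way to run this check is to compare the contig names in the reference index with those used in the SNP-sites file. The sketch below assumes a samtools-style `.fai` index, whose first tab-separated column is the contig name; the function names and the sample index text are hypothetical.

```rust
use std::collections::HashSet;

/// Collect contig names from the text of a `.fai` index
/// (the first tab-separated column of each non-empty line).
fn fai_contigs(fai_text: &str) -> HashSet<String> {
    fai_text
        .lines()
        .filter(|line| !line.trim().is_empty())
        .filter_map(|line| line.split('\t').next())
        .map(|name| name.to_string())
        .collect()
}

/// Return the names in `wanted` that are absent from the index, in input
/// order. Matching is exact, so "33", "chr33" and "chr33 " (trailing
/// space) are three different names.
fn missing_contigs<'a>(index: &HashSet<String>, wanted: &[&'a str]) -> Vec<&'a str> {
    wanted.iter().copied().filter(|c| !index.contains(*c)).collect()
}
```

Because the comparison is exact, any difference in spelling, prefix, or whitespace between the two files shows up as a "missing" contig even when the chromosome is present.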

fanxinping commented 3 years ago

We saw the following in pyflow_tasks_stdout_log.txt, which suggests that all reads in your BAM files fail our filters:

    Reading in genome coverage from "/home/WUR/yu052/DogWUR108_rh.dedup_st.reA.bam" ...
    Reading and smoothing of coverage from "/home/WUR/yu052/DogWUR108_rh.dedup_st.reA.bam" is Done. 0 unique chromosomes, 902274498 reads. Genome wide mean coverage is NaN
    Reading in genome coverage from "/home/WUR/yu052/DogWUR115_rh.dedup_st.reA.bam" ...
    Reading and smoothing of coverage from "/home/WUR/yu052/DogWUR115_rh.dedup_st.reA.bam" is Done. 0 unique chromosomes, 174036637 reads. Genome wide mean coverage is NaN
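The `NaN` in these log lines is consistent with a mean computed as covered bases divided by the total length of the chromosomes seen: with 0 chromosomes both sums are zero, and floating-point 0/0 is NaN. The sketch below assumes that formula; it is an illustration, not the actual Accucopy Rust code, and `mean_coverage` is a hypothetical name.

```rust
/// Hypothetical genome-wide mean coverage: summed depth of the accepted
/// reads divided by the total length of the chromosomes that were seen.
/// With no chromosomes both sums are 0, and 0.0 / 0.0 is NaN (IEEE 754).
fn mean_coverage(chrom_lengths: &[u64], bases_covered: u64) -> f64 {
    let genome_len: u64 = chrom_lengths.iter().sum();
    bases_covered as f64 / genome_len as f64
}
```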

Our Rust program applies these filters. Can you check your BAM files to see why all reads fail them?

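            // Skip reads with mapping quality below 30.
            if record.mapq() < 30 {
                continue;
            }
            // For paired reads, keep only read 1 of a proper pair with a mapped mate,
            // an insert size within [0, max_fragment_len], and no secondary,
            // duplicate, or supplementary flag.
            if record.is_paired() && (record.insert_size() < 0
                || record.insert_size() > self.max_fragment_len as i64
                || !record.is_proper_pair() || record.is_mate_unmapped()
                || !record.is_first_in_template() || record.is_secondary()
                || record.is_duplicate() || record.is_supplementary())
            {
                continue;
            }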
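To check a BAM by hand, the same conditions can be evaluated on the raw SAM fields FLAG, MAPQ and TLEN (columns 2, 5 and 9 of each line printed by `samtools view`). The sketch below mirrors the filter above using the FLAG bit values from the SAM specification; the `Read` struct and `passes_filter` are hypothetical names, and the maximum fragment length is a parameter rather than a known default.

```rust
// SAM FLAG bits (SAMv1 specification).
const PAIRED: u16 = 0x1;
const PROPER_PAIR: u16 = 0x2;
const MATE_UNMAPPED: u16 = 0x8;
const FIRST_IN_TEMPLATE: u16 = 0x40;
const SECONDARY: u16 = 0x100;
const DUPLICATE: u16 = 0x400;
const SUPPLEMENTARY: u16 = 0x800;

/// Minimal view of one alignment: the FLAG, MAPQ and TLEN columns.
struct Read {
    flag: u16,
    mapq: u8,
    tlen: i64,
}

/// Sketch of the filter above on raw SAM fields; true means "kept".
fn passes_filter(r: &Read, max_fragment_len: i64) -> bool {
    let has = |bit: u16| (r.flag & bit) != 0;
    if r.mapq < 30 {
        return false;
    }
    if has(PAIRED)
        && (r.tlen < 0
            || r.tlen > max_fragment_len
            || !has(PROPER_PAIR)
            || has(MATE_UNMAPPED)
            || !has(FIRST_IN_TEMPLATE)
            || has(SECONDARY)
            || has(DUPLICATE)
            || has(SUPPLEMENTARY))
    {
        return false;
    }
    true
}
```

Note that, by design, read 2 of every pair (bit 0x40 unset) is always skipped, and so is read 1 whenever its TLEN is negative, so even a good paired-end BAM keeps only a fraction of its reads.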

yu052 commented 3 years ago

I checked, and chromosome 33 is indeed in genome.fa, genome.dict, and genome.fa.fai. Isn't it weird that all the reads in the BAM files failed the filters? I confirmed that most reads have MAPQ 60. Do you have any other idea why I got these errors?

Kind regards

polyactis commented 3 years ago

Then the reads probably failed the paired-read filters. The SAM specification (https://samtools.github.io/hts-specs/SAMv1.pdf) explains how to tell whether a read is properly paired, whether its mate is unmapped, what its insert size (fragment length) is and whether it exceeds 1000 (the maximum length set in your program), and whether it is a duplicate, supplementary, or secondary alignment.

Most of this information is in the FLAG column, which is encoded as binary bits. Decoding it requires converting between binary and decimal numbers; you may want to ask a computer scientist how to decode it.

    if record.is_paired() && ( record.insert_size()<0 || record.insert_size()>self.max_fragment_len as i64 ||
        !record.is_proper_pair() || record.is_mate_unmapped() || !record.is_first_in_template() ||
        record.is_secondary() || record.is_duplicate() || record.is_supplementary() ) {
        continue;
    }
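Decoding the FLAG column does not require converting to binary by hand: each property is a single bit, so a bitwise AND with the bit's value tells whether it is set. A small sketch (`decode_flag` is a hypothetical name; the bit meanings follow the SAM specification):

```rust
/// Bit values and meanings of the SAM FLAG column.
const FLAG_BITS: [(u16, &str); 12] = [
    (0x1, "paired"),
    (0x2, "proper pair"),
    (0x4, "unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in template"),
    (0x80, "last in template"),
    (0x100, "secondary"),
    (0x200, "QC fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
];

/// List the names of all bits set in a decimal FLAG value.
fn decode_flag(flag: u16) -> Vec<&'static str> {
    FLAG_BITS
        .iter()
        .filter(|(bit, _)| (flag & bit) != 0)
        .map(|(_, name)| *name)
        .collect()
}
```

For example, FLAG 99 = 64 + 32 + 2 + 1 decodes to paired, proper pair, mate reverse strand, and first in template.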


yu052 commented 3 years ago

Thanks for your response! Actually, I am pretty sure my BAM files are fine: I checked them in JBrowse and IGV. Of course there are some mismapped reads, but not many. It makes no sense that all the reads failed the filters, right? Also, Strelka did not work at the first step of your Accucopy pipeline, but I succeeded in running Strelka independently. This also confuses me.

polyactis commented 3 years ago

You can copy your standalone Strelka into the Docker image, overwrite the Docker version, and run it inside the container to see whether anything strange happens.

Your BAM files name chromosomes "chr1", not "1", right? Our program assumes "chr1", not "1".
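If a program expects "chr1" but the BAM and reference use "1", the names simply do not match. A hypothetical normalization step that accepts both styles could look like this; it is a sketch, not part of Accucopy, and it ignores other naming differences such as "MT" versus "chrM".

```rust
/// Hypothetical helper: map a contig name to the "chr"-prefixed style,
/// so that "1" and "chr1" compare equal. Names that already start with
/// "chr" are returned unchanged.
fn to_chr_style(name: &str) -> String {
    if name.starts_with("chr") {
        name.to_string()
    } else {
        format!("chr{}", name)
    }
}

/// True if two names refer to the same contig under either convention.
fn same_contig(a: &str, b: &str) -> bool {
    to_chr_style(a) == to_chr_style(b)
}
```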

yu052 commented 3 years ago

Is it possible to make your program compatible with both formats?

polyactis commented 3 years ago

Both formats? Bam and ?


yu052 commented 3 years ago

Sorry, I didn't make that clear. I mean the two chromosome naming formats, "chr1" and "1".

polyactis / Accucopy

Failed to complete command task: 'rm_individual_seg_files' launched from master workflow, #6