uclahs-cds / metapipeline-DNA

Nextflow pipeline to convert BAM to FASTQ, align, perform QC, assess targeted coverage, call gSNP, call sSNV, call mtSNV, call SVs, call sCNA, and perform subclonal reconstruction
GNU General Public License v2.0

Cell line specific situation - N and T samples with different patient IDs #114

Open RoniHaas opened 1 year ago

RoniHaas commented 1 year ago

Describe the issue: I received an error whose meaning isn't clear to me. It appeared after an 8.15 hr run, and the run was marked as completed.

executor > local (7)
[01/d51cc8] process > create_CSV_metapipeline_DNA... [100%] 2 of 2 ✔
[80/5d6b21] process > create_config_metapipeline_DNA [100%] 1 of 1 ✔
[c1/3a6964] process > call_metapipeline_DNA (2) [100%] 2 of 2 ✔
[69/ebd9b6] process > check_process_status (2) [100%] 2 of 2 ✔
Process in /hot/project/disease/ProstateTumor/PRAD-000096-RadioResDU145Molecular/22Rv1/WXS/input/CHPRRR2M_1/c1/3a69648915774d2f6e6d01586063a3 failed with non-zero exit code.

It would be great if you could help me understand the source of this error, @yashpatel6.

yashpatel6 commented 1 year ago

Hi Roni, it looks like the issue is with the YAML: /hot/project/disease/ProstateTumor/PRAD-000096-RadioResDU145Molecular/22Rv1/WXS/input/CHPRRR2M_1/metapipe_input_CHPRRR2M_1.yam. There seem to be two patients in the input, with the normal coming from one patient and the tumors coming from another. It seems the normal sample is accidentally from, or labeled as, a different patient in the input. That causes call-gSNP to fail, since it expects a normal sample per patient.
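A quick pre-flight check along these lines could catch the mismatch before a long run. This is only a minimal sketch: the record layout (`patient`, `sample`, `state`) and the IDs are hypothetical stand-ins, not the metapipeline's actual input schema, and the function is not part of the pipeline.

```python
from collections import defaultdict


def patients_missing_normal(samples):
    """Return the sorted patient IDs that have no sample marked as normal."""
    states_by_patient = defaultdict(set)
    for sample in samples:
        states_by_patient[sample["patient"]].add(sample["state"])
    return sorted(
        patient
        for patient, states in states_by_patient.items()
        if "normal" not in states
    )


# Hypothetical records mirroring the situation in this issue:
# the normal is registered under one patient, the tumors under another.
samples = [
    {"patient": "PATIENT_A", "sample": "A_N", "state": "normal"},
    {"patient": "PATIENT_B", "sample": "B_T1", "state": "tumor"},
    {"patient": "PATIENT_B", "sample": "B_T2", "state": "tumor"},
]

missing = patients_missing_normal(samples)
if missing:
    print("Patients without a normal sample:", ", ".join(missing))
```

With the records above, the check flags `PATIENT_B`, which is the grouping that a per-patient normal requirement would reject.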

RoniHaas commented 1 year ago

I see! Thank you @yashpatel6. The problem is that my normal is not really "Normal". I am comparing two types of cell lines, and both of them are from tumors. For data registration, I thought it was right to define this sample as a different patient; otherwise, it might cause confusion. I treat one of the cell line types as "Normal" because I want to identify mutations relative to that cell line type. Is there a way to overcome this?

tyamaguchi-ucla commented 1 year ago

@RoniHaas is this discussion helpful for your case? https://github.com/uclahs-cds/metapipeline-DNA/discussions/109

RoniHaas commented 1 year ago

Thank you for sharing. It still seems that I would have to change the patient ID for the run, in any event, to make it work. Is that correct? I can change the patient ID for the run easily. But I thought that consistency between data registration and the output file names is important. On the other hand, changing the patient IDs for data registration to solve this issue may be less logical. Any thoughts about that? @tyamaguchi-ucla @yashpatel6

yashpatel6 commented 1 year ago

That is correct, the patient ID would have to be changed so that the metapipeline properly associates the samples. Without relying on the patient ID, grouping samples would become much more challenging from the metapipeline's perspective (there would basically have to be an additional identifier indicating the grouping/relation between samples). While this may end up being slightly inconsistent between dataset registration and the metapipeline run, the best solution at the moment is to change the patient ID and track the change.
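One way to "change the patient ID and track it" is to keep the registered IDs untouched and derive the run-time IDs from an explicit mapping that is saved alongside the run. The sketch below is an illustration under assumptions: the field names, IDs, and both helper functions are hypothetical and are not part of the metapipeline.

```python
import csv
import io


def remap_patient_ids(samples, run_patient_for):
    """Return (remapped samples, tracking rows) without mutating the input."""
    remapped, tracking = [], []
    for sample in samples:
        registered = sample["patient"]
        # Fall back to the registered ID when no override is given.
        run_id = run_patient_for.get(registered, registered)
        remapped.append({**sample, "patient": run_id})
        if run_id != registered:
            tracking.append(
                {
                    "sample": sample["sample"],
                    "registered_patient": registered,
                    "run_patient": run_id,
                }
            )
    return remapped, tracking


def tracking_as_csv(tracking):
    """Serialize the tracking rows so the change can be stored with the run."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["sample", "registered_patient", "run_patient"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(tracking)
    return buffer.getvalue()


# Hypothetical example: group the reference cell line under the tumor's ID.
samples = [
    {"patient": "PATIENT_A", "sample": "A_N", "state": "normal"},
    {"patient": "PATIENT_B", "sample": "B_T1", "state": "tumor"},
]
run_samples, tracking = remap_patient_ids(samples, {"PATIENT_A": "PATIENT_B"})
print(tracking_as_csv(tracking))
```

Keeping the mapping in a small file next to the run output makes it possible to translate output file names back to the registered patient IDs later.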

RoniHaas commented 1 year ago

Thanks for explaining. Yup, the run is urgent, so I will change the patient IDs. In my opinion, it might be worth thinking about these situations (I guess cell-line-related situations) for the next release if that makes sense. I have changed the issue name.