icgc-argo / workflow-roadmap

Roadmap and management for genomic data processing
GNU Affero General Public License v3.0

DR9 - QC P1000-US RNA-Seq #437

Closed: edsu7 closed this issue 5 months ago

edsu7 commented 6 months ago

Current work: https://docs.google.com/spreadsheets/d/1537uC_c7-8Fisy-EK3MBThxBcLTRJzGwXLxbaPXyNhM/edit#gid=1403247005

lindaxiang commented 6 months ago

@edsu7 FYI, based on the discussion from the QC working group.

edsu7 commented 6 months ago

@lindaxiang Added a dedup column (rough calculation).

Previously identified outliers were:

SA617297
SA617242
SA617223
SA617203

All except SA617203 meet the deduplicated-read threshold. SA617203 has 4,270,233 (HISAT) and 3,646,143 (STAR) deduplicated reads.
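The threshold check described above can be sketched as follows. This is a minimal illustration, assuming per-sample deduplicated read counts are available for each aligner and that a sample fails if any aligner falls below the cutoff. The actual cutoff is not stated in this thread, so `threshold` must be supplied by the caller, and the 5,000,000 value in the usage line is purely hypothetical.

```python
from typing import Dict, List


def samples_below_dedup_threshold(
    dedup_reads: Dict[str, Dict[str, int]], threshold: int
) -> List[str]:
    """Return sorted sample IDs where any aligner's deduplicated read count is below threshold.

    dedup_reads maps sample ID -> {aligner name -> deduplicated read count}.
    Assumption: a sample fails if ANY aligner is below the cutoff; the real QC
    rule might instead require all aligners to fail.
    """
    return sorted(
        sample_id
        for sample_id, per_aligner in dedup_reads.items()
        if any(count < threshold for count in per_aligner.values())
    )


if __name__ == "__main__":
    # Counts for SA617203 come from the comment above; the cutoff is hypothetical.
    counts = {"SA617203": {"HISAT2": 4_270_233, "STAR": 3_646_143}}
    print(samples_below_dedup_threshold(counts, threshold=5_000_000))  # ['SA617203']
```

Passing the cutoff explicitly, rather than hard-coding it, keeps the sketch honest about the fact that the working-group threshold is defined elsewhere.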

edsu7 commented 6 months ago

                                            P1000-US
Insufficient deduped reads                  1
# of sample combinations okay for release   60
Total                                       61

# of donors by workflow                     P1000-US
RNA-Seq                                     60

# of files by workflow                      P1000-US
RNA-Seq                                     738

lindaxiang commented 6 months ago

@edsu7 The P1000-US RNA-Seq alignments have the following duplicated runs:

studyId donorId sampleId tumourNormalDesignation experimental_strategy input_analysisId workflow_name latest_complete_time runId complete_time output_analysis_to_suppress output_analysis_state reason
P1000-US DO253949 SA617171 Tumour RNA-Seq 04eca176-c474-44da-aca1-76c47464dae9 rna-seq-alignment 2024-05-09 6:16 wes-a02991687ea24492bd38d314e042d9ed 2024-05-06 4:54 ['3c3a3c7e-78c6-4023-ba3c-7e78c6f02395', 'b02b1dde-33b4-4ebb-ab1d-de33b45ebb83', 'bc2f15a0-3437-4d2e-af15-a034375d2ecf', '270ebdc2-4e3e-4f35-8ebd-c24e3e7f358a', '8da64e4e-2379-42e3-a64e-4e237952e39b'] ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] duplicated_run_output_analysis
P1000-US DO253917 SA617057 Tumour RNA-Seq 0972b0a6-d4fa-4b2c-b2b0-a6d4fa3b2ca4 rna-seq-alignment 2024-05-08 12:47 wes-9f4bae552bb648fba3134c22b2ebf535 2024-05-03 23:09 ['c4256f91-72eb-4406-a56f-9172eb7406fc', 'afd810a9-d0a8-4f5a-9810-a9d0a84f5a76', '84b52624-f1f8-4dec-b526-24f1f80decba', 'a1312d71-63ce-4564-b12d-7163ced56427', '1ba15c5e-facd-4372-a15c-5efacdd3720c'] ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] duplicated_run_output_analysis
P1000-US DO253939 SA617123 Tumour RNA-Seq 17279b30-f62e-4c65-a79b-30f62e2c65fd rna-seq-alignment 2024-05-09 3:59 wes-45b6c1ef3b4b4f8abbd8e0e28604867f 2024-05-06 22:43 ['e65cbf7d-2bb8-41a3-9cbf-7d2bb8b1a3a8', '8d8460d5-59b2-42d3-8460-d559b2e2d3ec', '76156451-4208-45b7-9564-51420855b791', '9d9a46ac-6a56-4c0f-9a46-ac6a566c0f58', '1fb5f90b-c39a-4984-b5f9-0bc39a59849c'] ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] duplicated_run_output_analysis
P1000-US DO253901 SA617186 Tumour RNA-Seq 79af8e5d-23fb-4c59-af8e-5d23fb5c592f rna-seq-alignment 2024-05-07 11:35 wes-715c1b1fa7af4bc1982c10342af9c11e 2024-05-04 0:46 ['fbd82921-b040-4fd1-9829-21b040cfd19e', '93c5bc09-b604-494b-85bc-09b604994b36', '9fbf4477-4ca8-4fda-bf44-774ca86fdaa9', '28ec610a-cf71-4de9-ac61-0acf71bde9d6', 'fbaa1940-7d8e-4eef-aa19-407d8e5eef46'] ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] duplicated_run_output_analysis
P1000-US DO253909 SA617032 Tumour RNA-Seq 65024161-d0da-4f74-8241-61d0da5f74c8 rna-seq-alignment 2024-05-08 0:40 wes-4a7e644217fe4090a6a21931a84931ca 2024-05-04 6:21 ['c6665059-5d46-49bd-a650-595d4679bd5e', 'cd692b2d-815d-4957-a92b-2d815d5957b7', '7f2334e5-c951-4a9e-a334-e5c951ba9e97', '36827bbe-c973-4181-827b-bec973a181a9', 'c7e20158-6308-46e6-a201-586308b6e687'] ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] duplicated_run_output_analysis
P1000-US DO253964 SA617194 Tumour RNA-Seq 0f546188-38d5-4f71-9461-8838d57f7180 rna-seq-alignment 2024-05-10 10:11 wes-1eadf596de2444be9493a17ceb5cad1d 2024-05-06 22:27 ['25fcde12-0d59-400e-bcde-120d59800e38', '9c502a9a-87a3-4b0b-902a-9a87a35b0bf9', 'aef0bcb9-ed34-4cfa-b0bc-b9ed346cfa95', '0cedc342-7738-484a-adc3-427738c84a93', 'fc25318d-9c3e-4ab5-a531-8d9c3e0ab585'] ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] duplicated_run_output_analysis
P1000-US DO253950 SA617173 Tumour RNA-Seq 25737a61-b6b9-4971-b37a-61b6b9697104 rna-seq-alignment 2024-05-09 3:11 wes-4e30ba46b3734308b2297d8f1a4e3081 2024-05-06 5:50 ['d35308bc-a846-4ed5-9308-bca8469ed54e', '5db5f242-130b-4cbc-b5f2-42130b5cbcf8', '16fea05f-201c-4838-bea0-5f201c683863', 'daa5ae7a-29e3-40e5-a5ae-7a29e3e0e5bf', '7ea95cd8-7bbe-4f4a-a95c-d87bbeaf4a80'] ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] duplicated_run_output_analysis
P1000-US DO253947 SA617182 Tumour RNA-Seq e50e9e90-72a9-4981-8e9e-9072a94981cb rna-seq-alignment 2024-05-09 0:37 wes-a7f0452778a44fd6ad7530386b02e611 2024-05-06 3:24 ['e32fefe3-08fa-4258-afef-e308fad25838', 'a5725bc5-9e6f-4570-b25b-c59e6f85701c', 'bb6c516b-2004-42f6-ac51-6b2004f2f6e4', '7166aa64-8ac9-4017-a6aa-648ac9001701', '8dbfc35c-af23-43e3-bfc3-5caf2333e3bb'] ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] duplicated_run_output_analysis
P1000-US DO253941 SA617255 Tumour RNA-Seq 5244fc3f-15a6-40af-84fc-3f15a6f0af24 rna-seq-alignment 2024-05-08 19:01 wes-b9db353fe5164d3c897fac6f21540d1b 2024-05-06 9:11 ['9bc87ecf-67ab-4307-887e-cf67ab5307a4', '6a65033e-63bb-47f5-a503-3e63bb77f56f', '5541fe18-2bbe-4092-81fe-182bbef09262', '2cab51d3-d7e5-4b9e-ab51-d3d7e57b9e0b', '061c4ab4-e83c-49f0-9c4a-b4e83c69f0d1'] ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] duplicated_run_output_analysis
P1000-US DO253934 SA617308 Tumour RNA-Seq 4e3f6253-b8d7-42b6-bf62-53b8d742b607 rna-seq-alignment 2024-05-08 5:29 wes-573c2571da674f2abfc2565ca23c06b1 2024-05-06 4:31 ['48b3194e-c55f-49de-b319-4ec55ff9de55', '27fbe1e1-7209-4f99-bbe1-e17209af994b', 'd57897b1-2658-452e-b897-b12658952e75', '9cf93a49-87a9-4ef0-b93a-4987a9cef00a', 'ee1fc592-30d5-4423-9fc5-9230d51423ed'] ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] duplicated_run_output_analysis
P1000-US DO253937 SA617297 Tumour RNA-Seq 43a216a8-2e52-4110-a216-a82e5211104e rna-seq-alignment 2024-05-10 5:14 wes-1a52a11e3fcf4155bf584b93b021bd47 2024-05-06 19:42 ['f2c491c6-525d-4bb6-8491-c6525d1bb6cc', '8e05a5ef-313f-46e4-85a5-ef313f86e44e', 'f6decccc-9a9d-4c37-9ecc-cc9a9d1c378b', '1eb4c2ca-8a34-4ddd-b4c2-ca8a345ddda3', '1b40501e-f1d8-43d2-8050-1ef1d883d2db'] ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] duplicated_run_output_analysis
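The table above has the shape of a grouping by sample and workflow in which the most recent run is kept and the outputs of every earlier run are marked for suppression. A minimal sketch of that logic, assuming a hypothetical `Run` record whose fields mirror the table columns (the tooling that actually produced the table is not shown in this thread):

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Run:
    # Hypothetical record; field names mirror the columns in the table above.
    sample_id: str
    workflow_name: str
    run_id: str
    complete_time: datetime
    output_analysis_ids: Tuple[str, ...]


def outputs_to_suppress(runs: List[Run]) -> Dict[str, Tuple[str, ...]]:
    """Map each superseded run ID to the output analyses that would be suppressed.

    Runs are grouped by (sample_id, workflow_name); within each group the run
    with the latest complete_time is kept and all earlier runs count as duplicates.
    """
    groups: Dict[Tuple[str, str], List[Run]] = defaultdict(list)
    for run in runs:
        groups[(run.sample_id, run.workflow_name)].append(run)

    suppressed: Dict[str, Tuple[str, ...]] = {}
    for group in groups.values():
        group.sort(key=lambda r: r.complete_time)
        for older in group[:-1]:  # every run except the most recent one
            suppressed[older.run_id] = older.output_analysis_ids
    return suppressed
```

Note that this key treats any two runs of the same workflow on the same sample as duplicates, regardless of which aligner each run used.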
edsu7 commented 6 months ago

@lindaxiang The following samples had two runs because there were insufficient resources to run both HISAT and STAR simultaneously:

SA617171
SA617057
SA617123
SA617186
SA617032
SA617194
SA617173
SA617182
SA617255
SA617308
SA617297

For example, SA617171 had wes-a02991687ea24492bd38d314e042d9ed and wes-4d8bd9f6d5104d40b133f4e10be49019 for STAR and HISAT2, respectively.
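Because the two runs per sample used different aligners, a duplicate check keyed only on sample and workflow flags them incorrectly. A minimal sketch of the distinction, assuming each run record carries a hypothetical `aligner` field (how the aligner is recorded for a run is not stated here): only runs that share sample, workflow, and aligner count as true duplicates.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Tuple


class AlignmentRun(NamedTuple):
    # Hypothetical record; `aligner` is assumed to be recoverable from the run's parameters.
    sample_id: str
    workflow_name: str
    aligner: str
    run_id: str


def true_duplicate_keys(runs: Iterable[AlignmentRun]) -> List[Tuple[str, str, str]]:
    """Return (sample_id, workflow_name, aligner) keys that occur in more than one run."""
    counts = Counter((r.sample_id, r.workflow_name, r.aligner) for r in runs)
    return sorted(key for key, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Run IDs for SA617171 are taken from the comment above.
    runs = [
        AlignmentRun("SA617171", "rna-seq-alignment", "STAR",
                     "wes-a02991687ea24492bd38d314e042d9ed"),
        AlignmentRun("SA617171", "rna-seq-alignment", "HISAT2",
                     "wes-4d8bd9f6d5104d40b133f4e10be49019"),
    ]
    print(true_duplicate_keys(runs))  # [] : one STAR run plus one HISAT2 run is not a duplicate
```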

That being said, I missed that the HISAT run failed for the following samples, so we only have STAR alignments for them:

run-id analysis-id sample-id
wes-fae9c0f7e51646b08616d2766608dda7 997e3d6a-a9ac-499a-be3d-6aa9ac599a26 SA617227
wes-2caa1cab3a4c4442a3f246276d61814e fb8a767e-ac34-4e28-8a76-7eac341e282f SA617299
wes-5806d439d16f49b0ad73834531acdad8 fae5f553-4565-413e-a5f5-534565a13e5e SA617300
wes-1df68845d5f54ce4975ccd28f8d76451 eff00c9b-caa2-44ec-b00c-9bcaa2f4ec41 SA617298
wes-a876366708814d5c927596c17d684e02 63718d38-ccff-4926-b18d-38ccff692637 SA617237
wes-b10020700e514f06bd422d97c9cce436 02977de3-6960-4449-977d-e3696064495a SA617064
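Finding the samples that still need a HISAT2 run amounts to comparing, per sample, the set of aligners with a completed alignment against the set expected. A minimal sketch, assuming alignments are available as hypothetical (sample ID, aligner) pairs and that both STAR and HISAT2 are expected for every sample; the same per-sample sets could also back a release note indicating which donors have STAR, HISAT2, or both.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# Assumption: every sample is expected to have both a STAR and a HISAT2 alignment.
EXPECTED_ALIGNERS: FrozenSet[str] = frozenset({"STAR", "HISAT2"})


def missing_aligners(
    alignments: Iterable[Tuple[str, str]],
    expected: FrozenSet[str] = EXPECTED_ALIGNERS,
) -> Dict[str, List[str]]:
    """Map each sample ID to the sorted list of expected aligners it has no alignment for.

    Samples that already have every expected aligner are omitted from the result.
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for sample_id, aligner in alignments:
        seen[sample_id].add(aligner)
    return {
        sample_id: sorted(expected - aligners)
        for sample_id, aligners in sorted(seen.items())
        if expected - aligners
    }


if __name__ == "__main__":
    # STAR-only samples listed in the table above (their HISAT2 runs failed).
    star_only = ["SA617227", "SA617299", "SA617300", "SA617298", "SA617237", "SA617064"]
    print(missing_aligners((s, "STAR") for s in star_only))
```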
lindaxiang commented 6 months ago

I see, thanks @edsu7. This won't change the release numbers, right? We just need to remember to run HISAT2 for the above samples in the next release.

edsu7 commented 6 months ago

@lindaxiang It shouldn't. Though, do you think we should update our release notes to indicate which donors have HISAT vs. STAR alignments?

The HISAT alignments for these samples may be trickier: we already tried with 1 CPU and 150 GB of memory.

lindaxiang commented 6 months ago

@edsu7, I don't think that's needed. Alternatively, we could say:

60 previously released donors have new RNA-Seq alignments from STAR, HISAT2, or both.

edsu7 commented 5 months ago

Finished. Remaining POG-CA jobs will be moved to DR10.