Closed edsu7 closed 5 months ago
@edsu7 FYI, based on the discussion from QC working group.
@lindaxiang added dedup column (rough calculation)
Previously identified outliers were:
All except for SA617203 meet the threshold for deduplicated reads.
SA617203 has 4,270,233 and 3,646,143 deduplicated reads for HISAT and STAR respectively.
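The deduplicated-read check discussed above could be sketched roughly as follows. This is only an illustration: the actual threshold is not stated in this thread, so `MIN_DEDUP_READS` is a placeholder value, `below_threshold` is a hypothetical helper, and flagging a sample when any aligner falls short (rather than all of them) is an assumption. Only the SA617203 counts come from the thread.

```python
# Placeholder threshold: the real value used by the QC working group
# is not given in this thread.
MIN_DEDUP_READS = 5_000_000


def below_threshold(dedup_counts, min_reads=MIN_DEDUP_READS):
    """Return sample IDs where any aligner's deduplicated read count is below min_reads."""
    return sorted(
        sample
        for sample, per_aligner in dedup_counts.items()
        if any(count < min_reads for count in per_aligner.values())
    )


dedup_counts = {
    # Counts reported in this thread.
    "SA617203": {"HISAT": 4_270_233, "STAR": 3_646_143},
    # Made-up sample, included only to show a passing case.
    "SA_EXAMPLE": {"HISAT": 20_000_000, "STAR": 19_000_000},
}

flagged = below_threshold(dedup_counts)
print(flagged)
```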
P1000-US | |
--- | --- |
Insufficient deduped reads | 1 |
# of sample combinations okay for release | 60 |
Total | 61 |

P1000-US | |
--- | --- |
# of donors by workflow | |
RNA-Seq | 60 |

P1000-US | |
--- | --- |
# of files by workflow | |
RNA-Seq | 738 |
@edsu7 P1000-US RNA-Seq alignments have the following duplicated runs.
studyId | donorId | sampleId | tumourNormalDesignation | experimental_strategy | input_analysisId | workflow_name | latest_complete_time | runId | complete_time | output_analysis_to_suppress | output_analysis_state | reason |
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
P1000-US | DO253949 | SA617171 | Tumour | RNA-Seq | 04eca176-c474-44da-aca1-76c47464dae9 | rna-seq-alignment | 2024-05-09 6:16 | wes-a02991687ea24492bd38d314e042d9ed | 2024-05-06 4:54 | ['3c3a3c7e-78c6-4023-ba3c-7e78c6f02395', 'b02b1dde-33b4-4ebb-ab1d-de33b45ebb83', 'bc2f15a0-3437-4d2e-af15-a034375d2ecf', '270ebdc2-4e3e-4f35-8ebd-c24e3e7f358a', '8da64e4e-2379-42e3-a64e-4e237952e39b'] | ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] | duplicated_run_output_analysis |
P1000-US | DO253917 | SA617057 | Tumour | RNA-Seq | 0972b0a6-d4fa-4b2c-b2b0-a6d4fa3b2ca4 | rna-seq-alignment | 2024-05-08 12:47 | wes-9f4bae552bb648fba3134c22b2ebf535 | 2024-05-03 23:09 | ['c4256f91-72eb-4406-a56f-9172eb7406fc', 'afd810a9-d0a8-4f5a-9810-a9d0a84f5a76', '84b52624-f1f8-4dec-b526-24f1f80decba', 'a1312d71-63ce-4564-b12d-7163ced56427', '1ba15c5e-facd-4372-a15c-5efacdd3720c'] | ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] | duplicated_run_output_analysis |
P1000-US | DO253939 | SA617123 | Tumour | RNA-Seq | 17279b30-f62e-4c65-a79b-30f62e2c65fd | rna-seq-alignment | 2024-05-09 3:59 | wes-45b6c1ef3b4b4f8abbd8e0e28604867f | 2024-05-06 22:43 | ['e65cbf7d-2bb8-41a3-9cbf-7d2bb8b1a3a8', '8d8460d5-59b2-42d3-8460-d559b2e2d3ec', '76156451-4208-45b7-9564-51420855b791', '9d9a46ac-6a56-4c0f-9a46-ac6a566c0f58', '1fb5f90b-c39a-4984-b5f9-0bc39a59849c'] | ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] | duplicated_run_output_analysis |
P1000-US | DO253901 | SA617186 | Tumour | RNA-Seq | 79af8e5d-23fb-4c59-af8e-5d23fb5c592f | rna-seq-alignment | 2024-05-07 11:35 | wes-715c1b1fa7af4bc1982c10342af9c11e | 2024-05-04 0:46 | ['fbd82921-b040-4fd1-9829-21b040cfd19e', '93c5bc09-b604-494b-85bc-09b604994b36', '9fbf4477-4ca8-4fda-bf44-774ca86fdaa9', '28ec610a-cf71-4de9-ac61-0acf71bde9d6', 'fbaa1940-7d8e-4eef-aa19-407d8e5eef46'] | ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] | duplicated_run_output_analysis |
P1000-US | DO253909 | SA617032 | Tumour | RNA-Seq | 65024161-d0da-4f74-8241-61d0da5f74c8 | rna-seq-alignment | 2024-05-08 0:40 | wes-4a7e644217fe4090a6a21931a84931ca | 2024-05-04 6:21 | ['c6665059-5d46-49bd-a650-595d4679bd5e', 'cd692b2d-815d-4957-a92b-2d815d5957b7', '7f2334e5-c951-4a9e-a334-e5c951ba9e97', '36827bbe-c973-4181-827b-bec973a181a9', 'c7e20158-6308-46e6-a201-586308b6e687'] | ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] | duplicated_run_output_analysis |
P1000-US | DO253964 | SA617194 | Tumour | RNA-Seq | 0f546188-38d5-4f71-9461-8838d57f7180 | rna-seq-alignment | 2024-05-10 10:11 | wes-1eadf596de2444be9493a17ceb5cad1d | 2024-05-06 22:27 | ['25fcde12-0d59-400e-bcde-120d59800e38', '9c502a9a-87a3-4b0b-902a-9a87a35b0bf9', 'aef0bcb9-ed34-4cfa-b0bc-b9ed346cfa95', '0cedc342-7738-484a-adc3-427738c84a93', 'fc25318d-9c3e-4ab5-a531-8d9c3e0ab585'] | ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] | duplicated_run_output_analysis |
P1000-US | DO253950 | SA617173 | Tumour | RNA-Seq | 25737a61-b6b9-4971-b37a-61b6b9697104 | rna-seq-alignment | 2024-05-09 3:11 | wes-4e30ba46b3734308b2297d8f1a4e3081 | 2024-05-06 5:50 | ['d35308bc-a846-4ed5-9308-bca8469ed54e', '5db5f242-130b-4cbc-b5f2-42130b5cbcf8', '16fea05f-201c-4838-bea0-5f201c683863', 'daa5ae7a-29e3-40e5-a5ae-7a29e3e0e5bf', '7ea95cd8-7bbe-4f4a-a95c-d87bbeaf4a80'] | ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] | duplicated_run_output_analysis |
P1000-US | DO253947 | SA617182 | Tumour | RNA-Seq | e50e9e90-72a9-4981-8e9e-9072a94981cb | rna-seq-alignment | 2024-05-09 0:37 | wes-a7f0452778a44fd6ad7530386b02e611 | 2024-05-06 3:24 | ['e32fefe3-08fa-4258-afef-e308fad25838', 'a5725bc5-9e6f-4570-b25b-c59e6f85701c', 'bb6c516b-2004-42f6-ac51-6b2004f2f6e4', '7166aa64-8ac9-4017-a6aa-648ac9001701', '8dbfc35c-af23-43e3-bfc3-5caf2333e3bb'] | ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] | duplicated_run_output_analysis |
P1000-US | DO253941 | SA617255 | Tumour | RNA-Seq | 5244fc3f-15a6-40af-84fc-3f15a6f0af24 | rna-seq-alignment | 2024-05-08 19:01 | wes-b9db353fe5164d3c897fac6f21540d1b | 2024-05-06 9:11 | ['9bc87ecf-67ab-4307-887e-cf67ab5307a4', '6a65033e-63bb-47f5-a503-3e63bb77f56f', '5541fe18-2bbe-4092-81fe-182bbef09262', '2cab51d3-d7e5-4b9e-ab51-d3d7e57b9e0b', '061c4ab4-e83c-49f0-9c4a-b4e83c69f0d1'] | ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] | duplicated_run_output_analysis |
P1000-US | DO253934 | SA617308 | Tumour | RNA-Seq | 4e3f6253-b8d7-42b6-bf62-53b8d742b607 | rna-seq-alignment | 2024-05-08 5:29 | wes-573c2571da674f2abfc2565ca23c06b1 | 2024-05-06 4:31 | ['48b3194e-c55f-49de-b319-4ec55ff9de55', '27fbe1e1-7209-4f99-bbe1-e17209af994b', 'd57897b1-2658-452e-b897-b12658952e75', '9cf93a49-87a9-4ef0-b93a-4987a9cef00a', 'ee1fc592-30d5-4423-9fc5-9230d51423ed'] | ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] | duplicated_run_output_analysis |
P1000-US | DO253937 | SA617297 | Tumour | RNA-Seq | 43a216a8-2e52-4110-a216-a82e5211104e | rna-seq-alignment | 2024-05-10 5:14 | wes-1a52a11e3fcf4155bf584b93b021bd47 | 2024-05-06 19:42 | ['f2c491c6-525d-4bb6-8491-c6525d1bb6cc', '8e05a5ef-313f-46e4-85a5-ef313f86e44e', 'f6decccc-9a9d-4c37-9ecc-cc9a9d1c378b', '1eb4c2ca-8a34-4ddd-b4c2-ca8a345ddda3', '1b40501e-f1d8-43d2-8050-1ef1d883d2db'] | ['PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED', 'PUBLISHED'] | duplicated_run_output_analysis |
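A report like the one above could plausibly be produced by grouping completed runs by input analysis and workflow, keeping the most recent run, and flagging the older ones. The sketch below is a guess at that logic, not the actual tooling: `find_superseded_runs` and the record layout are hypothetical, the field names simply mirror the table columns, and the later run's ID is a placeholder.

```python
from collections import defaultdict
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M"  # matches timestamps such as "2024-05-09 6:16"


def find_superseded_runs(runs):
    """Return one report row per run that is older than the latest run in its group."""
    groups = defaultdict(list)
    for run in runs:
        groups[(run["input_analysisId"], run["workflow_name"])].append(run)

    report = []
    for members in groups.values():
        if len(members) < 2:
            continue  # a single run cannot be a duplicate
        ordered = sorted(
            members,
            key=lambda r: datetime.strptime(r["complete_time"], TIME_FORMAT),
        )
        latest = ordered[-1]
        for older in ordered[:-1]:
            report.append({
                "sampleId": older["sampleId"],
                "runId": older["runId"],
                "complete_time": older["complete_time"],
                "latest_complete_time": latest["complete_time"],
                "reason": "duplicated_run_output_analysis",
            })
    return report


runs = [
    {   # Values taken from the SA617171 row of the table.
        "sampleId": "SA617171",
        "input_analysisId": "04eca176-c474-44da-aca1-76c47464dae9",
        "workflow_name": "rna-seq-alignment",
        "runId": "wes-a02991687ea24492bd38d314e042d9ed",
        "complete_time": "2024-05-06 4:54",
    },
    {   # Placeholder run ID; only the completion time comes from the table.
        "sampleId": "SA617171",
        "input_analysisId": "04eca176-c474-44da-aca1-76c47464dae9",
        "workflow_name": "rna-seq-alignment",
        "runId": "wes-LATER-RUN-PLACEHOLDER",
        "complete_time": "2024-05-09 6:16",
    },
]

report = find_superseded_runs(runs)
```

Grouping only by input analysis and workflow name treats any two runs of the same input as duplicates; adding another field to the grouping key would be needed to tell apart runs that differ in some other respect.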
@lindaxiang The following samples had two runs b/c of insufficient resources to run both HISAT and STAR simultaneously.
For example, SA617171 had wes-a02991687ea24492bd38d314e042d9ed and wes-4d8bd9f6d5104d40b133f4e10be49019 for STAR and HISAT2 respectively.
That being said, I did miss that the HISAT run had failed for some samples, meaning we only have STAR alignments for the following:
run-id | analysis-id | sample-id |
--- | --- | --- |
wes-fae9c0f7e51646b08616d2766608dda7 | 997e3d6a-a9ac-499a-be3d-6aa9ac599a26 | SA617227 |
wes-2caa1cab3a4c4442a3f246276d61814e | fb8a767e-ac34-4e28-8a76-7eac341e282f | SA617299 |
wes-5806d439d16f49b0ad73834531acdad8 | fae5f553-4565-413e-a5f5-534565a13e5e | SA617300 |
wes-1df68845d5f54ce4975ccd28f8d76451 | eff00c9b-caa2-44ec-b00c-9bcaa2f4ec41 | SA617298 |
wes-a876366708814d5c927596c17d684e02 | 63718d38-ccff-4926-b18d-38ccff692637 | SA617237 |
wes-b10020700e514f06bd422d97c9cce436 | 02977de3-6960-4449-977d-e3696064495a | SA617064 |
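To avoid missing failed runs like these in future, the set of completed aligners per sample could be compared against the expected set. A minimal sketch, assuming a mapping from sample ID to the aligners that completed is available; `missing_aligners` and `REQUIRED_ALIGNERS` are hypothetical names, and the sample IDs are the ones listed in this thread.

```python
REQUIRED_ALIGNERS = ("STAR", "HISAT2")


def missing_aligners(completed, required=REQUIRED_ALIGNERS):
    """Map each sample ID to the required aligners that have no completed run."""
    gaps = {}
    for sample_id, aligners in completed.items():
        missing = [name for name in required if name not in aligners]
        if missing:
            gaps[sample_id] = missing
    return gaps


completed = {
    "SA617171": {"STAR", "HISAT2"},  # both aligners completed
    # STAR-only samples listed in the table above.
    "SA617227": {"STAR"},
    "SA617299": {"STAR"},
    "SA617300": {"STAR"},
    "SA617298": {"STAR"},
    "SA617237": {"STAR"},
    "SA617064": {"STAR"},
}

gaps = missing_aligners(completed)
for sample_id in sorted(gaps):
    print(sample_id, "missing:", ", ".join(gaps[sample_id]))
```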
I see, thanks @edsu7. This won't change the numbers for release, right? We just need to remember to run HISAT2 for the above samples for the next release.
@lindaxiang It shouldn't. Though do you think we should update our release notes to indicate which donors have HISAT vs STAR?
For HISAT alignments it may be trickier; we already tried with 1 CPU at 150 GB.
@edsu7, I don't think that is needed. Or we can say:
60 previously released donors have new RNA-Seq alignments from either STAR or HISAT2 or both.
Finished. Remaining POG-CA jobs will be moved to DR10.
Current work: