lima1 / PureCN

Copy number calling and variant classification using targeted short read sequencing
https://bioconductor.org/packages/devel/bioc/html/PureCN.html
Artistic License 2.0

Tumor only PureCN filtering failure #354

Closed Shenglai closed 4 months ago

Shenglai commented 4 months ago

Hi, I'm investigating some tumor-only jobs from GDC that failed due to a low ratio of germline variants. Most of them failed when fewer than 0.5% of variants were detected as germline, or when fewer than 2% were but the total number of normal variants was less than 10.

However, I noticed that 2 jobs failed despite a relatively large number of germline variants.

[2024-01-20 08:57:53] 2998 (4.6%) variants annotated as likely germline (DB INFO flag).
[2023-11-10 00:33:36] 5711 (6.4%) variants annotated as likely germline (DB INFO flag).
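For readers who want to check a figure like the ones in these log lines independently, here is a minimal sketch that counts how many records in a VCF carry the `DB` INFO flag. It is not PureCN's own code; the function name `db_flag_fraction` and the tiny example VCF are made up for illustration, and the sketch assumes an uncompressed, tab-separated VCF.

```python
import io

def db_flag_fraction(vcf_lines):
    """Return (flagged, total): records with the DB INFO flag, and all records."""
    flagged = total = 0
    for line in vcf_lines:
        # Skip blank lines and all header lines (## meta lines and #CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        total += 1
        # INFO is the 8th column; entries are separated by ';' and flags have no '='.
        info_keys = {entry.split("=", 1)[0] for entry in fields[7].split(";")}
        if "DB" in info_keys:
            flagged += 1
    return flagged, total

# Hypothetical four-record VCF, for illustration only.
EXAMPLE_VCF = """##fileformat=VCFv4.2
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership">
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t100\t.\tA\tG\t.\tPASS\tDP=50;DB
chr1\t200\t.\tC\tT\t.\tPASS\tDP=40
chr2\t300\trs1\tG\tA\t.\tPASS\tDB;DP=35
chr2\t400\t.\tT\tC\t.\tPASS\tDP=60
"""

flagged, total = db_flag_fraction(io.StringIO(EXAMPLE_VCF))
print(f"{flagged} ({100 * flagged / total:.1f}%) variants carry the DB flag")
```

For a real file, passing an open file handle instead of the `io.StringIO` object should work the same way, as long as the file is not compressed.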

error.1.log error.2.log

The job logs for these two are attached; I think the error messages are different from the others. Could you provide some insight into why they failed? I would also like to know the minimum ratio or number of germline variants that PureCN needs.

Thanks a lot in advance.

lima1 commented 4 months ago

I think this, too, has had some germline filtering done. Normally the VCF header contains information about the commands that were used to generate and filter it. Otherwise, I would hunt down all the log files and try to reproduce how the VCFs were generated.
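As a rough illustration of the header check described above, the sketch below pulls out the `##` meta lines whose key hints at how the file was produced. The helper name `provenance_header_lines`, the keyword list, and the example header (including the caller name and its command line) are all hypothetical; real callers record their commands under different keys, so the keywords may need adjusting.

```python
import io

# Assumed keywords; header keys vary from tool to tool.
PROVENANCE_HINTS = ("command", "source", "filter")

def provenance_header_lines(vcf_lines, hints=PROVENANCE_HINTS):
    """Collect ## header lines whose key suggests generation or filtering steps."""
    hits = []
    for line in vcf_lines:
        if not line.startswith("##"):
            break  # meta lines come first; stop at #CHROM or the first record
        key = line[2:].split("=", 1)[0].lower()
        if any(hint in key for hint in hints):
            hits.append(line.rstrip("\n"))
    return hits

# Made-up header for illustration; not taken from the attached logs.
EXAMPLE_HEADER = """##fileformat=VCFv4.2
##source=exampleCaller
##exampleCallerCommand=call --tumor tumor.bam
##FILTER=<ID=germline,Description="Hypothetical germline filter">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership">
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
"""

for header_line in provenance_header_lines(io.StringIO(EXAMPLE_HEADER)):
    print(header_line)
```

If nothing relevant turns up in the header, that is a sign the provenance was stripped, and the log files are the remaining place to look.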

Also note that we're not a whole-genome shop, and you will probably have more luck with tools designed for WGS, for example Battenberg from the Sanger.

Shenglai commented 4 months ago

Thanks again!