mil2041 / CNCDriver

Combined mutation recurrence and functional impact to identify coding and non-coding cancer drivers
MIT License

Funseq2 output vcf parsing errors #1

Closed lybird300 closed 6 years ago

lybird300 commented 6 years ago

Hi there, first of all thanks for sharing the code; I'm looking forward to your paper when it comes out. This is not an issue with your tool, but I would really appreciate your insight since you have been working with Funseq2 for so long (I guess). I was able to make funseq2.1.6 (downloaded from the official website) work on my data and generate the Output.vcf file, but reading the file with VariantAnnotation::scanVcf (invoked by VariantAnnotation::readVcf) gives me Error: scanVcf: invalid split pattern ',(?=(ID|Number|Type)=[[:alnum:]])|,(?=Description=".?")'. I googled but could not find any solution except for this thread. I'm wondering if you have ever encountered the same problem, or if there is any way to bypass the issue and still make your tool work. Thank you for your time!

Below are the first several lines of my Output.vcf file:

fileformat=VCFv4.0

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=

INFO=<ID=CANG,Number=.,Type=String,Description="Prior Gene Information, e.g.[cancer][TF_regulating_known_cancer_gene][up_regulated][actionable]...";

INFO=

INFO=

INFO=

INFO=

CHROM POS ID REF ALT QUAL FILTER INFO

chr13 100252870 . T A . . SAMPLE=T100;GERP=-4.14;CDS=No;HUB=CLYBL:PPI(0.361),TM9SF2:PPI(0.236)REG(0.409);NCENC=DHS(MCV-7|chr13:100252825-100252975),Enhancer(Roadmap_stringent|chr13:100252624-100252923),Enhancer(chmm/segway|chr13:100252783-100253000),TFM(CTCF_DHS|SP1_disc3|chr13:100252864-100252879),TFP(CTCF|chr13:100252207-100252907),TFP(CTCF|chr13:100252476-100252894),TFP(CTCF|chr13:100252487-100252873),TFP(CTCF|chr13:100252497-100252927),TFP(CTCF|chr13:100252503-100252891),TFP(CTCF|chr13:100252503-100252930),TFP(CTCF|chr13:100252505-100252888),TFP(CTCF|chr13:100252514-100253036),TFP(CTCF|chr13:100252515-100252912),TFP(CTCF|chr13:100252515-100252924),TFP(CTCF|chr13:100252516-100252910),TFP(CTCF|chr13:100252517-100252885),TFP(CTCF|chr13:100252519-100252971),TFP(CTCF|chr13:100252524-100252882),TFP(CTCF|chr13:100252527-100252884),TFP(CTCF|chr13:100252531-100252949),TFP(CTCF|chr13:100252532-100252884),TFP(CTCF|chr13:100252533-100252888),TFP(CTCF|chr13:100252535-100252892),TFP(CTCF|chr13:100252539-100252884),TFP(CTCF|chr13:100252541-100252876),TFP(CTCF|chr13:100252548-100252894),TFP(CTCF|chr13:100252564-100252892),TFP(CTCF|chr13:100252575-100252873),TFP(CTCF|chr13:100252587-100252877),TFP(CTCF|chr13:100252589-100252892),TFP(RAD21|chr13:100252520-100252938),TFP(RAD21|chr13:100252524-100252936),TFP(RAD21|chr13:100252529-100252933),TFP(RAD21|chr13:100252531-100252912),TFP(RAD21|chr13:100252537-100252892),TFP(RAD21|chr13:100252537-100252897);HOT=H1hesc;MOTIFBR=CTCF_DHS#SP1_disc3#100252864#100252879#-#10#0.131148#0.245902;MOTIFG=MZF1_3#100252864#100252870#-#1#6.892#5.961;SEN=Yes;GENE=CLYBL(Distal)TM9SF2(Distal);NCDS=3.854994663534:4.854994663534;RECUR=TFP(CTCF|chr13:100252207-100252907):T100&T236,TFP(CTCF|chr13:100252476-100252894):T100&T236 chr4 60720 . T C . . 
SAMPLE=T100;GERP=0.69;CDS=No;HUB=ZNF595:REG(0.987);NCENC=TFM(MAFF_MAFK|CEBPG_1|chr4:60714-60727),TFP(CHD2|chr4:50497-61009),TFP(EP300|chr4:57632-61054),TFP(FAM48A|chr4:58427-60880),TFP(FAM48A|chr4:59515-61055),TFP(FOS|chr4:55158-60924),TFP(IRF3|chr4:60169-61169),TFP(JUND|chr4:50500-62894),TFP(KAT2A|chr4:59928-60898),TFP(MAFF|chr4:60004-61056),TFP(MAFK|chr4:50489-63382),TFP(MAFK|chr4:59870-61092),TFP(MAFK|chr4:60116-61036),TFP(MXI1|chr4:50616-61014),TFP(SIN3A|chr4:52691-62808),TFP(STAT1|chr4:60005-61046),TFP(ZZZ3|chr4:59671-60953);HOT=K562;MOTIFBR=MAFF_MAFK#CEBPG_1#60714#60727#+#6#0.000000#0.714286;SEN=Yes;USEN=Yes;GENE=ZNF595(Intron);NCDS=3.7416745628426:4.7416745628426;RECUR=TFP(CHD2|chr4:50497-61009):T100&T104&T106&T107&T11&T115&T153&T157&T167&T252&T256&T258&T264&T267&T272&T275&T284&T294&T297&T300&T314&T318&T59&T70&T80&T92,TFP(EP300|chr4:57632-61054):T100&T104&T106&T107&T264&T272&T275&T333&T59&T80&T92,TFP(FAM48A|chr4:58427-60880):T100&T264&T272&T59&T80&T92,TFP(FAM48A|chr4:59515-61055):T100&T106&T264&T272&T333&T59&T80,TFP(FOS|chr4:55158-60924):T100&T104&T106&T107&T115&T256&T264&T272&T275&T294&T297&T318&T59&T80&T92,TFP(IRF3|chr4:60169-61169):T100&T106&T264&T272&T333&T80,TFP(JUND|chr4:50500-62894):T100&T104&T106&T107&T108&T11&T115&T153&T157&T167&T252&T256&T258&T264&T267&T272&T275&T284&T294&T297&T300&T314&T318&T333&T52&T59&T70&T80&T92,TFP(KAT2A|chr4:59928-60898):T100&T264&T272&T80,TFP(MAFF|chr4:60004-61056):T100&T106&T264&T272&T333&T80,TFP(MAFK|chr4:50489-63382):T100&T104&T106&T107&T108&T11&T115&T153&T157&T167&T252&T256&T258&T264&T267&T272&T275&T284&T294&T297&T300&T314&T318&T333&T52&T59&T70&T80&T92,TFP(MAFK|chr4:59870-61092):T100&T106&T264&T272&T333&T80,TFP(MAFK|chr4:60116-61036):T100&T106&T264&T272&T80,TFP(MXI1|chr4:50616-61014):T100&T104&T106&T107&T11&T115&T153&T157&T167&T252&T256&T258&T264&T267&T272&T275&T284&T294&T297&T300&T314&T318&T59&T70&T80&T92,TFP(SIN3A|chr4:52691-62808):T100&T104&T106&T107&T108&T11&T115&T153&T252&T256&T258&T264&T272&T275&T284&T294&T297&T314&
T318&T333&T59&T80&T92,TFP(STAT1|chr4:60005-61046):T100&T106&T264&T272&T333&T80,TFP(ZZZ3|chr4:59671-60953):T100&T264&T272&T59&T80;DBRECUR=TFP(CHD2|chr4:50497-61009):Lung_Adeno(Altered in 17/24(70.83%) samples.)|Pancreas(Altered in 2/15(13.33%) samples.)|Prostate(Altered in 5/64(7.81%) samples.),TFP(EP300|chr4:57632-61054):Lung_Adeno(Altered in 10/24(41.67%) samples.),TFP(FAM48A|chr4:58427-60880):Lung_Adeno(Altered in 7/24(29.17%) samples.),TFP(FAM48A|chr4:59515-61055):Lung_Adeno(Altered in 6/24(25.00%) samples.),TFP(FOS|chr4:55158-60924):Lung_Adeno(Altered in 14/24(58.33%) samples.)|Prostate(Altered in 2/64(3.12%) samples.),TFP(JUND|chr4:50500-62894):Lung_Adeno(Altered in 17/24(70.83%) samples.)|Pancreas(Altered in 2/15(13.33%) samples.)|Prostate(Altered in 5/64(7.81%) samples.),TFP(KAT2A|chr4:59928-60898):Lung_Adeno(Altered in 3/24(12.50%) samples.),TFP(MAFF|chr4:60004-61056):Lung_Adeno(Altered in 3/24(12.50%) samples.),TFP(MAFK|chr4:50489-63382):Lung_Adeno(Altered in 17/24(70.83%) samples.)|Pancreas(Altered in 2/15(13.33%) samples.)|Prostate(Altered in 5/64(7.81%) samples.),TFP(MAFK|chr4:59870-61092):Lung_Adeno(Altered in 3/24(12.50%) samples.),TFP(MXI1|chr4:50616-61014):Lung_Adeno(Altered in 17/24(70.83%) samples.)|Pancreas(Altered in 2/15(13.33%) samples.)|Prostate(Altered in 5/64(7.81%) samples.),TFP(SIN3A|chr4:52691-62808):Lung_Adeno(Altered in 17/24(70.83%) samples.)|Prostate(Altered in 4/64(6.25%) samples.),TFP(STAT1|chr4:60005-61046):Lung_Adeno(Altered in 3/24(12.50%) samples.),TFP(ZZZ3|chr4:59671-60953):Lung_Adeno(Altered in 4/24(16.67%) samples.)

mil2041 commented 6 years ago

Hi,

It seems VariantAnnotation::scanVcf cannot parse the VCF output file from FunSeq2 correctly. I have not encountered this error message from the VariantAnnotation package before, so I am not sure which line of your Output.vcf generates it.

You will need to figure out which line in Output.vcf the VariantAnnotation parser cannot recognize. Alternatively, writing a custom VCF file parser is another possible solution.
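As a rough illustration of the custom-parser route, here is a minimal Python sketch. It is not part of CNCDriver or FunSeq2; the function names parse_info and parse_vcf_lines are made up for this example, and it assumes a tab-delimited file with the eight fixed VCF columns and a semicolon-separated INFO column, as in the sample above. It skips every line starting with "#" rather than interpreting the header.

```python
from typing import Dict, Iterable, Iterator, Union

InfoValue = Union[str, bool]


def parse_info(info: str) -> Dict[str, InfoValue]:
    """Split an INFO column into a dict; keys without '=' are stored as True."""
    fields: Dict[str, InfoValue] = {}
    if info in ("", "."):
        return fields
    for item in info.split(";"):
        if not item:
            continue  # tolerate a trailing or doubled ';'
        key, sep, value = item.partition("=")  # split on the first '=' only
        fields[key] = value if sep else True
    return fields


def parse_vcf_lines(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per record line, ignoring header lines that start with '#'."""
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")  # also drops a stray carriage return
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError(f"line {number}: expected 8 columns, got {len(cols)}")
        chrom, pos, vid, ref, alt, qual, flt, info = cols[:8]
        yield {
            "CHROM": chrom,
            "POS": int(pos),
            "ID": vid,
            "REF": ref,
            "ALT": alt,
            "QUAL": qual,
            "FILTER": flt,
            "INFO": parse_info(info),
        }
```

Something like `for rec in parse_vcf_lines(open("Output.vcf")): print(rec["CHROM"], rec["POS"], rec["INFO"].get("NCDS"))` would then list the records. A ValueError with a line number also points at a record that does not have the expected columns.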

There are a couple of things you can check. (1) Does every line in the file end with a plain "\n"? Files that have passed through some platforms or editors (classic Mac OS or Windows line endings, for example) contain carriage-return characters, displayed as "^M", which can make the parser fail. (2) Randomly pick several lines from Output.vcf and run VariantAnnotation::scanVcf on just those. If the error goes away, you can be more confident that only certain lines in Output.vcf trigger it.
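Both checks can be scripted. The Python sketch below is only one way to do it, and the helper names count_carriage_returns and write_subset are hypothetical. It assumes header lines start with "#" and that the file fits in memory. The first function counts carriage-return bytes; the second writes the header plus a random sample of records to a new file, leaving line endings untouched so that the subset still reproduces any line-ending problem.

```python
import random


def count_carriage_returns(path: str, chunk_size: int = 1 << 20) -> int:
    """Return the number of carriage-return bytes found in the file."""
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\r")
    return total


def write_subset(src: str, dst: str, n_records: int, seed: int = 0) -> int:
    """Copy the header and a random sample of record lines from src to dst.

    Returns the number of record lines written. newline="" keeps the
    original line endings on both read and write.
    """
    header, records = [], []
    with open(src, "r", newline="") as handle:
        for line in handle:
            (header if line.startswith("#") else records).append(line)
    rng = random.Random(seed)  # fixed seed so the sample can be repeated
    k = min(n_records, len(records))
    keep = sorted(rng.sample(range(len(records)), k))  # preserve file order
    with open(dst, "w", newline="") as out:
        out.writelines(header)
        out.writelines(records[i] for i in keep)
    return k
```

If `count_carriage_returns("Output.vcf")` is greater than zero, the line endings are worth normalizing first. Otherwise, calling `write_subset("Output.vcf", "subset.vcf", 100)` and then running readVcf on subset.vcf, repeated with different seeds or sizes, should narrow down whether specific records are responsible.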

lybird300 commented 6 years ago

Hi Eric, thank you for your quick reply. I reinstalled all related R packages and the readVcf function works now! Still no clue what went wrong but I can live with that. Thanks again and good luck with everything!