parklab / MosaicForecast

A mosaic detecting software based on phasing and random forest
MIT License

Error: "check the format of your input file" #8

Closed vmscmams closed 4 years ago

vmscmams commented 4 years ago

Hello. Hope you are doing fine. I was trying to run the Phase.py script with the demo data, but I got an error about the content of the test.input file. I used the example file in this directory, but it fails with the following error message:

root@cc73d352d9f3:/MF# python Phase.py demo demo/phasing2 Ref/ucsc.hg19.fasta demo/test.input 20 Map_score/hg19/k24.umap.wg.bw 2 demo/test.cram
check the format of your input file: chr pos-1 pos ref alt sample
check the format of your input file: chr pos-1 pos ref alt sample
check the format of your input file: chr pos-1 pos ref alt sample
check the format of your input file: chr pos-1 pos ref alt sample
check the format of your input file: chr pos-1 pos ref alt sample
check the format of your input file: chr pos-1 pos ref alt sample

Also, I used the MuTect2-PoN_filter.py script to create a new input file from the test.Mutect2.vcf file and the resource/SegDup_and_clustered.GRCh37.bed file. Generating test.input worked, but the Phase.py script still fails:

root@cc73d352d9f3:/MF# python Phase.py demo demo/phasing2 Ref/ucsc.hg19.fasta demo/test.input 20 Map_score/hg19/k24.umap.wg.bw 2 demo/test.bam
check the format of your input file: chr pos-1 pos ref alt sample
check the format of your input file: chr pos-1 pos ref alt sample
check the format of your input file: chr pos-1 pos ref alt sample
check the format of your input file: chr pos-1 pos ref alt sample

Could you help me please? Am I missing something?

It looks like it fails to read the columns, and I double-checked that the characters between the values are tabs...

thanks
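One way to rule out the input file itself is to check every line against the six columns named in the error message (chr, pos-1, pos, ref, alt, sample). The sketch below is not part of MosaicForecast; `check_input_line` is a hypothetical helper, the sample lines are made up, and it assumes the fields are tab-separated and that the second column equals the third minus one, as the column names suggest.

```python
def check_input_line(line):
    """Return None if the line looks valid, else a short description of the problem."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        return f"expected 6 tab-separated fields, found {len(fields)}"
    chrom, start, end, ref, alt, sample = fields
    if not (start.isdigit() and end.isdigit()):
        return "pos-1 and pos must be non-negative integers"
    # Assumption taken from the column names: column 2 is column 3 minus one.
    if int(end) - int(start) != 1:
        return "second column should equal third column minus one"
    return None


# Made-up example lines: one tab-separated, one space-separated.
good = "chr1\t1004864\t1004865\tG\tC\ttest"
bad = "chr1 1004864 1004865 G C test"

print(check_input_line(good))
print(check_input_line(bad))
```

If every line passes a check like this, the file layout is probably not the cause and the command-line arguments are worth a second look.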

douym commented 4 years ago

Hi @vmscmams ,

Sorry for the late reply!

The last parameter should be the file format, not the path to the CRAM file. Your command passes the file path:

python Phase.py demo demo/phasing2 Ref/ucsc.hg19.fasta demo/test.input 20 Map_score/hg19/k24.umap.wg.bw 2 demo/test.cram

Could you change the command to this and try to run again?

python Phase.py demo demo/phasing2 Ref/ucsc.hg19.fasta demo/test.input 20 Map_score/hg19/k24.umap.wg.bw 2 cram

Thanks!
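A quick pre-flight check can catch this kind of mix-up before Phase.py runs. This is only a sketch: `check_format_argument` is a hypothetical helper, not something Phase.py provides, and the allowed values `bam` and `cram` are an assumption based on this thread (matched exactly, in lowercase).

```python
# Assumption drawn from this thread: the last argument names the file format.
ALLOWED_FORMATS = ("bam", "cram")


def check_format_argument(args):
    """Return None if the last argument is a known format, else an explanation."""
    last = args[-1]
    if last not in ALLOWED_FORMATS:
        return f"last argument should be one of {ALLOWED_FORMATS}, got {last!r}"
    return None


# The two commands from this thread, split into argument lists.
common = ["demo", "demo/phasing2", "Ref/ucsc.hg19.fasta", "demo/test.input",
          "20", "Map_score/hg19/k24.umap.wg.bw", "2"]
wrong = common + ["demo/test.cram"]
right = common + ["cram"]

print(check_format_argument(wrong))
print(check_format_argument(right))
```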

vmscmams commented 4 years ago

Hello! Thanks for your reply. I made the change you suggested and it still failed, but following the same logic I changed the last parameter to BAM and it worked.

Now I'm testing with my exome dataset, and it fails because my Mutect2 VCF has multi-allelic records: the MuTect2-PoN_filter.py script fails when it tries to convert the comma-separated AF value to a float.

Error Msg:
Traceback (most recent call last):
  File "/usr/local/bin/MuTect2-PoN_filter.py", line 48, in <module>
    AF=float(INFOs[2])
ValueError: could not convert string to float: '0.429,0.286'

Data line from my VCF:
chr1 181583 . CGGGGGG C,CGG . clustered_events;germline;map_qual;multiallelic CONTQ=30;DP=4;ECNT=2;GERMQ=1;MBQ=0,30,30;MFRL=186,320,326;MMQ=27,60,60;MPOS=23,25;POPAF=7.30,7.30;RPA=17,11,13;RU=G;SEQQ=2;STR;STRANDQ=93;STRQ=93;TLOD=7.62,3.37 GT:AD:AF:DP:F1R2:F2R1:SB 0/1/2:1,2,1:0.429,0.286:4:0,1,0:0,1,1:1,0,3,0

Have you encountered datasets with this kind of value?

PS: I tried running Mutect2 with the max-alleles parameter set to 1, and it still generates two values.
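For reference, the failing conversion can be reproduced and worked around outside the script. The sketch below is not the MuTect2-PoN_filter.py code; `per_allele_af` is a hypothetical helper that splits the AF entry of the sample column on commas and pairs each value with its ALT allele, using the ALT, FORMAT, and sample fields from the record above.

```python
def per_allele_af(alt, fmt, sample):
    """Pair each ALT allele with its AF value from a VCF sample column."""
    # Map FORMAT keys (GT, AD, AF, ...) to the colon-separated sample values.
    fields = dict(zip(fmt.split(":"), sample.split(":")))
    alts = alt.split(",")
    # A multi-allelic record carries one AF value per ALT allele.
    afs = [float(value) for value in fields["AF"].split(",")]
    if len(alts) != len(afs):
        raise ValueError(f"{len(alts)} ALT alleles but {len(afs)} AF values")
    return list(zip(alts, afs))


# Fields copied from the record in this comment.
alt = "C,CGG"
fmt = "GT:AD:AF:DP:F1R2:F2R1:SB"
sample = "0/1/2:1,2,1:0.429,0.286:4:0,1,0:0,1,1:1,0,3,0"

result = per_allele_af(alt, fmt, sample)
print(result)
```

Whether the filter should keep each allele separately, keep only one, or skip multi-allelic sites is a decision for the maintainers; the sketch only shows that the value parses once it is split.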