luntergroup / octopus

Bayesian haplotype-based mutation calling
MIT License
302 stars, 38 forks

ERROR while training random forest #89

Closed 24natasya closed 4 years ago

24natasya commented 4 years ago

Hi! I have run into this problem numerous times when training a random forest:

[2019-10-22 09:17:41] An unclassified error has occurred:
[2019-10-22 09:17:41]
[2019-10-22 09:17:41]     _Map_base::at.
[2019-10-22 09:17:41]
[2019-10-22 09:17:41] To help resolve this error submit an error report.
[2019-10-22 09:17:41] Encountered error in task writer thread. Calling terminate
terminate called after throwing an instance of 'std::out_of_range'
  what():  _Map_base::at
Error: The --calls file "/export/home/natasya/forests/NA12878_n/octopus.GIAB_NA12878_GRCh37D_novoalignV4.sort.hs37d5.fa.legacy.vcf.gz" does not exist.

Usage: rtg vcfeval [OPTION]... -b FILE -c FILE -o DIR -t SDF

Try '--help' for more information
[E::hts_open_format] fail to open file '/export/home/natasya/forests/NA12878_n/octopus.GIAB_NA12878_GRCh37D_novoalignV4.sort.hs37d5.fa.eval/tp.vcf.gz'
Failed to open /export/home/natasya/forests/NA12878_n/octopus.GIAB_NA12878_GRCh37D_novoalignV4.sort.hs37d5.fa.eval/tp.vcf.gz: No such file or directory
[E::hts_open_format] Failed to open file /export/home/natasya/forests/NA12878_n/octopus.GIAB_NA12878_GRCh37D_novoalignV4.sort.hs37d5.fa.eval/tp.train.vcf.gz
Traceback (most recent call last):
  File "/tmp/octopus/scripts/train_random_forest.py", line 202, in <module>
    main(parsed)
  File "/tmp/octopus/scripts/train_random_forest.py", line 114, in main
    make_ranger_data(tp_train_vcf_path, tp_data_path, True, default_measures, options.missing_value)
  File "/tmp/octopus/scripts/train_random_forest.py", line 68, in make_ranger_data
    vcf = VariantFile(octopus_vcf_path)
  File "pysam/libcbcf.pyx", line 4017, in pysam.libcbcf.VariantFile.__init__
  File "pysam/libcbcf.pyx", line 4238, in pysam.libcbcf.VariantFile.open
FileNotFoundError: [Errno 2] could not open variant file b'/export/home/natasya/forests/NA12878_n/octopus.GIAB_NA12878_GRCh37D_novoalignV4.sort.hs37d5.fa.eval/tp.train.vcf.gz': No such file or directory

The command line I used is below:

./train_random_forest.py \
    -R /export/Projects/2019_MLVarCaller/01_BAMs/hs37d5.fa \
    -I /export/Projects/2019_MLVarCaller/01_BAMs/novoalignV4/GIAB_NA12878_GRCh37D_novoalignV4.sort.bam \
    -T /export/Projects/2019_MLVarCaller/02_TruthSets/GIAB_NA12878/GRCh37_nexterarapidcapture_expandedexome_targetedregions.bed \
    --truth /export/Projects/2019_MLVarCaller/02_TruthSets/GIAB_NA12878/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz \
    --confident /export/Projects/2019_MLVarCaller/02_TruthSets/GIAB_NA12878/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_nosomaticdel.bed \
    --octopus /tmp/octopus/bin/octopus \
    --rtg /export/home/natasya/rtg-tools-3.10.1/rtg \
    --sdf /export/Projects/2019_MLVarCaller/01_BAMs/hs37d5.sdf \
    --ranger /home/ranger/ranger/cpp_version/build/ranger \
    --trees 300 \
    --min_node_size 20 \
    --missing_value -1 \
    --prefix NA12878.wgs \
    -o /export/home/natasya/forests/NA12878_n/ \
    --threads 10 >

What could be causing this error? I ran the same command on the same sample with a different aligner, and there was no error.

dancooke commented 4 years ago

Which version of Octopus are you using?

24natasya commented 4 years ago

octopus v0.6.3-beta (develop 7eab0cdd)


dancooke commented 4 years ago

Could you please try the latest development version? Note that this comes with a new version of train_random_forest.py that has a different interface to the one you're currently using. I have yet to update the documentation for this, but you should be able to replace your command with:

$ ./train_random_forest.py \
    --config config.json \
    --octopus /tmp/octopus/bin/octopus \
    --rtg /export/home/natasya/rtg-tools-3.10.1/rtg \
    --ranger /home/ranger/ranger/cpp_version/build/ranger \
    --prefix NA12878.wgs \
    -o /export/home/natasya/forests/NA12878_n \
    --threads 10

where config.json contains:

{
    "truths": {
        "GRCh37.HG001": {
            "vcf": "/export/Projects/2019_MLVarCaller/02_TruthSets/GIAB_NA12878/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_PGandRTGphasetransfer.vcf.gz",
            "bed": "/export/Projects/2019_MLVarCaller/02_TruthSets/GIAB_NA12878/HG001_GRCh37_GIAB_highconf_CG-IllFB-IllGATKHC-Ion-10X-SOLID_CHROM1-X_v.3.3.2_highconf_nosomaticdel.bed"
        }
    },
    "examples": [
        {
            "reference": "/export/Projects/2019_MLVarCaller/01_BAMs/hs37d5.fa",
            "reads": "/export/Projects/2019_MLVarCaller/01_BAMs/novoalignV4/GIAB_NA12878_GRCh37D_novoalignV4.sort.bam",
            "calling_regions": "/export/Projects/2019_MLVarCaller/02_TruthSets/GIAB_NA12878/GRCh37_nexterarapidcapture_expandedexome_targetedregions.bed",
            "truth": "GRCh37.HG001"
        }
    ],
    "training": {
        "hyperparameters": [
            {
                "trees": 300,
                "min_node_size": 20
            }
        ]
    }
}
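
For anyone converting an older command by hand, here is a minimal sketch of how the old command-line arguments appear to map onto this layout. The mapping is inferred only from the two commands in this thread, and the helper names `build_config` and `check_truth_labels` are hypothetical, not part of Octopus. The old `--sdf` and `--missing_value` options have no counterpart in the config above, so the sketch leaves them out.

```python
import json


def build_config(reference, reads, calling_regions, truth_vcf, confident_bed,
                 trees, min_node_size, truth_label="GRCh37.HG001"):
    """Arrange the old command-line values in the config layout shown above.

    Assumed mapping (inferred from this thread, not from Octopus docs):
      -R -> reference, -I -> reads, -T -> calling_regions,
      --truth -> truths[label]["vcf"], --confident -> truths[label]["bed"],
      --trees / --min_node_size -> training.hyperparameters[0]
    """
    return {
        "truths": {
            truth_label: {"vcf": truth_vcf, "bed": confident_bed}
        },
        "examples": [
            {
                "reference": reference,
                "reads": reads,
                "calling_regions": calling_regions,
                "truth": truth_label,
            }
        ],
        "training": {
            "hyperparameters": [
                {"trees": trees, "min_node_size": min_node_size}
            ]
        },
    }


def check_truth_labels(config):
    """Return the "truth" labels used by examples but missing from "truths"."""
    known = set(config["truths"])
    return [ex["truth"] for ex in config["examples"] if ex["truth"] not in known]


if __name__ == "__main__":
    # Placeholder file names; substitute the real paths from the old command.
    config = build_config(
        reference="hs37d5.fa",
        reads="sample.sort.bam",
        calling_regions="targets.bed",
        truth_vcf="truth.vcf.gz",
        confident_bed="confident.bed",
        trees=300,
        min_node_size=20,
    )
    missing = check_truth_labels(config)
    if missing:
        raise SystemExit(f"examples reference unknown truth sets: {missing}")
    print(json.dumps(config, indent=4))
```

Redirecting the printed output to `config.json` gives a file in the same shape as the one above. The label check is only a convenience for catching a typo between an example's `"truth"` value and the keys under `"truths"`.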

24natasya commented 4 years ago

Ok, I will try this and see if it works


24natasya commented 4 years ago

I ran the training and it works with the new development version. Thank you.
