slowkoni / rfmix

RFMIX - Local Ancestry and Admixture Inference Version 2
Segmentation during scanning for optimal CRF weight #43

Open kuangzhuoran opened 1 year ago

kuangzhuoran commented 1 year ago

Hello:

RFMIX v2.03-r0 - Local Ancestry and Admixture Inference
(c) 2016, 2017 Mark Koni Hamilton Wright
Bustamante Lab - Stanford University School of Medicine
Based on concepts developed in RFMIX v1 by Brian Keith Maples, et al.

This version is licensed for non-commercial academic research use only
For commercial licensing, please contact cdbadmin@stanford.edu

--- For use in scientific publications please cite original publication ---
Brian Maples, Simon Gravel, Eimear E. Kenny, and Carlos D. Bustamante (2013).
RFMix: A Discriminative Modeling Approach for Rapid and Robust Local-Ancestry Inference.
Am. J. Hum. Genet. 93, 278-288

Loading genetic map for chromosome Chr1 ... done
Mapping samples ... 29 samples combined
Scanning input VCFs for common SNPs on chromosome Chr1 ... 956161 SNPs
Loading haplotypes... done
Defining and initializing conditional random field...
   setting up CRF points and random forest windows...
   computing random forest window spacing overlay...
   initializing apriori reference subpop across CRF...
   setting up random forest probability estimation arrays... done
Defining and initializing conditional random field... done
9589734 (17.3%) variant alleles 0 (0.0%) missing alleles

Generating internal simulation samples...
Internally simulated 154 samples from 1 randomly selected reference parents.

Scanning for optimal CRF Weight....
/slurmState/slurmSpool/slurmd/job775448/slurm_script: line 17: 10145 Segmentation fault (core dumped) ./rfmix -f sp1.chr1.vcf -r sp2.chr1.vcf -m sp2.pop -g sp1.genetic.map -o outer --chromosome=Chr1

My command is:
./rfmix -f sp1.chr1.vcf -r sp2.chr1.vcf -m sp2.pop -g sp1.genetic.map -o outer --chromosome=Chr1

What could be causing this?

kuangzhuoran commented 1 year ago

I switched to another dataset, and the run now gets two lines further before crashing:

--- For use in scientific publications please cite original publication ---
Brian Maples, Simon Gravel, Eimear E. Kenny, and Carlos D. Bustamante (2013).
RFMix: A Discriminative Modeling Approach for Rapid and Robust Local-Ancestry Inference.
Am. J. Hum. Genet. 93, 278-288

Loading genetic map for chromosome Chr1 ... done
Mapping samples ... 26 samples combined
Scanning input VCFs for common SNPs on chromosome Chr1 ... 4258431 SNPs
Loading haplotypes... done
Defining and initializing conditional random field...
   setting up CRF points and random forest windows...
   computing random forest window spacing overlay...
   initializing apriori reference subpop across CRF...
   setting up random forest probability estimation arrays... done
Defining and initializing conditional random field... done
94462316 (42.7%) variant alleles 0 (0.0%) missing alleles

Generating internal simulation samples...
Internally simulated 185 samples from 2 randomly selected reference parents.
Growing Random Forest Trees -- (851687/851687) 100.0%
Scanning for optimal CRF Weight....
Conditional random field ... 211/ 211 (100.0%)
[1] 1897230 segmentation fault (core dumped) ./rfmix -f Mp.Chr1.vcf -r Ma.Chr1.vcf -m Ma.pop -g MpMa.all.genetic.map -o

chibispy commented 1 year ago

I've got the exact same error. Chromosomes 1-8 worked fine, but 9 and 10 didn't. I still haven't tried the rest, but it's odd that the failure doesn't seem to depend on the size of the chromosome. Additionally, I tried increasing the RAM to 4x what worked for chromosomes 1-8, and tried both increasing and decreasing the number of threads, but neither solved it. My run even seems to get a bit further than yours, since it prints a few ancestries, but it crashes immediately afterwards without writing any output. Here's what I get:

rfmix -f 510k_hg38.vcf.gz -r RFmix/ALL.wgs.integrated_sv_map_v1_GRCh38.20130502.svs.genotypes.vcf.gz -g RFmix/chr10.modified -m RFmix/integrated_call_samples_v3.20130502.todos.panel -o maps510k/510k_hg38_chr10 --chromosome=10 --n-threads=4

RFMIX v2.03-r0 - Local Ancestry and Admixture Inference
(c) 2016, 2017 Mark Koni Hamilton Wright
Bustamante Lab - Stanford University School of Medicine
Based on concepts developed in RFMIX v1 by Brian Keith Maples, et al.

This version is licensed for non-commercial academic research use only
For commercial licensing, please contact cdbadmin@stanford.edu

--- For use in scientific publications please cite original publication ---
Brian Maples, Simon Gravel, Eimear E. Kenny, and Carlos D. Bustamante (2013).
RFMix: A Discriminative Modeling Approach for Rapid and Robust Local-Ancestry Inference.
Am. J. Hum. Genet. 93, 278-288

Loading genetic map for chromosome 10 ... done
Mapping samples ... 3358 samples combined
Scanning input VCFs for common SNPs on chromosome 10 ... 52 SNPs
Loading haplotypes... done
Defining and initializing conditional random field...
   setting up CRF points and random forest windows...
   computing random forest window spacing overlay...
   initializing apriori reference subpop across CRF...
   setting up random forest probability estimation arrays... done
Defining and initializing conditional random field... done
10523 (3.0%) variant alleles 2 (0.0%) missing alleles

Generating internal simulation samples...
Internally simulated 1132 samples from 263 randomly selected reference parents.
Growing Random Forest Trees -- (11/11) 100.0%
Scanning for optimal CRF Weight....
Conditional random field ... 4490/ 4490 (100.0%)

Maximum scoring weight is 1 (-inf)
Simulation results...
        ACB ASW BEB CDX CEU CHB CHS CLM ESN FIN GBR GIH GWD IBS ITU JPT KHV LWK MSL MXL PEL PJL PUR STU TSI YRI
        0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
0Segmentation fault
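
Since only some chromosomes crash, it may help to run every chromosome in one batch and record which ones die from a signal. The following is a minimal sketch, not part of rfmix: it uses only the command-line flags that appear in this thread, and the helper names (`build_rfmix_command`, `describe_exit`, `run_all`), the `{chrom}` placeholder in the genetic map path, and the log file names are all hypothetical.

```python
import signal
import subprocess


def build_rfmix_command(chrom, query_vcf, ref_vcf, genetic_map, sample_map,
                        out_prefix, n_threads=4):
    """Assemble an rfmix command using only flags shown in this thread."""
    return [
        "rfmix",
        "-f", query_vcf,
        "-r", ref_vcf,
        # Assumption: "{chrom}" in the path (if present) selects a per-chromosome map.
        "-g", genetic_map.format(chrom=chrom),
        "-m", sample_map,
        "-o", f"{out_prefix}_chr{chrom}",
        f"--chromosome={chrom}",
        f"--n-threads={n_threads}",
    ]


def describe_exit(returncode):
    """Turn a subprocess return code into a short status string."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as return code -N.
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"exit code {returncode}"


def run_all(chromosomes, **paths):
    """Run rfmix once per chromosome, keeping one log file per run."""
    results = {}
    for chrom in chromosomes:
        cmd = build_rfmix_command(chrom, **paths)
        with open(f"rfmix_chr{chrom}.log", "w") as log:
            proc = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
        results[chrom] = describe_exit(proc.returncode)
    return results
```

A call such as `run_all(range(1, 23), query_vcf=..., ref_vcf=..., genetic_map="RFmix/chr{chrom}.modified", sample_map=..., out_prefix=...)` would return a dictionary mapping each chromosome to its status, so the crashing chromosomes can be compared side by side using their saved logs.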

jamesfifer commented 6 months ago

It is likely a memory issue. I ran into the same problem and was unable to get it to work no matter how much memory I allocated. I solved it by downsizing my genetic map (I initially had a genetic distance for every single locus, but rfmix still runs fine with a subset).

If that doesn't work, you can also use the example dataset here as a positive control.
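
The map downsizing described above can be sketched roughly as follows. This is an illustration rather than the exact procedure used: it assumes the genetic map is tab-delimited with chromosome, physical position (bp), and genetic position (cM) in the first three columns, and the function name `thin_genetic_map` and the `min_cm_gap` threshold are hypothetical choices.

```python
def thin_genetic_map(in_path, out_path, min_cm_gap=0.01):
    """Write a thinned copy of a genetic map and return the number of rows kept.

    A row is kept when it starts a new chromosome or lies at least
    min_cm_gap centimorgans beyond the last kept row of that chromosome.
    Assumed columns (tab-delimited): chromosome, position in bp, position in cM.
    """
    kept = 0
    last_chrom = None
    last_cm = None
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            try:
                cm = float(fields[2])
            except ValueError:
                dst.write(line)  # non-numeric row (e.g. a header): copy unchanged
                continue
            chrom = fields[0]
            if chrom != last_chrom or cm - last_cm >= min_cm_gap:
                dst.write(line)
                kept += 1
                last_chrom, last_cm = chrom, cm
    return kept
```

For example, `thin_genetic_map("full.map", "thinned.map", min_cm_gap=0.05)` would write a sparser map to `thinned.map`, which can then be passed to rfmix with `-g`. A suitable threshold depends on the marker density of the data, so it is worth trying a few values.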

bamorim-bio commented 2 months ago

I get this error too!

Loading genetic map for chromosome 21 ...  done
Mapping samples ... 1274 samples combined
Scanning input VCFs for common SNPs on chromosome 21 ...   47 SNPs
Loading haplotypes... done
Defining and initializing conditional random field...
   setting up CRF points and random forest windows...
   computing random forest window spacing overlay...
   initializing apriori reference subpop across CRF...
   setting up random forest probability estimation arrays... done
Defining and initializing conditional random field...   done
16639 (13.9%) variant alleles   0 (0.0%) missing alleles

Generating internal simulation samples...
Internally simulated 400 samples from 2 randomly selected reference parents.
Growing Random Forest Trees -- (10/10) 100.0%
Scanning for optimal CRF Weight....
Conditional random field ...         1674/  1674 (100.0%)

Maximum scoring weight is 1 (-inf)
Simulation results...
        Source1       Source2
        0       1
Segmentation fault      (core dumped)

All chromosomes ran fine, except 22...
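
One thing that can be checked independently of rfmix is how many sites the query and reference VCFs share on the failing chromosome: the log above reports only 47 common SNPs on chromosome 21, and an earlier log in this thread reports 52 on chromosome 10. Whether that is related to the crash is not established here. The sketch below matches records by position only, which is just an approximation of whatever matching rfmix performs internally, and the function names are hypothetical.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip-compressed VCF file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def site_positions(path, chrom):
    """Collect the positions of all records on one chromosome."""
    positions = set()
    with open_vcf(path) as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.split("\t", 2)
            # The chromosome name must match exactly ("10" is not "chr10").
            if len(fields) >= 2 and fields[0] == chrom:
                positions.add(int(fields[1]))
    return positions


def count_shared_sites(query_vcf, ref_vcf, chrom):
    """Return (query sites, reference sites, shared sites) for one chromosome."""
    query = site_positions(query_vcf, chrom)
    ref = site_positions(ref_vcf, chrom)
    return len(query), len(ref), len(query & ref)
```

Running `count_shared_sites` on a chromosome that works and on one that crashes would show whether the shared-site count differs sharply between them.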