umb-oconnorgroup / ibdtools

A cpp library/tool set for working with large identity-by-descent files
MIT License

merge function is crashing #13

Open vicbp1 opened 1 month ago

vicbp1 commented 1 month ago

When running the following command:

$ibdtools merge -i ${chr}.sibd -m ${chr}.meta -o ${chr}.mibd -M 10
ibdtools merge options received:
--ibd_in: 19.sibd
--meta_in: 19.meta
--ibd_out: 19.mibd
--max_snp: 1
--max_cm: 0.6
--mem: 10

I am getting this error:

Error from ../src/../include/ibdmerger.hpp:168:

I am not sure what this error means, but when I increased the memory to 500, I received:

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted (core dumped)

Any thoughts?

bguo068 commented 1 month ago

Thank you for reporting this. I will first check whether the latest commit has already fixed the issue. If not, I will need your files for debugging. Could you share the path to the files used in this command?

Additionally, the merge process uses your VCF file, which should contain only biallelic sites. Could you also provide the path to the VCF file?

vicbp1 commented 1 month ago

Thanks!!! I reported it today so that I would not forget.

Here are the paths:

IBD compressed files: /local/chib/toconnor_grp/TOPMed_analyses/IBD_analyses/IBDprocessing
HAPIBD results: /local/chib/toconnor_grp/TOPMed_analyses/IBD_analyses/hap-ibd_outputs
VCF files: /local/chib/toconnor_grp/TOPMed_analyses/MAF_filtered/freeze.10b.chr${chr}.phased.mac5.vcf.gz
Genetic Maps: /local/chib/toconnor_grp/victor/Public_data/INS_LDGH_data/hg38//IBD_analyses/genetic_maps/genmap_chr${chr}_space.txt

I did not see any flag for the VCF file in the merge command.

Thanks!

bguo068 commented 1 month ago

Oh, I meant the VCF file you used for "ibdtools encode: encode the IBD file, VCF file, and PLINK map file into binary format for better/quicker IO."

In the current implementation, ibdtools assumes all sites are phased and biallelic to achieve high memory compaction for storing genotype information. Do you know if the VCF file used in ibdtools encode contains unphased or multiallelic sites? I am checking /local/chib/toconnor_grp/TOPMed_analyses/MAF_filtered/freeze.10b.chr19.phased.mac5.vcf.gz, but it is a bit large and takes some time to finish. Could you confirm that this was the VCF file you used for ibdtools encode?
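For a quick check of this kind, here is a minimal Python sketch that streams a VCF and counts multiallelic and unphased records. It is not part of ibdtools; the function names `scan_vcf` and `open_vcf` are hypothetical, and it assumes a standard tab-separated VCF in which `GT` is the first FORMAT key. On a file of this size it will be slow, so it may be worth running it on a subset first.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text (hypothetical helper)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def scan_vcf(lines):
    """Count records, multiallelic records, and records with any unphased genotype."""
    counts = {"records": 0, "multiallelic": 0, "unphased": 0}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        counts["records"] += 1

        # Column 5 (index 4) is ALT; a comma means more than one ALT allele.
        if "," in fields[4]:
            counts["multiallelic"] += 1

        # Column 9 (index 8) is FORMAT; sample columns start at index 9.
        if len(fields) > 9 and fields[8].split(":")[0] == "GT":
            for sample in fields[9:]:
                genotype = sample.split(":", 1)[0]
                if "/" in genotype:  # "/" separates unphased alleles, "|" phased ones
                    counts["unphased"] += 1
                    break
    return counts


if __name__ == "__main__":
    import sys

    with open_vcf(sys.argv[1]) as handle:
        print(scan_vcf(handle))
```

Nonzero `multiallelic` or `unphased` counts would suggest the file does not match the phased, biallelic assumption described above.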

vicbp1 commented 1 month ago

Oh! :( I included multiallelic sites since hap-ibd can handle them, so maybe that is the problem; I will run a test to check. By the way, I moved from the sort function to the matrix function, assuming that only the merge was problematic, and I got a similar error.

Thank you so much!

bguo068 commented 1 month ago

Got it. Yes, please try to use only biallelic sites if possible. The human genome has plenty of biallelic sites, which should suffice for IBD calling.
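As an illustration of restricting a VCF to biallelic sites, here is a minimal Python sketch. It is not part of ibdtools; `keep_biallelic` and `filter_vcf` are hypothetical names, and "biallelic" is taken here to mean exactly one ALT allele that is neither missing (`.`) nor the overlapping-deletion symbol (`*`).

```python
import gzip


def keep_biallelic(lines):
    """Yield header lines and records that have exactly one ALT allele."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep all meta and header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t", 5)
        if len(fields) < 5:
            continue  # malformed or blank line; drop it
        alt = fields[4]
        # A comma in ALT means several alternate alleles (multiallelic site).
        if "," not in alt and alt not in (".", "*"):
            yield line


def filter_vcf(src_path, dst_path):
    """Write the biallelic subset of a gzip-compressed VCF to dst_path."""
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        dst.writelines(keep_biallelic(src))
```

Note that `gzip.open` writes ordinary gzip rather than BGZF, so the output would need to be recompressed with bgzip before tabix indexing. In practice, `bcftools view -m2 -M2` is a common way to apply the same restriction.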

Let me know your thoughts :)

bguo068 commented 1 month ago

> matrix function, assuming

Could you also share the error message and the files used for the matrix function?