lczech / grenedalf

Toolkit for Population Genetic Statistics from Pool-Sequenced Samples, e.g., in Evolve and Resequence experiments
GNU General Public License v3.0

fst-cathedral : Segmentation Fault #31

Closed plhm closed 1 month ago

plhm commented 2 months ago

Hey there, Lucas.

Thank you for this amazing tool.

I'm running grenedalf v0.6.0. I've been able to use its 'diversity' and 'fst' commands, but for some reason I get a 'Segmentation Fault' error when running the 'fst-cathedral' command. My call is as follows:

$grenedalf_path/grenedalf_v0.6.0 fst-cathedral --method unbiased-nei \
    --pool-sizes 40 --vcf-path $wd_path/FILE.vcf.gz \
    --reference-genome-fasta $genome_path/GENOME.fa.gz --out-dir $wd_path \
    --file-prefix PREFIX

Unfortunately, the 'Segmentation Fault' error message isn't particularly helpful. I've tried stripping the call of all its unnecessary parameters in the hope of getting it to run, with no luck.

Information on the OS is below:

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"

I am running the function on UVA's HPC.

I was wondering if you've faced a similar issue recently, or if you have a suspicion on why this is happening.

Best,

P

lczech commented 2 months ago

Hi @plhm,

segmentation faults indicate some bug in the code, which I will have to have a closer look at and track down. Could you maybe create a minimal example of your input files that produces the problem, so that I can test it here?
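One possible way to cut a large input down to a shareable size is sketched below. It assumes the input is a gzip-compressed VCF and simply keeps every header line plus the first few data records. The function name `head_vcf` and the file names are hypothetical and not part of grenedalf. Note that the output is written as plain gzip rather than bgzip, and that a truncated file is not guaranteed to still trigger the crash, so the reduced file should be re-tested before sharing.

```python
import gzip


def head_vcf(src_path, dst_path, max_records=1000):
    """Copy all header lines and the first `max_records` data lines
    of a gzip-compressed VCF into a new gzip-compressed file.

    Returns the number of data records written.
    """
    kept = 0
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        for line in src:
            if line.startswith("#"):
                # Meta-information lines (##...) and the column header (#CHROM...)
                dst.write(line)
            elif kept < max_records:
                dst.write(line)
                kept += 1
            else:
                # Header lines precede all records, so we can stop here.
                break
    return kept


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python head_vcf.py FILE.vcf.gz SMALL.vcf.gz
    if len(sys.argv) == 3:
        n = head_vcf(sys.argv[1], sys.argv[2])
        print(f"wrote {n} records")
```

If the reduced file no longer reproduces the fault, increasing `max_records` step by step is one way to look for the smallest input that still does.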

Thanks

Lucas

plhm commented 2 months ago

Hey, Lucas.

Sure thing. What is the best way to share the files?

P

lczech commented 2 months ago

Depends on how large they are - if you can get a minimal example to work that zipped is <25MB, you can directly attach it here. Otherwise, some file sharing service, Dropbox or the like, maybe? Or really, whatever is convenient for you :-)

stenglein-lab commented 2 months ago

Hi there,

I also ran into this issue. Here is a .tar.gz file that contains 2 bam files that will cause a segmentation fault when run with this command:

./grenedalf_v0.6.0_linux_x86_64 fst-cathedral --sam-path SRR14250292.bam --sam-path SRR14250293.bam --method unbiased-nei --pool-sizes 48

bam.tar.gz



How I created these bam files (these steps are not necessary to reproduce the fault; they are just for information):

I created these bam files by mapping some reads from 2 SRA experiments to the Drosophila melanogaster reference genome. What I did was:

Then running the above grenedalf command causes the segmentation fault.

Happy to provide more info. Thanks for developing the great tool.

lczech commented 2 months ago

Hi @stenglein-lab,

thanks for the example, that indeed produced a seg fault. The issue was caused by a bug in an error message that should not have been triggered at all, and that, even when triggered, should not itself have contained a bug... Well, thanks for reporting this, also @plhm - I think that your issue was caused by the exact same bug.

Should be fixed now in grenedalf v0.6.1. Let me know if that works, @plhm and @stenglein-lab.

Also, I ran your example for testing @stenglein-lab, and as you will notice, there is an error reported due to the chromosomes in your bam files not being in the same order, or not all chromosomes being present in both files (I think the latter, as your command says you sorted them). You'll see an error message when you run your command with the new version. The fix is to provide a reference genome fasta/fai/dict file, so that the program knows the expected order even if not all chromosomes are present in all files.
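The kind of mismatch described above can be illustrated with a small sketch. This is not grenedalf's internal check, only an assumption-laden illustration: it takes a reference chromosome order (for example, the sequence names from a fasta index) and, for each input file, the chromosome names in the order they occur, and reports missing chromosomes and order violations. All function names, file labels, and chromosome names are hypothetical.

```python
def is_subsequence(names, reference_order):
    """Return True if `names` occur in `reference_order` in the same
    relative order (gaps allowed). Unknown names make this False."""
    remaining = iter(reference_order)
    # `name in remaining` advances the iterator until it finds `name`,
    # so each lookup continues where the previous one stopped.
    return all(name in remaining for name in names)


def compare_chromosome_orders(reference_order, per_file):
    """Compare per-file chromosome lists against a reference order.

    `per_file` maps a file label to the list of chromosome names
    in the order they appear in that file.
    """
    ref_set = set(reference_order)
    report = {}
    for label, names in per_file.items():
        present = set(names)
        report[label] = {
            # In the reference but absent from this file
            "missing": [c for c in reference_order if c not in present],
            # In this file but absent from the reference
            "unknown": [c for c in names if c not in ref_set],
            # Consistent with the reference order?
            "ordered": is_subsequence(names, reference_order),
        }
    return report


if __name__ == "__main__":
    # Hypothetical chromosome names and file labels, for illustration only.
    reference = ["chrA", "chrB", "chrC"]
    files = {
        "sample1.bam": ["chrA", "chrC"],  # chrB missing, order consistent
        "sample2.bam": ["chrB", "chrA"],  # chrC missing, order inconsistent
    }
    for label, info in compare_chromosome_orders(reference, files).items():
        print(label, info)
```

A file that merely lacks some chromosomes still counts as ordered here, which mirrors why a reference order lets such files be processed together.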

Still, for the first two chromosomes, it works, but gives a very weird plot:

cathedral-plot-SRR14250292 SRR14250293-NC_004354 4

where a lot of the larger window sizes (the upper rows of pixels) have FST near 1 (dark areas), and then suddenly drop to near 0 towards the bottom (yellow spikes) for some smaller windows. I think that might just be due to your minimal example. But if you experience this with your full dataset as well, let me know - that's not how these plots are supposed to look, and without further investigation, I am not sure what's going on there. Hope I didn't introduce a new bug now.

Cheers and so long

Lucas

stenglein-lab commented 2 months ago

Hi Lucas,

Version 0.6.1 ran well for me and I was able to produce cathedral plots. They didn't show the other issue you described above for the minimal example bam files.

Thanks for the quick response and fix!

Mark

lczech commented 2 months ago

Oh okay, would you mind showing how they look for you, and which exact command you used? Just curious to see how they look on other datasets, and it's always useful information for me to see how my tools are being used in practice!

Also, I assume this issue can be closed then? @plhm, does it work for you now as well with the new version?

lczech commented 2 months ago

Hi @plhm and @stenglein-lab,

if the new version 0.6.1 is working for both of you, can we close this issue then? Also, I'd still be curious to see how the cathedral plots that you are getting look now - just to check that things are working as expected on different types of data.

Let me know how things are going :-)

Thanks and cheers

Lucas

stenglein-lab commented 1 month ago

Hi @lczech

It seems to be working fine for me, and you could close this as far as I'm concerned. Here's an example of a cathedral plot: pool sequencing of 2 D. melanogaster populations, showing chromosome 2R (originally a .bmp file, which GitHub doesn't support, so I exported it as a jpeg).

cathedral-plot-example-2R

Mark

lczech commented 1 month ago

Hi Mark @stenglein-lab,

perfect, thank you very much! Yes, that looks good. Was just wondering, due to the weird plot with your test data.

Going to close this then. @plhm, should you have any further trouble, feel free to re-open this, or start a new issue.

Cheers

Lucas