mfumagalli / ngsTools

Programs to analyse NGS data for population genetics purposes
GNU General Public License v3.0
169 stars 65 forks source link

Possible error reading SFS, binary file might be broken... #37

Open MboiTui opened 4 months ago

MboiTui commented 4 months ago

Hello,

I am trying to use ngsStat to estimate Dxy between my populations.

To achieve this I am following the ngsTools tutorial.

First, I produce saf files with ANGSD using an ancestral outgroup to unfold the spectrum.

Then i use realSFS to find the intersection between the two populations

realSFS print 5_angsd/5_2_${pop1}_unfolded.saf.idx 5_angsd/5_2_${pop2}_unfolded.saf.idx | cut -f 1-2 > 5_angsd/5_7_dxy/${pop1}vs${pop2}_intersect.txt

angsd sites index 5_angsd/5_7_dxy/${pop1}vs${pop2}_intersect.txt

Then I use angsd to produce the saf files for each population only for the overlapping sites

angsd -bam 5_angsd/5_0_popfiles/${pop}.bamlist \
    -ref ${REF} -anc ${ANC} -out 5_angsd/5_7_dxy/5_2_${pop1}vs${pop2}_${pop} \
    -uniqueOnly 1 -remove_bads 1 -only_proper_pairs 1 -trim 0 -C 50 -baq 1 \
    -minMapQ 20 -minQ 20 -minInd 3 -setMinDepth 6 -setMaxDepth ${MAXDEPTH} \
    -GL 1 -doSaf 1 -doCounts 1 -P 6 -sites 5_angsd/5_7_dxy/${pop1}vs${pop2}_intersect.txt

In this example, pop1 has 4 individuals and pop2 has 6. Both diploid.

The number of overlapping sites is 935,523,322

When I try and run ngsStats as follows (after doing zcat ...saf.gz > ...saf

ngsStat -npop 2 -postfiles 5_2_LMvsLNs_LM.saf 5_2_LMvsLNs_LNs.saf -nsites $NSITES -nind 8 12 -outfile test_LMvsLNs.txt

I get the error: Possible error reading SFS, binary file might be broken...

It is unclear what I have to put for nind, so i tried both 8 12 and 4 6.

I saw closed issue #5 in the ngsPopGen repository, but it did not help. I can provide the gzipped saf files if that can help with troubleshooting.

Cheers, LVB

mfumagalli commented 2 months ago

ngsStat is largely surpassed by all new features in ANGSD and I am not sure which statistics you are calculating is not present in ANGSD. To go back to your error message, that typically happens if either the number of sites or the number of individuals is not correct. Have you run the routine to get the nr of overlapping sites? Nind is the number of diploid individuals.