malthesr / winsfs

Site frequency spectrum estimation based on window expectation-maximisation algorithm
MIT License
13 stars 0 forks source link

"Problem with size of dimension" when winsfs SFS is used with ANGSD saf2theta #7

Closed TeresaPegan closed 1 year ago

TeresaPegan commented 1 year ago

Hello! I am excited to upgrade to winsfs in my genotype likelihood analyses.

I am trying to use ANGSD to calculate theta statistics and I'm running into an error when I try to do it with winsfs. Apparently winsfs creates an SFS that has 1 fewer dimensions than realSFS does, and this trips up ANGSD. Does anyone know where this discrepancy comes from? Is there a way to edit the winsfs output so that it's in the form ANGSD expects?

For now I will have to stick with realSFS, but please let me know if there is a way to get around this and use winsfs!

Thanks, -Teresa

Code with winsfs:

$ANGSDIR/angsd -b $BAMS -out $SAF \
-anc flinflon/$REF \
-minMapQ 30 -minQ 30  \
-dosaf 1 -GL 2 -sites $SITES

winsfs $SAF.saf.idx > $SFS

$ANGSDIR/misc/realSFS saf2theta $SAF.saf.idx -outname $THETAOUT -sfs $SFS -fold 1

Error message:

[persaf::persaf_init] Version of flinflon/BWABHVI/ANGSD/Vphil_subsamp2k.saf.idx is 3
[persaf::persaf_init] Assuming .saf.gz file is flinflon/BWABHVI/ANGSD/Vphil_subsamp2k.saf.gz
[persaf::persaf_init] Assuming .saf.pos.gz file is flinflon/BWABHVI/ANGSD/Vphil_subsamp2k.saf.pos.gz
    -> args: tole:0.000000 nthreads:4 maxiter:100 nsites(block):0 start:flinflon/BWABHVI/ANGSD/Vphil_subsamp2k.1dsfs chr:(null) start:-1 stop:-1 fstout:flinflon/BWABHVI/ANGSD/Vphil_subsamp2k oldout:0 seed:-1 bootstrap:0 resample_chr:0 whichFst:0 fold:1 ref:(null) anc:(null)
    -> Will read chunks of size: 4096
    -> Reading: flinflon/BWABHVI/ANGSD/Vphil_subsamp2k.1dsfs assuming counts (will normalize to probs internally)
    -> Pxroblem with size of dimension of prior 31 vs 32

The error remains even if I fold the winsfs SFS first using winsfs view -f.

I do not get this problem (or should I say pxroblem) when I use realSFS to make the SFS like this. With the realSFS SFS, saf2theta successfully creates a thetas file.

ANGSDIR/misc/realSFS $SAF.saf.idx -fold 1 > $SFS.realsfs
malthesr commented 1 year ago

Hi Teresa,

I'm not too familiar with the saf2theta stuff in realSFS. However, the winsfs output has two lines, the second of which should be identical to the output format of realSFS. Does it solve your problem if you copy the winsfs output to a new file, delete the first line, and use that as input?

Best, Malthe

TeresaPegan commented 1 year ago

Thanks for the lightning-fast response!

I tried this out and found that it seems like I need to both remove the little header from winsfs and fold the SFS before using it with saf2theta (which i am using with fold -1 option). If I do just one or the other, I still get the dimension error. But it looks like I can have it working now, thanks!