mfumagalli / ngsTools

Programs to analyse NGS data for population genetics purposes
GNU General Public License v3.0

Rewriting history #1

Closed: rossibarra closed this issue 9 years ago

rossibarra commented 10 years ago

Howdy, and thanks for putting this on GitHub; I'm a fan. It appears you've been rewriting commit history, which causes merge conflicts when I run git pull to get the newest changes to the code. See http://git-scm.com/book/ch6-4.html.

Cheers,

Jeff

mfumagalli commented 10 years ago

Thank you, Jeff, for pointing this out. Indeed, we had some issues during the latest setup phase, but everything should be fine now. The easiest fix is to delete your local copy of the repository and clone it again from scratch. Sorry for the inconvenience. Best, Matteo
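As an alternative to deleting and re-cloning, an existing clone can usually be reset onto the rewritten upstream history. This is a minimal sketch, assuming the current branch tracks an upstream branch on "origin" and that there are no local commits or uncommitted edits worth keeping (they are discarded); resync_clone is a hypothetical helper name, not part of ngsTools:

```shell
# Sketch: bring an existing clone back in line with an upstream whose history
# was rewritten (force-pushed), instead of deleting and re-cloning.
# WARNING: "reset --hard" discards local commits and uncommitted changes on the
# current branch. resync_clone is a hypothetical helper name.
resync_clone() {
  local repo_dir="$1"
  # Fetch the rewritten history; the default refspec force-updates origin/*.
  git -C "$repo_dir" fetch -q origin || return 1
  # Point the current branch at whatever its upstream branch now contains.
  git -C "$repo_dir" reset -q --hard '@{upstream}'
}
```

Usage would be something like resync_clone ~/src/ngsTools. Re-cloning remains the safer choice if you are unsure whether the clone holds local work.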

rossibarra commented 10 years ago

Yup that's all I did. Easy fix on my end.

Jeffrey Ross-Ibarra

Dept. Plant Sciences 262A Robbins Hall Mail Stop 4 University of California One Shields Ave. Davis, CA 95616

tel: 530-752-1152 web: www.rilab.org



In reply to Matteo Fumagalli (Jan 17, 2014, 2:39 AM).

rossibarra commented 10 years ago

Hi Matteo,

I’m having trouble running ngsF and am hoping you can help. I ran angsd, I think successfully, and got .arg, .glf and .mafs output files (here: https://www.dropbox.com/sh/4k8su37s6mene88/xMxTU0zelA). When I run ngsF it tells me I need to give it n_sites, but it’s not at all clear to me what that should be. I tried giving it the number of unique positions in the .mafs file (using "cut -f 2 BKNtiny.mafs | grep -v positions | sort -n | uniq | wc -l"), as well as an arbitrary number, and it errors out every time (ERROR: wrong number of sites or invalid/corrupt file!).

I’m running the angsd version in your repo rather than the newest one from source code.

The commands I run via a bash script are (the bam file list I’m using has a single bamfile for one individual in it):

taxon=$1

/home/jri/src/ngsTools/angsd/angsd -bam data/BKNtiny_list.txt -minMapQ 40 -minQ 20 -baq 1 -C 50 -out temp/BKNtiny -GL 2 -doMajorMinor 1 -doGlf 3 -doPost 1 -doMaf 2 -doSNP 1 -minLRT 15.1366

/home/jri/src/ngsTools/ngsF/ngsF -n_ind 1 -glf temp/BKNtiny.glf -out BKNtiny.indF

Thanks for any help,

Jeff

P.S. I also note that the tutorial uses options such as -doZ, which doesn’t appear to be a valid angsd option anymore.



Jeffrey Ross-Ibarra

Dept. of Plant Sciences 262 Robbins Hall, Mail Stop 4 University of California One Shields Ave Davis, CA 95616

Web: www.rilab.org Twitter: @jrossibarra Tel: 530-752-1152 Fax: 530-752-4604

fgvieira commented 10 years ago

Hi Jeff,

Indeed, "-n_sites" is the number of sites in the .glf file. There are two ways to get it: from the number of sites in the ".mafs" file, or from the size of the ".glf" file. For the former, you can try running any of these:

cut -f 1,2 BKNtiny.mafs | grep -v position | sort -n | uniq | wc -l
zgrep -cv position BKNtiny.mafs.gz
echo $((`zcat BKNtiny.mafs.gz | wc -l`-1))

For the latter, the uncompressed "-glf" file should be n_sites * n_ind * 3 * sizeof(double) bytes.
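The size-based route can be scripted. This is a minimal sketch, assuming an uncompressed .glf that stores exactly n_sites * n_ind * 3 values as 8-byte doubles (8 bytes is typical on 64-bit Linux and macOS, but it is an assumption here); glf_n_sites is a hypothetical helper name:

```shell
# Sketch: derive ngsF's -n_sites from the byte size of an uncompressed .glf,
# assuming n_ind * 3 doubles of 8 bytes each per site (an assumption).
# glf_n_sites is a hypothetical helper name.
glf_n_sites() {
  local glf="$1" n_ind="$2" bytes per_site
  # Byte count of the file; fails (non-zero status) if the file is missing.
  bytes=$(wc -c < "$glf") || return 1
  bytes=$(echo $bytes)            # strip padding that BSD/macOS wc adds
  per_site=$(( n_ind * 3 * 8 ))
  # A size that is not a whole multiple of one site's record suggests a wrong
  # n_ind or a file that is still compressed.
  if [ $(( bytes % per_site )) -ne 0 ]; then
    echo "glf_n_sites: $bytes bytes is not a multiple of $per_site" >&2
    return 1
  fi
  echo $(( bytes / per_site ))
}
```

For the single-individual run above, this would be used as: ngsF -n_ind 1 -n_sites $(glf_n_sites temp/BKNtiny.glf 1) -glf temp/BKNtiny.glf -out BKNtiny.indF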

hope it helps, FGV

PS - your command doesn't take the chromosome into account, so identical positions on different chromosomes are counted only once.
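The PS can be illustrated with a small helper: counting on column 2 alone merges the same position on different chromosomes. This is a minimal sketch, assuming a tab-separated .mafs or .mafs.gz file whose first line is a header and whose first two columns are chromosome and position; count_mafs_sites is a hypothetical helper name:

```shell
# Sketch: count sites in an ANGSD .mafs or .mafs.gz file, keying each site on
# chromosome AND position so equal positions on different chromosomes are
# counted separately. Assumes line 1 is a header and columns 1-2 are
# chromosome and position. count_mafs_sites is a hypothetical helper name.
count_mafs_sites() {
  local mafs="$1"
  case "$mafs" in
    *.gz) gzip -dc -- "$mafs" ;;   # decompress on the fly
    *)    cat -- "$mafs" ;;
  esac | awk 'NR > 1 && !seen[$1 SUBSEP $2]++ { n++ } END { print n + 0 }'
}
```

On a file with sites 1:100, 1:200 and 2:100 this counts 3, whereas a position-only count gives 2.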

rossibarra commented 10 years ago

Argh, that was dumb. Thanks! Looks to be running fine. Is there an easy way to run ANGSD directly on inbred or haploid lines without first running ngsF?

Cheers,

Jeff

In reply to Filipe G. Vieira (Jan 20, 2014, 2:19 AM).



fgvieira commented 10 years ago

1) If you know your samples are totally inbred, you can just set inbreeding to 1 for all individuals in the ".indF" file (one per line) and run ANGSD.

2) As for haploid individuals, I'm not quite sure you can (or should) use ANGSD, since it always assumes diploidy. However, if you want something simple (like genotype calling), you might be able to approximate it by setting inbreeding to 1.
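Why F = 1 behaves like haploidy can be seen from the standard genotype frequencies under inbreeding, with allele frequency p, q = 1 - p and inbreeding coefficient F. This assumes ANGSD uses these textbook priors, which the thread does not state:

```latex
\begin{aligned}
P(AA) &= p^2(1-F) + pF \\
P(Aa) &= 2pq(1-F) \\
P(aa) &= q^2(1-F) + qF
\end{aligned}
\qquad\xrightarrow{\;F = 1\;}\qquad
P(AA) = p,\quad P(Aa) = 0,\quad P(aa) = q
```

With F = 1, heterozygotes get zero prior probability and each individual effectively carries one allele, which is why this only approximates haploid data: the genotype likelihoods are still computed for a diploid.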

Cheers, FGV
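Following point 1), the file only needs one value per individual. This is a minimal sketch, assuming the lines should follow the same order as the BAM list given to ANGSD (the thread only says "one per line"); how the file is then passed to ANGSD depends on the ANGSD version, so check its help output. write_full_inbreeding is a hypothetical helper name:

```shell
# Sketch: write an .indF file holding F = 1 for each of n_ind individuals,
# one value per line, for fully inbred samples.
# write_full_inbreeding is a hypothetical helper name.
write_full_inbreeding() {
  local n_ind="$1" out="$2"
  awk -v n="$n_ind" 'BEGIN { for (i = 0; i < n; i++) print 1 }' > "$out"
}
```

For example, write_full_inbreeding 1 inbred_lines.indF for a one-individual BAM list (the output name is illustrative).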

rossibarra commented 10 years ago

Hi Filipe

Thanks. Initial tests on inbred lines gave me F = 0, but I may be doing something dumb there and will come back to it. I did run angsd with F set to 1, but I see almost no difference in the results compared to running it with no inbreeding values. I have made my best effort to work out the steps from the manual. I apologize for continuing to bug you here, but if you have a few minutes to check whether I’m doing something boneheaded in my script, it would be much appreciated. I’d simply like to look at patterns of Tajima’s D across the genome (I take the window results and then compare genic and non-genic regions across the genome), but I’m getting results that don’t jibe with what I get from straight-up SNP calls. The SFS also looks oddly skewed toward singletons (87% singletons), which again contrasts with what we see with normal SNP calls. Many thanks for any help you can offer!
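One hedged way to judge whether 87% singletons is "oddly skewed" is to compare it with the standard neutral expectation for an unfolded SFS (constant population size, no errors), under which the expected count in frequency class i of n sampled chromosomes is proportional to 1/i. This is a textbook result rather than something from the thread, and demography, sequencing errors and filtering all shift it; expected_singleton_fraction is a hypothetical helper name:

```shell
# Sketch: expected fraction of singletons in the unfolded SFS of a neutral,
# constant-size population for a sample of n chromosomes. The expected count
# in class i is proportional to 1/i (i = 1..n-1), so the singleton fraction
# is 1 / a_n with a_n = sum_{i=1}^{n-1} 1/i.
# expected_singleton_fraction is a hypothetical helper name.
expected_singleton_fraction() {
  awk -v n="$1" 'BEGIN {
    if (n < 2) { print "n must be >= 2" > "/dev/stderr"; exit 1 }
    a = 0
    for (i = 1; i < n; i++) a += 1 / i
    printf "%.4f\n", 1 / a
  }'
}
```

For example, 10 sampled chromosomes give an expected singleton fraction of roughly 0.35 under this model.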

Cheers,

Jeff

In reply to Filipe G. Vieira (Jan 21, 2014, 1:38 AM).

