Unoqualsiasi closed 7 years ago
grep '2005' FinalReport_54kV1_ed1.txt | wc
will also match other lines that contain 2005 anywhere, e.g.:
tikn@login-0:~/for_folk/geno/geno_imputation/genotype_rawdata/illumina54k_v1$ grep '2005' FinalReport_54kV1_ed1.txt | tail -5
Hapmap52005-BTA-75510 5409 A G 0.9098
Hapmap55117-rs29020058 5409 G G 0.9003
ARS-BFGL-NGS-42005 5409 C C 0.8761
BTA-120182-no-rs 5409 G G 0.2005
Hapmap43172-BTA-120051 5409 G G 0.8855
Use:
awk '$2==2005 {print $0}' FinalReport_54kV1_ed1.txt | wc -l
54001
Awk is safer, since it operates on columns.
If you want to use grep, use:
grep '\s2005\s' FinalReport_54kV1_ed1.txt | cut -f 1 > FinalReport_54kV1_ed1_markerlist.txt
This way you only match 2005 with whitespace on both sides.
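To see the difference on a small scale, here is a rough sketch using a throwaway file. The three rows are made up for illustration (two are copied from the tail output above, and the third is edited so that its second column is 2005), and the tab-separated layout is an assumption.

```shell
# Throwaway sample file: illustrative rows only, tab-separated (assumed layout).
sample=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\n' \
  'Hapmap52005-BTA-75510' 5409 A G 0.9098 \
  'BTA-120182-no-rs'      5409 G G 0.2005 \
  'ARS-BFGL-NGS-42005'    2005 C C 0.8761 > "$sample"

# Substring match: counts every line that contains "2005" anywhere
# (marker name, score, or sample ID).
substring_count=$(grep -c '2005' "$sample")

# Column match: counts only lines whose second field is exactly 2005.
column_count=$(awk '$2 == 2005' "$sample" | wc -l | tr -d ' ')

echo "grep substring: $substring_count"
echo "awk column 2:   $column_count"
rm -f "$sample"
```

On this sample grep counts all three lines, while awk counts only the one row that really belongs to sample 2005.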
oh fk the boundaries...you are right XD
I was using the awk approach the first time; I don't know why I am using grep now. I think you should update the script prepare_plink_map_example.Rmd with the awk option.
Just a small fix:
awk '$2 == 2005 {print $1}' OFS='\t' FinalReport_54kV1_ed1.txt > output
It appears that this file contains 73628 SNPs instead of 54001 as reported in the header of the file -.-
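One possible way to check a count like this is to compare the number of rows for a sample with the number of distinct marker names in those rows. This is only a sketch on a made-up file: the marker names are hypothetical, and it assumes the marker name is in column 1 and the sample ID in column 2, as in the rows shown earlier.

```shell
# Made-up report: hypothetical marker names, tab-separated (assumed layout).
report=$(mktemp)
printf '%s\t%s\n' \
  'BTA-1' 2005 \
  'BTA-2' 2005 \
  'BTA-2' 2005 \
  'BTA-1' 5409 > "$report"

# Rows belonging to sample 2005.
rows=$(awk '$2 == 2005' "$report" | wc -l | tr -d ' ')

# Distinct marker names among those rows.
distinct=$(awk '$2 == 2005 {print $1}' "$report" | sort -u | wc -l | tr -d ' ')

echo "rows for sample 2005: $rows, distinct markers: $distinct"
rm -f "$report"
```

If the two numbers differ on the real file, some markers are listed more than once for that sample.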