voichek / kmersGWAS

A library for running k-mers based GWAS
GNU General Public License v3.0

kinship matrix is half the expected size #157

Closed dosshra closed 1 month ago

dosshra commented 1 month ago

Hello, I am running these commands:

~/kmergwas/bin/list_kmers_found_in_multiple_samples -l kmers_w_strand_path -k 31 --mac 5 -p 0.2 -o kmers_to_use
~/kmergwas/bin/build_kmers_table -l kmers_w_strand_path -k 31 -a kmers_to_use -o kmer4wild_dom
~/kmergwas/bin/emma_kinship_kmers -t kmer4wild_dom -k 31 --maf 0.05 > kmers_table.kinship

I get these lines at the tail of the build_kmers_table log file:

5000 / 5000 : 0100000000000000000000000000000000000000000000000000100000110000
Wrote: kmers=28662 pa words=28662 container size=28662 hash-map size=28662
5001 / 5000 : Loading k-mers
5001 / 5000 : 0100000000000011010001101101110001011101011000111001000010010110
Wrote: kmers=0 pa words=0 container size=0 hash-map size=0
close file

And this at the end of the emma_kinship_kmers log:

.....................#6193797256

The resulting kinship file is 52x52, while the number of samples in the kmers_w_strand_path file is 104. Thank you.

voichek commented 1 month ago

Hi,

It seems something is off, especially since the number of rows you're getting is exactly half of what you expect. Could you please share two files with me—kmers_w_strand_path and kmers_to_use.shareness—so I can investigate further and try to understand what's going on?

Best regards, Yoav

dosshra commented 1 month ago

Please see the attached files. I see that the problem begins in list_kmers_found_in_multiple_samples. I took the 52 samples that were not included in the matrix and ran list_kmers_found_in_multiple_samples again on that subset, and the file kmers_to_use.stats.both contains only 27 lines and 27 columns. So it seems that there is no problem with the individual k-mer files, but something splits the list roughly in half. It seems that the script is skipping every other line. Thank you.

kmers_to_use.shareness.txt kmers_w_strand_path.txt

voichek commented 1 month ago

As mentioned in the manual, the kmers_w_strand_path.txt file should follow this format: each line should contain the full path to the k-mers list file, followed by a tab and the individual name. It seems that the individual names are missing in your file, so the program is interpreting every second file as the name of the previous one.
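A quick pre-check of the list file can catch this before running the pipeline. The following is only a minimal sketch, assuming the format described above (path, one tab, individual name); the function name `find_malformed_lines` is hypothetical and not part of kmersGWAS, and it does not reproduce how the tool itself parses the file.

```python
def find_malformed_lines(lines):
    """Return (line_number, reason) pairs for lines not shaped like '<path>\\t<name>'.

    Assumption: each line should hold exactly two tab-separated fields,
    the k-mers list path and the individual name.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # blank lines are ignored in this sketch
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append(
                (number, f"expected 2 tab-separated fields, found {len(fields)}")
            )
        elif not fields[0].strip() or not fields[1].strip():
            problems.append((number, "empty path or individual name"))
    return problems
```

Passing the open list file (for example `open("kmers_w_strand_path")`) to this function should return an empty list when every line has both fields. A file with paths only, as in this issue, would be reported on every line.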

voichek commented 1 month ago

Please close the issue if this solved the problem.

dosshra commented 1 month ago

Thank you. The issue was resolved.
