zorino / kaamer

kaamer - protein identification based on amino acid kmers
Apache License 2.0

Short sequences not found #21

Status: Open. Opened by andreas-wilm 1 year ago.

andreas-wilm commented 1 year ago

Hi @zorino,

Thanks for creating KAAmer and making it public!

I was planning to use KAAmer to search for very short peptides (down to 8 AA), but I can't get it to find control peptides shorter than 13 AA.

I use the following example data:

>Type1Collagen-1-20
MFSFVDLRLLLLLAATALLT
>Type1Collagen-1-13
MFSFVDLRLLLLL
>Type1Collagen-1-12
MFSFVDLRLLLL
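For reference, a peptide of length L contains L - k + 1 overlapping k-mers. The minimal sketch below assumes k = 7 (the kmer size stated later in this report) and simply counts how many 7-mers each control peptide yields. The helper names `parse_fasta` and `kmer_count` are hypothetical and are not part of KAAmer.

```python
# Minimal sketch: count overlapping k-mers per query peptide.
# Assumption: K = 7, the kmer size stated in this issue; helper names are hypothetical.
K = 7

FASTA = """>Type1Collagen-1-20
MFSFVDLRLLLLLAATALLT
>Type1Collagen-1-13
MFSFVDLRLLLLL
>Type1Collagen-1-12
MFSFVDLRLLLL
"""


def parse_fasta(text):
    """Return a dict mapping record name to sequence (multi-line records are joined)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def kmer_count(seq, k=K):
    """Number of overlapping k-mers in seq: len(seq) - k + 1, or 0 if seq is shorter than k."""
    return max(len(seq) - k + 1, 0)


if __name__ == "__main__":
    for name, seq in parse_fasta(FASTA).items():
        print(f"{name}\tlength={len(seq)}\t{K}-mers={kmer_count(seq)}")
```

Under that assumption the 20-, 13- and 12-residue controls yield 14, 7 and 6 overlapping 7-mers respectively, so even the 12-mer still has several k-mers available for matching.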

and search against the human proteome (https://www.uniprot.org/proteomes/UP000005640).

Type1Collagen-1-20 and Type1Collagen-1-13 are found as full-length, 100% matches. However, Type1Collagen-1-12 (and any shorter sequence) is not found. My understanding was that KAAmer should be able to find any peptide at least 7 AA long (because that's your fixed kmer size), as long as I set mink to 1, which I did. Am I doing something wrong?
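The expectation described above can be illustrated with a naive exact k-mer matcher: a query counts as a hit when at least `MIN_KMERS` of its distinct 7-mers also occur in the target. This is a hypothetical sketch of the reasoning, not KAAmer's implementation. `K = 7` and `MIN_KMERS = 1` mirror the kmer size and mink setting described in this report, and the target is only the 20-residue N-terminal fragment given above, standing in for the full UniProt entry.

```python
# Hypothetical sketch of exact k-mer matching with a minimum-hit threshold.
# NOT KAAmer's implementation: it only illustrates the expectation in this issue.
# Assumptions: K = 7 (stated kmer size) and MIN_KMERS = 1 (the reporter's mink setting).
K = 7
MIN_KMERS = 1

# Stand-in target: the 20-residue fragment from the report, not the full protein.
TARGET = "MFSFVDLRLLLLLAATALLT"

QUERIES = {
    "Type1Collagen-1-20": "MFSFVDLRLLLLLAATALLT",
    "Type1Collagen-1-13": "MFSFVDLRLLLLL",
    "Type1Collagen-1-12": "MFSFVDLRLLLL",
    "hypothetical-8mer": "MFSFVDLR",
}


def kmers(seq, k=K):
    """Set of all overlapping k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(query, target, k=K):
    """Number of distinct query k-mers that also occur in the target."""
    return len(kmers(query, k) & kmers(target, k))


def is_hit(query, target, k=K, min_kmers=MIN_KMERS):
    """A query is reported when it shares at least min_kmers k-mers with the target."""
    return shared_kmers(query, target, k) >= min_kmers


if __name__ == "__main__":
    for name, seq in QUERIES.items():
        print(f"{name}\tshared={shared_kmers(seq, TARGET)}\thit={is_hit(seq, TARGET)}")
```

In this simple model every query of 7 or more residues that occurs in the target is a hit once the threshold is 1 (the 12-mer shares 6 k-mers, the 8-mer shares 2). Whatever causes the observed cutoff at 13 AA therefore lies outside this simplified model.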

Many thanks, Andreas