lh3 / seqtk

Toolkit for processing sequences in FASTA/Q formats
MIT License
1.39k stars 307 forks source link

output file contains only one amino acid #200

Open Jiayi-Zheng opened 1 year ago

Jiayi-Zheng commented 1 year ago

Hi, I was using command seqtk subseq uniprot_sprot.fasta name.txt > out.fa my name.txt looks like:

sp|Q9H2Y7|ZN106_HUMAN Zinc finger protein 106 OS=Homo sapiens OX=9606 GN=ZNF106 PE=1 SV=1
sp|Q9Y5V0|ZN706_HUMAN Zinc finger protein 706 OS=Homo sapiens OX=9606 GN=ZNF706 PE=1 SV=1
sp|Q15942|ZYX_HUMAN Zyxin OS=Homo sapiens OX=9606 GN=ZYX PE=1 SV=1

I got a file: `$ head -5 out.fa

sp|P31946|1433B_HUMAN:14-14 14-3-3 protein beta/alpha OS=Homo sapiens OX=9606 GN=YWHAB PE=1 SV=3 L sp|Q04917|1433F_HUMAN:14-14 14-3-3 protein eta OS=Homo sapiens OX=9606 GN=YWHAH PE=1 SV=4 A`

`$ tail -4 out.fa

sp|Q9Y5V0|ZN706_HUMAN Zinc finger protein 706 OS=Homo sapiens OX=9606 GN=ZNF706 PE=1 SV=1 MARGQQKIQSQQKNAKKQAGQKKKQGHDQKAAAKAALIYTCTVCRTQMPDPKTFKQHFESKHPKTPLPPELADVQA sp|Q15942|ZYX_HUMAN Zyxin OS=Homo sapiens OX=9606 GN=ZYX PE=1 SV=1 MAAPRPSPAISVSVSAPAFYAPQKKFGPVVAPKPKVNPFRPGDSEPPPAPGAQRAQMGRVGEIPPPPPEDFPLPPPPLAGDGDDAEGALGGAFPPPPPPIEESFPPAPLEEEIFPSPPPPPEEEGGPEAPIPPPPQPREKVSSIDLEIDSLSSLLDDMTKNDPFKARVSSGYVPPPVATPFSSKSSTKPAAGGTAPLPPWKSPSSSQPLPQVPAPAQSQTQFHVQPQPQPKPQVQLHVQSQTQPVSLANTQPRGPPASSPAPAPKFSPVTPKFTPVASKFSPGAPGGSGSQPNQKLGHPEALSAGTGSPQPPSFTYAQQREKPRVQEKQHPVPPPAQNQNQVRSPGAPGPLTLKEVEELEQLTQQLMQDMEHPQRQNVAVNELCGRCHQPLARAQPAVRALGQLFHIACFTCHQCAQQLQGQQFYSLEGAPYCEGCYTDTLEKCNTCGEPITDRMLRATGKAYHPHCFTCVVCARPLEGTSFIVDQANRPHCVPDYHKQYAPRCSVCSEPIMPEPGRDETVRVVALDKNFHMKCYKCEDCGKPLSIEADDNGCFPLDGHVLCRKCHTARAQT`

So the last parts looks normal but the starting ones are obviously lacking things, not sure why is that?

I tried locating the protein in raw file and it looks fine: `$ grep -A 5 'GN=YWHAB' uniprot_sprot.fasta

sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens OX=9606 GN=YWHAB PE=1 SV=3 MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSS WRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFY LKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFY YEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGD AGEGEN`

hanzerruid commented 8 months ago

You can refer to this #180