permia closed this issue 3 months ago.
Hi Shen,
I translated a large nucleotide file (~2.5 GB) with TransDecoder, and the translated protein file is also large (1.2 GB).
The IDs of the translated proteins look like ID-A.p1, ID-A.p2, ID-A.p3, etc.
In some cases, I want to extract all (or more than one) of the translated sequences for a given ID.
Because I have 10,000 IDs to extract, extracting these sequences with the following command is slow:
seqkit grep -w 0 --id-regexp "^(\\S+)\\.p\d+\\s?" -f id.txt longest_orfs.fasta -o all_ORF.fasta
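The `--id-regexp` pattern in this command makes seqkit use capture group 1 (everything before the `.pN` suffix) as the sequence ID, and that ID is what gets compared with the lines of id.txt. The minimal shell sketch below previews that mapping so you can check the regex before running on the 1.2 GB file. It uses `sed -E` with `[^[:space:]]` as the POSIX spelling of `\S`; `base_id` is a hypothetical helper name and the headers are made-up examples in the `ID-A.pN` style.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the base ID that the pattern ^(\S+)\.p\d+ would
# capture from each header line. [^[:space:]] is the POSIX spelling of \S.
# Lines without a ".pN" suffix print nothing.
base_id() {
  sed -nE 's/^([^[:space:]]+)\.p[0-9]+.*/\1/p'
}

# Made-up headers in the ID-A.pN style from the question.
printf '%s\n' 'ID-A.p1' 'ID-A.p2' 'ID-A.p3' 'ID-B.p1' | base_id
# prints ID-A three times, then ID-B
```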
There is also an option, --delete-matched, which makes the extraction faster if you only want the first hit for each pattern.
Is there an option to set how many times one pattern may match (for example, keep only the first N hits per ID)? That would make the extraction faster.
seqkit grep -w 0 --delete-matched --id-regexp "^(\\S+)\\.p\d+\\s?" -f id.txt longest_orfs.fasta -o largest_ORF.fasta
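The thread does not name a seqkit grep flag that caps the number of hits per pattern, so here is a workaround sketch in plain awk; it is not a seqkit feature. `keep_first_n` is a hypothetical helper name, and the sketch assumes headers of the form `>ID-A.pN ...` and one base ID per line in id.txt. It still reads the whole FASTA once, because the ORFs of one ID are not assumed to be adjacent, so it limits what is written rather than how much is scanned.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a seqkit option): keep at most N records per base ID,
# restricted to the IDs listed in an ID file. Reads the FASTA once, in order.
# Usage: keep_first_n N id.txt input.fasta > output.fasta
keep_first_n() {
  awk -v n="$1" '
    # First file (the ID list): load the wanted base IDs into a set.
    FILENAME == ARGV[1] { if ($1 != "") want[$1] = 1; next }
    # Header line: strip ">" and a trailing ".pN" to get the base ID,
    # then decide whether this record is kept (counting only wanted IDs).
    /^>/ {
      id = substr($1, 2)
      sub(/\.p[0-9]+$/, "", id)
      keep = (id in want) && (seen[id]++ < n)
    }
    # Print header and sequence lines only while the current record is kept.
    keep
  ' "$2" "$3"
}
```

For example, `keep_first_n 2 id.txt longest_orfs.fasta > first2_ORF.fasta` would keep up to two ORFs per listed ID. Multi-line sequences are handled because the keep/skip decision persists until the next header.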
Just run the command below and wait.
seqkit grep -w 0 --id-regexp "^(\S+)\.p\d+" -f id.txt longest_orfs.fasta -o largest_ORF.fasta
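After the extraction finishes, a quick way to confirm that every ORF of each ID was kept is to count output records per base ID. This is a rough sketch with standard tools; `count_per_id` is a hypothetical helper name and it assumes the `>ID-A.pN` header style from the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count FASTA records per base ID (the part before ".pN").
# Assumes headers like ">ID-A.p1 ..."; headers without a ".pN" suffix are skipped.
count_per_id() {
  grep '^>' "$1" |
    sed -nE 's/^>([^[:space:]]+)\.p[0-9]+.*/\1/p' |
    sort |
    uniq -c
}

# Example usage on the output file of the command above:
# count_per_id largest_ORF.fasta
```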