shenwei356 / csvtk

A cross-platform, efficient and practical CSV/TSV toolkit in Golang
http://bioinf.shenwei.me/csvtk
MIT License

### The output of csvtk grep was empty #270

Closed YoungerRen closed 3 months ago

YoungerRen commented 4 months ago

Hi, when processing my taxonomy data, I wanted to merge two datasets by matching on a specific column, so I used csvtk grep as follows:

  cat NCBI_seqid_taxid.txt | csvtk -t grep -f accession.version -P nr_blt2.txt | csvtk -t cut -f Geneid,accession.version,taxid > nr_tax.txt

But the output file, nr_tax.txt, was empty. This is my NCBI_seqid_taxid.txt: [image]

And this is my nr_blt2.txt: [image]

Then this was the error file: [image]

I hope you can answer or share any ideas about this. Thanks.

shenwei356 commented 4 months ago

> then this was the error file

It's not an error; it's a kind of progress message, and I'll remove it in the next version.

You should use csvtk join for merging tables.

OK, using csvtk grep is fine, but you are misusing it. -P accepts a file containing the list of patterns, and each line should contain only the accession.version value rather than the whole row.

  -P, --pattern-file string   pattern files (one pattern per line)
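
To illustrate the one-pattern-per-line requirement, here is a minimal sketch on made-up data. The column layouts, the sample accessions, the `csvtk_demo` directory, and the file names `accessions.txt` and `nr_hits.txt` are all assumptions for illustration, since the real files were only shown as screenshots. It extracts the accession.version values into a pattern file and then filters nr_blt2.txt with it.

```shell
# Hypothetical sample data in a scratch directory (layouts are assumed, not taken from the real files).
mkdir -p csvtk_demo
printf 'accession.version\ttaxid\nWP_000001.1\t562\nWP_000002.1\t1280\n' > csvtk_demo/NCBI_seqid_taxid.txt
printf 'Geneid\taccession.version\ngene1\tWP_000001.1\ngene2\tWP_999999.1\n' > csvtk_demo/nr_blt2.txt

# Build the pattern file: only the accession.version values, one per line, header removed.
csvtk cut -t -f accession.version csvtk_demo/NCBI_seqid_taxid.txt \
    | csvtk del-header -t > csvtk_demo/accessions.txt

# Keep the rows of nr_blt2.txt whose accession.version matches a line of the pattern file.
csvtk grep -t -f accession.version -P csvtk_demo/accessions.txt csvtk_demo/nr_blt2.txt \
    > csvtk_demo/nr_hits.txt
```

With this sample data, only the gene1 row should survive the filter, because WP_999999.1 is not in the pattern file. Note that the result still lacks the taxid column, which is why grep alone does not solve the merge.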

Besides, the order of the files is inverted. You should filter nr_blt2.txt with the accession.version list from NCBI_seqid_taxid.txt, because the Geneid column is in the former file.

Hmm, but you also need the taxid from NCBI_seqid_taxid.txt. If the files are small, just use csvtk join; however, they might be big.
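
For the small-file case, a join might look like the sketch below. The sample rows, the `csvtk_join_demo` directory, and the assumption that both files share a column named accession.version are illustrative only; the output name nr_tax.txt is taken from the original question.

```shell
# Hypothetical sample data in a scratch directory (layouts are assumed, not taken from the real files).
mkdir -p csvtk_join_demo
printf 'accession.version\ttaxid\nWP_000001.1\t562\nWP_000002.1\t1280\n' > csvtk_join_demo/NCBI_seqid_taxid.txt
printf 'Geneid\taccession.version\ngene1\tWP_000001.1\ngene2\tWP_999999.1\n' > csvtk_join_demo/nr_blt2.txt

# Join the two tables on the shared accession.version column,
# then keep only the three columns the question asked for.
csvtk join -t -f accession.version \
        csvtk_join_demo/nr_blt2.txt csvtk_join_demo/NCBI_seqid_taxid.txt \
    | csvtk cut -t -f Geneid,accession.version,taxid \
    > csvtk_join_demo/nr_tax.txt
```

By default the join keeps only rows whose key is present in both files, so gene2 (whose accession has no taxid entry in the sample) should be dropped here.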

Here's the right way.

  1. Copy the accession.version column into a new taxid column with csvtk mutate.
  2. Replace the values in the new taxid column with the corresponding values from the key-value file NCBI_seqid_taxid.txt, using csvtk replace.

  cat nr_blt2.txt \
      | csvtk cut -t -f Geneid,accession.version \
      | csvtk mutate -t -n taxid -f accession.version \
      | csvtk replace -t -f taxid -k NCBI_seqid_taxid.txt -p '(.+)' -r '{kv}' -o result.txt

YoungerRen commented 4 months ago

Thanks for your code, but another error occurred when I ran it, as follows: [image]

shenwei356 commented 4 months ago

Updated.

Please try to:

  1. read the error message,
  2. read the help message by adding "-h",
  3. and fix it yourself.