shenwei356 / taxonkit

A Practical and Efficient NCBI Taxonomy Toolkit, also supports creating NCBI-style taxdump files for custom taxonomies like GTDB/ICTV
https://bioinf.shenwei.me/taxonkit
MIT License

taxonkit lca gives error `[ERRO] bufio.Scanner: token too long` sometimes #75

Closed taylorreiter closed 1 year ago

taylorreiter commented 1 year ago

Describe your issue

# install taxonkit
conda install taxonkit
# download taxid -> lineage file. required for taxonkit
wget https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
tar xf taxdump.tar.gz
taxonkit lca --data-dir . -i 2 -s ";" -o test_lca.txt test.txt

test.txt: A TSV-formatted input file

shenwei356 commented 1 year ago

It's due to some long rows, similar to https://github.com/shenwei356/seqkit/issues/214.

shenwei356 commented 1 year ago

Hi Taylor, thanks for reporting this. I've just increased the default size of the line buffer from 4096 to 1M. It works now.

If the error still occurs, you can use a bigger buffer size:

  -b, --buffer-size string   size of line buffer, supported unit: K, M, G. You need to increase the
                             value when "bufio.Scanner: token too long" error occured (default "1M")
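
To pick a value for -b, it may help to measure the longest line in the input first. This is a minimal sketch assuming a POSIX shell with awk, and assuming the line buffer has to be at least as large as the longest line. `longest_line_bytes` is a hypothetical helper written for illustration, not part of taxonkit.

```shell
# Hypothetical helper (not part of taxonkit): print the byte length of the
# longest line in the file given as the first argument.
longest_line_bytes() {
    # LC_ALL=C makes awk's length() count bytes instead of multibyte characters.
    # "max + 0" forces numeric output, so an empty file prints 0.
    LC_ALL=C awk '{ n = length($0); if (n > max) max = n } END { print max + 0 }' "$1"
}
```

For example, `longest_line_bytes test.txt` prints a single number. If it is larger than the default buffer, rerunning the original command with a bigger value, such as `taxonkit lca --data-dir . -i 2 -s ";" -b 10M -o test_lca.txt test.txt`, should leave enough headroom.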

taylorreiter commented 1 year ago

Thank you so much, @shenwei356! I've tried this on both Mac and Linux, and it solved the problem. Thank you!

shenwei356 commented 1 year ago

Please don't hesitate to let me know if you encounter any additional problems.