shenwei356 / csvtk

A cross-platform, efficient and practical CSV/TSV toolkit in Golang
http://bioinf.shenwei.me/csvtk
MIT License

filter2 command is too slow #269

Open y9c opened 4 months ago

y9c commented 4 months ago

Compared with the filter command or awk, the filter2 command is much slower, especially for rules with multiple conditions.

It might be related to this function in the for-loop, which parses the expression again on every iteration. https://github.com/shenwei356/csvtk/blob/9407f73e2d72dddf5042c7dbb6299a180ea9cf4a/csvtk/cmd/filter2.go#L370-L376

shenwei356 commented 4 months ago

Yes, I noticed that. It is slow :(

y9c commented 4 months ago

Can we move the Expression parsing function outside the for-loop and run it only once?

shenwei356 commented 3 months ago

It is slow, but it has to be done that way, because filterStr1 is different in each iteration.

y9c commented 3 months ago

Why is filterStr1 different? Can we cache the parsed results?

shenwei356 commented 3 months ago

It's the expression: in something like '$age > 18', the $age needs to be replaced with the value from each row.

y9c commented 3 months ago

Yes. I mean, can we parse the expression into something like '$1>18' and reuse the code of the filter command to do the computation afterward?
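
To illustrate the parse-once idea, here is a minimal sketch in Go. It is not csvtk's actual code and does not use the expression library that filter2 relies on. The names `rowPredicate`, `compileCondition` and `compileFilter` are hypothetical, and the tiny grammar (whitespace-separated `$column OP number` conditions joined by `&&`) is an assumption made only for the example. The point is that the column name is resolved to an index and the operator is chosen once, before the loop, so each row only costs a lookup and a comparison.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// rowPredicate reports whether a single row passes the filter.
// (Hypothetical type, introduced only for this sketch.)
type rowPredicate func(row []string) bool

// compileCondition parses one condition such as "$age > 18" exactly once.
// It resolves the column name to an index and picks the comparison up front,
// so the returned closure does no string parsing of the expression itself.
func compileCondition(cond string, header []string) (rowPredicate, error) {
	fields := strings.Fields(cond)
	if len(fields) != 3 || !strings.HasPrefix(fields[0], "$") {
		return nil, fmt.Errorf("unsupported condition: %q", cond)
	}
	col := -1
	for i, name := range header {
		if name == fields[0][1:] {
			col = i
			break
		}
	}
	if col < 0 {
		return nil, fmt.Errorf("unknown column: %q", fields[0])
	}
	threshold, err := strconv.ParseFloat(fields[2], 64)
	if err != nil {
		return nil, err
	}
	var cmp func(a, b float64) bool
	switch fields[1] {
	case ">":
		cmp = func(a, b float64) bool { return a > b }
	case ">=":
		cmp = func(a, b float64) bool { return a >= b }
	case "<":
		cmp = func(a, b float64) bool { return a < b }
	case "<=":
		cmp = func(a, b float64) bool { return a <= b }
	case "==":
		cmp = func(a, b float64) bool { return a == b }
	default:
		return nil, fmt.Errorf("unsupported operator: %q", fields[1])
	}
	return func(row []string) bool {
		if col >= len(row) {
			return false
		}
		v, err := strconv.ParseFloat(row[col], 64)
		return err == nil && cmp(v, threshold)
	}, nil
}

// compileFilter splits the expression on "&&" and ANDs the compiled conditions.
func compileFilter(expr string, header []string) (rowPredicate, error) {
	var preds []rowPredicate
	for _, cond := range strings.Split(expr, "&&") {
		p, err := compileCondition(cond, header)
		if err != nil {
			return nil, err
		}
		preds = append(preds, p)
	}
	return func(row []string) bool {
		for _, p := range preds {
			if !p(row) {
				return false
			}
		}
		return true
	}, nil
}

func main() {
	header := []string{"name", "age", "score"}
	rows := [][]string{{"ann", "17", "90"}, {"bob", "25", "40"}, {"cat", "30", "85"}}

	// The expression is parsed once, outside the row loop.
	keep, err := compileFilter("$age > 18 && $score >= 50", header)
	if err != nil {
		panic(err)
	}
	for _, row := range rows {
		if keep(row) {
			fmt.Println(strings.Join(row, ",")) // prints only "cat,30,85"
		}
	}
}
```

A real expression language with functions, string comparisons and nested parentheses would need a proper parser, but the same split between a one-time compile step and a cheap per-row evaluation should still apply.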

shenwei356 commented 2 months ago

> parsed the expression as something like '$1>18' and reuse the code of the filter command

I don't think so.

God, it's really slow~ I used it a lot recently. Have to improve it, when I have time ~