rvanlaar opened this issue 3 years ago
Hi @rvanlaar sorry for the late response.
This is an interesting case, since q is supposed to throw an error if there's an issue with parsing/reading the data.
I wouldn't ask you to send actual data to me, but it would be great if you could narrow it down to specific sets of lines, so we can understand the issue better.
My suggestion is to use the following method:
```shell
export MYFILE=myfile; export T=30; export C=10
echo $(tail -$T $MYFILE | head -$C | q "select count(*) from -" -c 1) $(tail -$T $MYFILE | head -$C | wc -l)
```
This prints the number of lines as counted by q and by wc for some part of the file. Obviously, T=30 and C=10 are very small relative to your large file. I suggest using a binary search on the line ranges of the file in order to narrow this down as quickly as possible. If you're not familiar with binary search, please ping me here or at harelba@gmail.com and I'll gladly help.
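To make the suggested narrowing concrete, here is a minimal sketch of that binary search. It builds a small demo file and, instead of calling q, uses an awk field-count filter as a stand-in parser; the file contents, the comma delimiter, and the expected field count of 3 are all assumptions for the demo. Swap the real q one-liner into `parsed_count` when running it against your file.

```shell
#!/bin/sh
# Demo data: 20 CSV lines, with one malformed row (2 fields instead of 3)
# planted at line 13. Replace this with your real file.
MYFILE=$(mktemp)
i=1
while [ $i -le 20 ]; do
  if [ $i -eq 13 ]; then echo "bad,line"; else echo "a,b,c"; fi
  i=$((i + 1))
done > "$MYFILE"

# Count a slice of the file two ways. The awk filter is only a stand-in
# for q; for the real test, replace parsed_count's pipeline with:
#   tail -n +"$1" "$MYFILE" | head -"$2" | q "select count(*) from -" -c 1
parsed_count() { tail -n +"$1" "$MYFILE" | head -"$2" | awk -F, 'NF == 3' | wc -l; }
raw_count()    { tail -n +"$1" "$MYFILE" | head -"$2" | wc -l; }

lo=1
hi=$(wc -l < "$MYFILE")
# Keep the half whose two counts disagree; halve the window until it's tiny.
while [ $((hi - lo)) -gt 4 ]; do
  mid=$(( (lo + hi) / 2 ))
  n=$((mid - lo + 1))
  if [ "$(parsed_count "$lo" "$n")" -ne "$(raw_count "$lo" "$n")" ]; then
    hi=$mid          # discrepancy is in the first half
  else
    lo=$((mid + 1))  # otherwise it must be in the second half
  fi
done
echo "suspect lines: $lo-$hi"
```

On the demo data this narrows the 20-line file down to the 5-line window containing the planted bad row, using only a handful of count comparisons.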
Once you've narrowed it down to be small enough, two options:
Harel
Hi @harelba
I wanted to give you an update. I switched over to using pandas to do the import, which does import all the lines. So, for us it's not an urgent problem any more.
This problem still got my attention. I hope to find some time to follow your advice for the imports.
Roland
First off, q is a great product. It allows me to read a huge file, 5GB, without Out of Memory errors.
I'm missing around 20K lines between what `wc -l` gives me and `select count(*) from file.csv` in q. What gives, and how can I find out which lines are problematic?
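One common reason for `wc -l` and a CSV parser disagreeing is rows whose field count differs from the rest, e.g. an embedded newline inside a quoted field, which wc counts as two lines but a parser reads as one record. Here is a hedged sketch for listing such suspects; the demo file, the comma delimiter, and the assumption that ordinary rows contain no quoted commas are all illustrative.

```shell
#!/bin/sh
# Demo CSV: the quoted field in row 2 contains an embedded newline, so
# wc -l sees 5 lines while a CSV parser sees 4 records.
tmp=$(mktemp)
printf 'id,name,value\n1,foo,10\n2,"bar\nbaz",20\n3,qux,30\n' > "$tmp"

# NR==1 records the header's field count; any later physical line that
# deviates from it is printed with its line number -- these are the
# lines to inspect. (Assumes a comma delimiter with no quoted commas.)
suspects=$(awk -F, 'NR == 1 { n = NF; next } NF != n { print NR ": " $0 }' "$tmp")
echo "$suspects"
```

On the demo data this prints the two physical lines produced by the split quoted field, which is exactly the kind of region the binary search above would converge on.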