saggu opened this issue 3 years ago
Running the command on the smallest file (1106 rows) takes about 8 seconds:
```
$ time tl context-match 1438042989018_40_20150728002309-00067-ip-10-236-191-2_57714692_2.csv \
    --context-file 1438042989018_40_20150728002309-00067-ip-10-236-191-2_57714692_2_context.tsv \
    -o context_score > context_test.csv

real    0m10.968s
user    0m8.024s
sys     0m0.972s
```
Running it on the largest file (58446 rows) ran for more than 7 minutes without finishing (I had to interrupt it):
```
$ time tl context-match 88523363_0_8180214313099580515.csv \
    --context-file 88523363_0_8180214313099580515_context.tsv \
    -o context_score > context_test.csv
```
I changed the way the context file is read and used, hoping that would fix it, but it didn't help much.
Most of the time is spent computing the score, not in I/O. The scoring step needs to be optimized before this command is usable on larger files.
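One quick way to confirm that scoring dominates is to profile a run with cProfile. A minimal sketch, assuming the scoring code can be invoked as a Python function; `run_context_match` below is a placeholder, not the actual tl entry point:

```python
import cProfile
import pstats

def run_context_match(input_csv, context_tsv):
    # Placeholder standing in for the real tl context-match scoring code.
    pass

# Profile a full run and dump the stats to a file.
cProfile.run("run_context_match('input.csv', 'context.tsv')", "match.prof")

# Show the 10 functions with the highest cumulative time, which should
# make clear whether scoring or I/O is the bottleneck.
pstats.Stats("match.prof").sort_stats("cumulative").print_stats(10)
```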
Please take a look; we can discuss in a meeting if required.
All the files required are attached.
data.zip
Replaced iterrows() with zip() for iterating over the dataframe in order to reduce the run time. The largest file (58446 rows) now completes in 3 minutes 35 seconds.
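For context, this is the general shape of that change: `iterrows()` constructs a Series object for every row, while `zip()` over the column values yields plain tuples, which is usually much faster. A sketch with a toy dataframe and a hypothetical `score_row` function (not the actual tl scoring logic):

```python
import pandas as pd

df = pd.DataFrame({"label": ["apple", "pear"], "context": ["fruit", "tree"]})

def score_row(label, context):
    # Placeholder for the real context-match score computation.
    return len(set(label) & set(context))

# Slow: iterrows() builds a Series per row.
scores = [score_row(row["label"], row["context"]) for _, row in df.iterrows()]

# Faster: zip() over the columns yields plain tuples with no per-row Series.
scores = [score_row(label, context)
          for label, context in zip(df["label"], df["context"])]
```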