jeyson1020 / google-refine

Automatically exported from code.google.com/p/google-refine

Speed improvement #274

Closed by GoogleCodeExporter 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Load an Excel file with 40,000 records.
2. The performance is terrible.
3. Do a clustering and merging.
4. The performance is even worse.

What is the expected output? What do you see instead?

What version of the product are you using? On what operating system?
2.0, Win7

Please provide any additional information below.

Original issue reported on code.google.com by naiemk@gmail.com on 7 Dec 2010 at 1:01

GoogleCodeExporter commented 8 years ago
Thanks for the report. Could you add a little quantitative information, please?
Number of seconds? Hardware configuration? What program are you comparing
against, and how long does it take to perform the same operations?

As always, the source data file would be a useful addition, but we'll 
understand if it's too proprietary to release.

Original comment by tfmorris on 7 Dec 2010 at 4:12

GoogleCodeExporter commented 8 years ago
I am using the clustering feature on suburbs to validate them against the
Australian post database:
- The Aussie postcode DB has ~16,000 records: PostCode, Suburb.
- All suburbs in the clean data are uppercase.
I put that data at the end of my list of 40,000 addresses and selected clustering.

The clustering window includes a large number of clusters (a few thousand).
- The client responds really poorly to any click on a checkbox, scroll, etc.

Comparison with other apps:
- Any native (Windows, Mac, etc.) grid-like UI can easily handle this number of
records without poor performance.

Original comment by naiemk@gmail.com on 7 Dec 2010 at 4:21

GoogleCodeExporter commented 8 years ago
UI responsiveness in this case depends almost entirely on your browser. Which 
browser are you using? I would recommend the latest Firefox, Safari, or Chrome.

Original comment by dfhu...@gmail.com on 7 Dec 2010 at 6:33

GoogleCodeExporter commented 8 years ago
I am using Chrome and a fairly powerful PC.
I have attached the file I am using.

Original comment by naiemk@gmail.com on 7 Dec 2010 at 6:40

GoogleCodeExporter commented 8 years ago
Issue 241 has a suggestion that we don't APPLY clustering immediately, but wait
until the user clicks on an APPLY button. That feature may help here as well?

Original comment by thadguidry on 7 Dec 2010 at 2:42

GoogleCodeExporter commented 8 years ago
I was able to load the file after increasing the memory limit to 2G. But the 
data is all X's so I can't test the facets.

Original comment by dfhu...@gmail.com on 8 Dec 2010 at 5:14

GoogleCodeExporter commented 8 years ago
I actually had the performance problem when clustering the Suburb column, which
is not XXXX.

Original comment by naiemk@gmail.com on 8 Dec 2010 at 5:45

GoogleCodeExporter commented 8 years ago
The largest number of clusters that I see using any of the methods with their
default parameters is about 1,800, and I don't see any performance issue. My
best guess is that there's some type of configuration issue, or perhaps your
system needs more memory.

Original comment by tfmorris on 18 Sep 2012 at 7:45