rahulissar / ai-supply-chain

Repository for common AI use cases in supply chain, procurement

Affinity propagation did not converge #2

Open robertovergallo opened 2 years ago

robertovergallo commented 2 years ago

Hello, great work, thank you for sharing this! I've been testing it with both datasets, and it works. Unfortunately I receive this warning when running Vendor_Name_Norm.py on a custom dataset that I own:

Approx. time to generate similarity matrix is 20 mins for 3000+ records
Matrix Generation started at : 1643178873.104545
Runtime of the program is 143.45929074287415
Approx. time to cluster vendor data is 10 mins for 3000+ records
/usr/local/lib/python3.9/site-packages/sklearn/cluster/_affinity_propagation.py:250: ConvergenceWarning: Affinity propagation did not converge, this model will not have any cluster centers.
  warnings.warn(
Runtime of the program is 13.805680990219116

Clustering doesn't seem to be working. Here is a sample output row (sorry, I can't paste more rows):

186,186,BURGER KING PIAZZALE A,burger king piazzale a,-1,s,0
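For reference, with scikit-learn 1.0.2 a run that fails to converge labels every sample -1, so the problem can be confirmed directly from the output file. Below is a minimal sketch, assuming the output is a CSV with a `Cluster` column like the one in this thread; the helper name `unclustered_fraction` is hypothetical and not part of the repository, and the inline sample reuses a few rows posted in this issue.

```python
import csv
import io


def unclustered_fraction(csv_text, cluster_column="Cluster"):
    """Return the share of rows whose cluster label is -1."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0.0
    unclustered = sum(1 for row in rows if row[cluster_column].strip() == "-1")
    return unclustered / len(rows)


# A few rows taken from the output posted in this thread.
sample = """\
,Supplier Code,Supplier Name,Cleaned_Name,Cluster,StandardName,Score_with_standard
0,1743406,Voyager Fleet Systems Inc,voyager fleet systems,-1,s,9
1,1001584,Grainger Industrial Supply,grainger industrial supply,-1,s,7
5,1755386,"Western Blue, an NWN Company",western blue an nwn,-1,s,10
"""

fraction = unclustered_fraction(sample)
print(f"{fraction:.0%} of rows have no cluster")
```

A fraction of 1.0 alongside the ConvergenceWarning suggests the clustering step produced no exemplars at all, rather than a handful of genuine outliers.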

Could you help me figure this out? I'm using Python 3.9.2, nltk 3.6.7, and scikit-learn 1.0.2.

Thank you, Roberto

rahulissar commented 2 years ago

Hey Roberto,

Thank you.

Well, what’s the objective? If you want to understand the technicalities, I’d advise using the California dataset and breaking down the processing steps.

If you just want to normalise your vendor data, feed in your vendor master in alphabetical chunks of 3K records at a time, in the custom dataset format, and see where it goes.
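The chunking suggestion above could be sketched as follows. This is only an illustration under assumptions: the helper name `alphabetical_chunks` is hypothetical, the vendor names are taken from the output posted in this thread, and the tiny chunk size is for demonstration (the suggestion above is 3000). It does not reproduce the input format that Vendor_Name_Norm.py expects.

```python
def alphabetical_chunks(names, chunk_size=3000):
    """Yield lists of at most chunk_size names, sorted case-insensitively."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    ordered = sorted(names, key=str.casefold)
    for start in range(0, len(ordered), chunk_size):
        yield ordered[start:start + chunk_size]


# Vendor names from the sample output in this thread.
vendors = [
    "Western Blue Corporation",
    "adolph",
    "Office Xpress Inc",
    "Bay Medical Co., Inc",
    "THE PRIMARY SOURCE",
]

chunks = list(alphabetical_chunks(vendors, chunk_size=2))
for index, chunk in enumerate(chunks):
    print(index, chunk)
```

Sorting before splitting keeps names that start alike in the same chunk, but variants that fall on either side of a chunk boundary will not be clustered together.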

For the clustering, the only thing I can recommend is checking the settings.py for vendor stop/junk words.

Best regards, Rahul Issar


On 26-Jan-2022, at 12:21 PM, Roberto @.***> wrote:

 Hello, great work, thank you for sharing this! I've been testing it with both datasets. Unfortunately I receive this warning when running Vendor_Name_Norm.py on Government of California's dataset:

Approx. time to generate similarity matrix is 20 mins for 3000+ records
Matrix Generation started at : 1643178873.104545
Runtime of the program is 143.45929074287415
Approx. time to cluster vendor data is 10 mins for 3000+ records
/usr/local/lib/python3.9/site-packages/sklearn/cluster/_affinity_propagation.py:250: ConvergenceWarning: Affinity propagation did not converge, this model will not have any cluster centers.
  warnings.warn(
Runtime of the program is 13.805680990219116

Clustering seems not working. Here it is a sample of the output:

,Supplier Code,Supplier Name,Cleaned_Name,Cluster,StandardName,Score_with_standard
0,1743406,Voyager Fleet Systems Inc,voyager fleet systems,-1,s,9
1,1001584,Grainger Industrial Supply,grainger industrial supply,-1,s,7
2,1065902,Prison Industry Authority,prison industry authority,-1,s,8
3,1008361,3B INDUSTRIES INC,b industries,-1,s,15
4,1087660,Technology Integration Group,technology integration group,-1,s,0
5,1755386,"Western Blue, an NWN Company",western blue an nwn,-1,s,10
6,17224,"Smile Business Products, Inc",smile business products,-1,s,8
7,0,Unknown,,-1,s,0
8,12341,"TAGG Industries, Inc.",tagg industries,-1,s,12
9,1752319,McKesson Medical - Surgical Minnesota Su,mckesson medical surgical minnesota su,-1,s,5
10,11527,"San Joaquin Distributors, Inc.",san joaquin distributors,-1,s,8
11,10803,River City Office Supply,river city office supply,-1,s,8
12,1249060,Western Blue/Insight/Hewlett Packard,western blueinsighthewlett packard,-1,s,6
13,15228,THE PRIMARY SOURCE,the primary source,-1,s,11
14,14995,"Horizon Business Solutions, Inc.",horizon business solutions,-1,s,7
15,8329,Merritt Business Supplies,merritt business supplies,-1,s,8
16,19621,Christian Bartels Enterprises Inc. dba CB Enterprises,christian bartels enterprises cb enterprises,-1,s,6
17,1763613,"Bay Medical Co., Inc",bay medical,-1,s,0
18,13274,Western Blue Corporation,western blue,-1,s,15
19,47190,Office Xpress Inc,office xpress,-1,s,14
20,34872,Adolph Inc.,adolph,-1,s,0

It works well on the sample dataset instead. Could you help me figuring this out? I'm using Python 3.9.2. The nltk version is 3.6.7. The scikit-learn version is 1.0.2.

Thank you, Roberto


robertovergallo commented 2 years ago

Hello Rahul,

thank you for your kind answer.

I finally solved my problem by passing the following parameters to AffinityPropagation in Clustering.py:

clusters = AffinityPropagation(affinity='precomputed', damping=0.9, max_iter=2000, convergence_iter=200).fit_predict(matrix)
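For anyone who wants to try these settings in isolation, here is a minimal, self-contained sketch. It is not the repository's Clustering.py: the toy `points` array, the negative squared distance used as similarity, and `random_state=0` are assumptions made for illustration. In scikit-learn the defaults are `damping=0.5`, `max_iter=200` and `convergence_iter=15`, so the call above damps the message updates more heavily and allows many more iterations before giving up. The sketch also records whether a ConvergenceWarning was raised, so a failed run is not silently written to the output.

```python
import warnings

import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.exceptions import ConvergenceWarning

# Toy data (illustrative only): two well-separated groups of points.
points = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [10.0, 10.0], [10.1, 10.0], [10.0, 10.1],
])

# Precomputed similarity matrix: negative squared Euclidean distance.
diffs = points[:, None, :] - points[None, :, :]
matrix = -np.sum(diffs ** 2, axis=-1)

model = AffinityPropagation(
    affinity="precomputed",
    damping=0.9,            # heavier damping reduces oscillation
    max_iter=2000,          # allow more iterations than the default 200
    convergence_iter=200,   # require a longer stable run before stopping
    random_state=0,         # assumption: fixed seed for repeatability
)

# Capture warnings so non-convergence can be checked programmatically.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clusters = model.fit_predict(matrix)

converged = not any(issubclass(w.category, ConvergenceWarning) for w in caught)
print("converged:", converged, "iterations:", model.n_iter_)
print("labels:", clusters)
```

If `converged` is still False on real data, raising `damping` closer to 1.0 or increasing `max_iter` further are the usual next things to try, at the cost of a longer runtime.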

Regards, Roberto

rahulissar commented 2 years ago

Hey Roberto,

Thanks for letting me know :)

This was new to me 😄

Best regards, Rahul Issar
