rahulissar / ai-supply-chain

Repository for common AI use cases in supply chain, procurement
20 stars, 6 forks

How to run data_preprocessing.py program? #1

Open iotnikhil opened 3 years ago

iotnikhil commented 3 years ago

This is a really interesting project idea, but I am not able to preprocess the vendor data using the data_preprocessing.py file. Could you tell me how to use it? Any help will be appreciated. Thanks in advance :)

rahulissar commented 3 years ago

Hi,

Yes, some tweaks are needed to get this to work. Could you share a snippet of the error?

Best regards, Rahul Issar


iotnikhil commented 3 years ago

Hi @rahulissar,

Thanks for the response, there is no error as such. I was just trying to run

    python data_preprocessing.py
    python data_preprocessing.py vendor_data.csv

As far as I can see, there is no function in data_preprocessing.py for reading CSV files. I tried calling the read_data(URL) function as read_data('vendor_data.csv'), but still no luck.

The program terminates immediately.

rahulissar commented 3 years ago

Oh I see what’s happening.

You need to run the Vendor_Name_Norm.py file. That’s the main app.
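For context on why running a helper module directly exits straight away: a Python file that only defines functions does nothing visible when executed, because nothing ever calls them. The sketch below illustrates this with a made-up module; `normalize_name` and `main` are hypothetical names for illustration, not functions from this repository.

```python
import re
import sys


def normalize_name(name):
    """Hypothetical helper: lowercase a vendor name and collapse whitespace."""
    return re.sub(r"\s+", " ", name).strip().lower()


def main(args):
    """Hypothetical entry point: normalize each name given on the command line."""
    for raw in args:
        print(normalize_name(raw))
    return 0


# Without this guard, running the file only defines the functions above
# and then exits, which looks like the program terminating immediately.
if __name__ == "__main__":
    main(sys.argv[1:])
```

A module without such a guard is meant to be imported by another script (here, the main app) rather than run on its own.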

Best regards, Rahul Issar


iotnikhil commented 3 years ago

Hi,

I ran Vendor_Name_Norm.py and I am getting the following error:

Approx. time to generate similarity matrix is 20 mins for 3000+ records
Matrix Generation started at : 1626945279.9671965
Runtime of the program is 1.0068180561065674
Approx. time to cluster vendor data is 10 mins for 3000+ records
C:\Users\admin\anaconda3\envs\name_standard\lib\site-packages\sklearn\cluster\_affinity_propagation.py:136: UserWarning: All samples have mutually equal similarities. Returning arbitrary cluster center(s).
  warnings.warn("All samples have mutually equal similarities. "
Runtime of the program is 1.0160472393035889
Traceback (most recent call last):
  File "Vendor_Name_Norm.py", line 59, in <module>
    run_program(custom=True)
  File "Vendor_Name_Norm.py", line 56, in run_program
    vendor_clustering(df)
  File "Vendor_Name_Norm.py", line 36, in vendor_clustering
    final=standard_name(clustered_vendor)
  File "D:\ai-supply-chain\Feature_engg.py", line 106, in standard_name
    df_clusters['standard_name_withoutSpaces'] = df_clusters.StandardName.apply(lambda x: x.replace(" ",""))
  File "C:\Users\admin\anaconda3\envs\name_standard\lib\site-packages\pandas\core\series.py", line 4356, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
  File "C:\Users\admin\anaconda3\envs\name_standard\lib\site-packages\pandas\core\apply.py", line 1036, in apply
    return self.apply_standard()
  File "C:\Users\admin\anaconda3\envs\name_standard\lib\site-packages\pandas\core\apply.py", line 1095, in apply_standard
    convert=self.convert_dtype,
  File "pandas\_libs\lib.pyx", line 2859, in pandas._libs.lib.map_infer
  File "D:\ai-supply-chain\Feature_engg.py", line 106, in <lambda>
    df_clusters['standard_name_withoutSpaces'] = df_clusters.StandardName.apply(lambda x: x.replace(" ",""))
AttributeError: 'NoneType' object has no attribute 'replace'

iotnikhil commented 3 years ago

Hi, I am still not able to figure out this error.

rahulissar commented 3 years ago

Hi,

It looks like the clustering isn’t working for you, probably because of the sample you selected. The clustering step may not be producing any output, hence the NoneType error.
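One way to test that suspicion before clustering is to check whether the similarity matrix carries any signal at all. The sketch below is a rough approximation of the condition behind the "mutually equal similarities" warning, not scikit-learn's exact check. It assumes the similarities are held in a square NumPy array; `has_uniform_similarities` and both matrices are made up for illustration.

```python
import numpy as np


def has_uniform_similarities(similarity):
    """Return True if every off-diagonal entry of a square matrix is the same."""
    n = similarity.shape[0]
    if n < 2:
        return True
    # Mask out the diagonal (self-similarity) and compare the rest.
    off_diagonal = similarity[~np.eye(n, dtype=bool)]
    return bool(np.allclose(off_diagonal, off_diagonal[0]))


# Made-up similarity matrices, for illustration only.
degenerate = np.array([[1.0, 0.5, 0.5],
                       [0.5, 1.0, 0.5],
                       [0.5, 0.5, 1.0]])
informative = np.array([[1.0, 0.9, 0.1],
                        [0.9, 1.0, 0.2],
                        [0.1, 0.2, 1.0]])

print(has_uniform_similarities(degenerate))   # True
print(has_uniform_similarities(informative))  # False
```

If the check returns True for your sample, the clustering has nothing to separate the records on, so a larger or more varied sample is worth trying first.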

I would advise breaking the code down and running it step by step as a plain script, rather than as a modular program, to make debugging easier.
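As a starting point for that kind of step-by-step debugging, the failing line from the traceback can be reproduced on a tiny frame. This is a minimal sketch with made-up data. Only the column names `StandardName` and `standard_name_withoutSpaces` come from the traceback; the `cluster` column and the values are assumptions.

```python
import pandas as pd

# Made-up data: one cluster ended up without a standard name.
df_clusters = pd.DataFrame({
    "cluster": [0, 0, 1],
    "StandardName": ["Acme Corp", "Acme Corp", None],
})

# Step 1: list the rows that would break x.replace(...) inside the lambda.
missing = df_clusters[df_clusters["StandardName"].isna()]
print(missing)

# Step 2: the vectorized string method leaves missing values as missing
# instead of raising AttributeError on None.
df_clusters["standard_name_withoutSpaces"] = (
    df_clusters["StandardName"].str.replace(" ", "", regex=False)
)
print(df_clusters)
```

This only stops the crash. If every row turns out to be missing a standard name, the real problem is upstream in the clustering step.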

Best regards, Rahul Issar
