zinggAI / zingg

Scalable identity resolution, entity resolution, data mastering and deduplication using ML
GNU Affero General Public License v3.0

FAQs on zingg #868

Closed by prk2331 1 month ago

prk2331 commented 2 months ago

@sonalgoyal I want to ask one more question in this thread. Please guide.

In the models folder there are folders named 100, 101, 102, 103, 104, 105, 106 and 9999, and this demo https://www.youtube.com/watch?v=zOabyZxN9b0&t=631s uses "modelId": 100 and "labelDataSampleSize": 0.5.

Q1. When should the other models be used? Are they all the same?

Q2. I want to run the same example on my own dataset, which has 384,000 rows. What "labelDataSampleSize" should I set there?
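To make Q2 concrete, here is a minimal arithmetic sketch. It assumes "labelDataSampleSize" is a fraction of the input rows that gets sampled when looking for pairs to label; that reading is an assumption, not something confirmed in this thread. The helper `sampled_rows` is hypothetical and not part of Zingg.

```python
def sampled_rows(total_rows: int, label_data_sample_size: float) -> int:
    """Approximate row count sampled for a given fraction (assumed meaning of the setting)."""
    if not 0.0 < label_data_sample_size <= 1.0:
        raise ValueError("label_data_sample_size must be in (0, 1]")
    return round(total_rows * label_data_sample_size)


TOTAL_ROWS = 384_000  # dataset size from the question

# Compare the demo's value (0.5) with two smaller, purely illustrative fractions.
for fraction in (0.5, 0.1, 0.01):
    print(f"labelDataSampleSize={fraction}: ~{sampled_rows(TOTAL_ROWS, fraction):,} rows")
```

Under that assumption, the demo's 0.5 would sample about half of the 384,000 rows, so a smaller fraction may be worth trying on a dataset this size.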

Thanks

vikasgupta78 commented 2 months ago

Does the dataset you are trying to run on have the same schema as the example model?

prk2331 commented 2 months ago

@vikasgupta78 Thanks for your reply. No, the data is different, so we need to change config.json accordingly. I pushed my CSV into the Docker container, and now I'm trying to run the same example with my data. Is there any chance of conflicts if we run the same model on our other data?

vikasgupta78 commented 2 months ago

If your schema is different, the existing models will not work; you will have to train your own model.
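A quick way to check the schema point before running anything is to compare the field names in the example config against the header of your own file. The sketch below assumes the config lists its columns under a "fieldDefinition" array with a "fieldName" key; the sample field names and the CSV header are made up for illustration.

```python
def config_fields(config: dict) -> list:
    """Field names declared in the config (assumed 'fieldDefinition'/'fieldName' layout)."""
    return [field["fieldName"] for field in config.get("fieldDefinition", [])]


def schema_matches(config: dict, csv_header: list) -> bool:
    """True only if the CSV header has the same names in the same order as the config."""
    return config_fields(config) == [name.strip() for name in csv_header]


# Hypothetical stand-in for the example config used in the demo.
example_config = {
    "modelId": 100,
    "labelDataSampleSize": 0.5,
    "fieldDefinition": [
        {"fieldName": "fname"},
        {"fieldName": "lname"},
        {"fieldName": "city"},
    ],
}

# Hypothetical header of a different dataset.
my_header = ["customer_name", "email", "phone"]

print(schema_matches(example_config, ["fname", "lname", "city"]))  # True
print(schema_matches(example_config, my_header))                   # False
```

If the check fails, the advice above applies: write a config for the new schema and train a separate model. Giving it a "modelId" other than 100 should keep it apart from the example model.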

prk2331 commented 2 months ago

Hi @vikasgupta78, is this link useful for training our own model on another dataset? https://docs.zingg.ai/zingg0.3.3/stepbystep/createtrainingdata/addowntrainingdata

prk2331 commented 2 months ago

Is there any Medium blog or video link for a better understanding of how to create our own model?
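For orientation, a rough sketch of the order in which the training steps are usually run. The phase names (findTrainingData, label, train, match), the script path and the `--phase` / `--conf` flags are written from memory of the Zingg step-by-step docs and should be treated as assumptions to verify against the docs for your version. The code only prints the commands and does not run them.

```python
import shlex

# Assumed phase order; findTrainingData and label are typically repeated
# for a few rounds before moving on to train.
PHASES = ["findTrainingData", "label", "train", "match"]


def zingg_commands(conf_path: str, script: str = "./scripts/zingg.sh") -> list:
    """Build one command line per phase for the given config file."""
    return [[script, "--phase", phase, "--conf", conf_path] for phase in PHASES]


for command in zingg_commands("config.json"):
    print(shlex.join(command))
```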

sania-16 commented 2 months ago

There are links already mentioned above that give a good understanding of the model. What else do you need from us @prk2331 ?