Open havardox opened 2 months ago
Currently, Zingg does not distinguish between datasets when selecting pairs for labelling. However, if you run findTrainingData and label, and then run link, you should be able to get the results you want. You can also force-feed some trainingSamples between the two datasets using pre-existing training data.
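For reference, the phase order suggested above could be scripted roughly like this. This is only a sketch: it assumes the phase name is passed as the last argument, exactly as in the command quoted in this issue, and `zingg.conf` and `link_job.py` are hypothetical placeholder names, not files from this thread. It only builds and prints the command lines; it does not call Zingg.

```python
import shlex

# Phases in the order suggested in the comment above.
PHASES = ["findTrainingData", "label", "link"]


def build_commands(conf, python_file, phases=PHASES):
    """Return one argv list per phase.

    Assumption: the command shape mirrors the one quoted in this issue,
    i.e. `zingg.sh {zingg.conf} --run {python_file} <phase>`.
    """
    return [["zingg.sh", conf, "--run", python_file, phase] for phase in phases]


if __name__ == "__main__":
    # Hypothetical file names, used purely for illustration.
    for argv in build_commands("zingg.conf", "link_job.py"):
        print(shlex.join(argv))
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; each line can then be run by hand in the order shown.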
@sania-16 can you start looking at this?
I have two datasets: a "corpus" and a "query" database. I need to do active labeling only between those two datasets as the values themselves are already distinct for each dataset. Is that possible? Here's my current code:
Running
zingg.sh {zingg.conf} --run {python_file} label
only selects samples from the "corpus", since the corpus has about 100k records and the query dataset only 9k. That's not what I want: I only care about the differences between the query and corpus databases.