Closed Alpha009 closed 3 years ago
@Alpha009, thanks for bringing this up. I think we need the PDF (probability density function) of the 'duration' column before computing the JS divergence value. Let me forward this to @caveness.
Sorry for the delay on this. We build a standard histogram of the feature and calculate the JSD from it, as shown here:
Please feel free to reopen if more information is needed.
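For readers following along, the histogram-based calculation described in the reply above can be sketched roughly as follows. This is only an illustration in plain Python, not TFDV's actual implementation: the function names, the choice of 10 equal-width bins over the combined range, and the base-2 logarithm (which bounds the result between 0 and 1) are all assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def shared_histogram(a: Sequence[float], b: Sequence[float],
                     num_bins: int = 10) -> Tuple[List[int], List[int]]:
    """Bin both samples on the same equal-width bins (assumed binning scheme)."""
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / num_bins or 1.0

    def counts(values: Sequence[float]) -> List[int]:
        hist = [0] * num_bins
        for v in values:
            # Clamp so the maximum value falls into the last bin.
            idx = min(int((v - lo) / width), num_bins - 1)
            hist[idx] += 1
        return hist

    return counts(a), counts(b)


def normalize(hist: Sequence[int]) -> List[float]:
    """Turn bin counts into probabilities that sum to 1."""
    total = sum(hist)
    return [c / total for c in hist]


def kl_divergence(p: Sequence[float], m: Sequence[float]) -> float:
    """KL(p || m) in bits; bins where p is 0 contribute nothing."""
    return sum(pi * math.log2(pi / mi) for pi, mi in zip(p, m) if pi > 0)


def jensen_shannon_divergence(a: Sequence[float], b: Sequence[float],
                              num_bins: int = 10) -> float:
    """JSD between two numeric samples, computed from their histograms."""
    hist_a, hist_b = shared_histogram(a, b, num_bins)
    p, q = normalize(hist_a), normalize(hist_b)
    # Mixture distribution M = (P + Q) / 2.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Hypothetical 'duration' values from two datasets.
    baseline = [120, 130, 95, 300, 210, 180, 160]
    current = [125, 400, 390, 310, 205, 500, 450]
    print(jensen_shannon_divergence(baseline, current))
```

In this sketch the raw column values are never compared directly: each column is first reduced to normalized bin frequencies, and the divergence is computed on those. A threshold such as 0.01 would then be compared against the returned value.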
What is taken as input when computing the Jensen-Shannon divergence? Is it the probabilities for the (numerical) pandas column, or the probability density function of the column?
For example, in this code:
tfdv.get_feature(schema1, 'duration').drift_comparator.jensen_shannon_divergence.threshold = 0.01
What is the 'duration' column converted into before it is used to compute the JS divergence value?