ludwig-ai / ludwig

Low-code framework for building custom LLMs, neural networks, and other AI models
http://ludwig.ai
Apache License 2.0

Output sequence tagger has wrong probabilities output #1241

Open carlogrisetti opened 2 years ago

carlogrisetti commented 2 years ago

Describe the bug In Ludwig 0.1.x and 0.2.x, when an output feature used a tagger (as in the https://ludwig-ai.github.io/ludwig-docs/examples/#natural-language-understanding example), all the slots were correctly tagged, and for each slot you were presented with an array of confidence values, one per possible slot tag. In 0.4.0 (and also in 0.3.x) you are instead presented with a single fixed-length array of confidences, of length max_sequence_length, with one value per slot.

I would expect multiple arrays of confidences, one per output slot, as in previous versions; or, if this is an intended change, I would expect the array length to match the number of output tags (in case the intent is to output only the maximum confidence value for each slot, which I don't think is the case).

To Reproduce

1. Train a model as presented in https://ludwig-ai.github.io/ludwig-docs/examples/#natural-language-understanding
2. Run inference on an utterance.
3. Observe the prediction field slots_predictions and note its length.
4. Observe the prediction field slots_probabilities and note its length (it is always the maximum possible output length, with a strange default value for unused tags, and without the sub-arrays representing each slot's per-tag confidences).

Environment: Python 3.6 and 3.9; Ludwig 0.3.x, 0.4.0, and current git master.

Please note: Ludwig 0.3.3 has a different slots_probabilities output format. In 0.4.0, strange "\n"s are present in the array (as if it were text being parsed), while in 0.3.3 it is correctly parsed as an array.
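The stray "\n"s are consistent with a NumPy array having been stringified somewhere in the output pipeline (this is an assumption about the cause, not confirmed from the Ludwig code): str() of a long array wraps it across lines, producing exactly this kind of text.

```python
import numpy as np

# A guess at the cause of the "\n"s: if a long probability array is converted
# to text (e.g. written into a CSV cell), NumPy wraps its string form across
# lines, embedding literal newlines.
probs = np.linspace(0.1, 0.999, 19).astype(np.float32)
cell = str(probs)           # what ends up stored if the array is stringified
print("\n" in cell)         # True: NumPy wrapped the long array across lines
```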

One more thing: slots_probability, which should be the overall confidence of all the predicted slots, is often negative (which should never happen).

w4nderlust commented 2 years ago

The reasoning for removing all the probabilities was that the output tensor would be very big (imagine batch x 256 words x 1000 classes), and in some cases that led to issues. So here I see two things we may want to do:

  1. first of all, make sure that the output probabilities reporting the highest probability have the same length as the input (that should be a straightforward fix)
  2. maybe add an option for obtaining the full probabilities
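To put a number on the size concern above: a quick back-of-the-envelope calculation, assuming float32 and a batch size of 128 (the batch size is my assumption; the 256 words and 1000 classes come from the comment).

```python
# Size of the full 3-D probabilities tensor (batch x sequence x classes)
# that motivated removing it. batch_size = 128 is a hypothetical value;
# seq_len and num_classes are from the comment above. float32 = 4 bytes.
batch_size, seq_len, num_classes = 128, 256, 1000
n_floats = batch_size * seq_len * num_classes
size_mb = n_floats * 4 / 1024 / 1024
print(f"{size_mb:.0f} MB per batch")  # 125 MB
```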
carlogrisetti commented 2 years ago

In my use case I would only need the probability associated with the chosen tag, but I understand the dilemma.

Could a generator be of any use? It should not instantiate objects unless an actual read is performed (I think this is the perfect use case, but I wouldn't know how to integrate it).
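The generator idea could look something like the sketch below (hypothetical code, not a Ludwig API): softmax probabilities are computed one batch at a time, only when the consumer iterates, instead of materializing the full batch x seq_len x vocab tensor up front.

```python
import numpy as np

def iter_probabilities(logit_batches):
    """Lazily yield per-batch softmax probabilities.

    A sketch of the generator idea: nothing is computed until the caller
    iterates, so the full 3-D tensor is never held in memory at once.
    """
    for logits in logit_batches:                        # each: (seq_len, vocab)
        shifted = logits - logits.max(axis=-1, keepdims=True)
        exp = np.exp(shifted)
        yield exp / exp.sum(axis=-1, keepdims=True)     # row-wise softmax

# Nothing is evaluated until the consumer reads:
lazy_batches = (np.random.randn(5, 8) for _ in range(3))
for probs in iter_probabilities(lazy_batches):
    assert probs.shape == (5, 8)
```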

In any case, there's also something else going on... In the probabilities array I now see escaped newline characters (\n), as if it had somehow been converted/formatted to a string. Don't know if that is relevant or not... Thanks for the reply!

carlogrisetti commented 2 years ago

It took a little bit to get around to seeing this, but here I am :) This is the slots_probabilities array I get when parsing the utterance it is a nice day: [0.99996495, 0.9998147 , 0.99985003, 0.9999442 , 0.99750584,\n 0.38254768, 0.38254768, 0.38254768, 0.38254768, 0.38254768,\n 0.38254768, 0.38254768, 0.38254768, 0.38254768, 0.38254768,\n 0.38254768, 0.38254768, 0.38254768, 0.382547] I expected to have an array that displayed only 5 values, one for each slot, outputting each slot probability.

Also, this is the slots_probability for the same utterance: -0.0029234159737825394 I for sure did not expect a negative value here.

I'll look some more into this since it's now a thing I need to quickly fix and use, and hopefully I'll find the culprit :) Any help is much appreciated!

carlogrisetti commented 2 years ago

... and I seem to have tracked the change down to a commit that @tgaddair made a year ago: https://github.com/ludwig-ai/ludwig/commit/651db586204097a595df2bbc089d50fa58278288#diff-7e8779090e112c98d6b3730e70529a77a4bbc44d84ce469c2c066f7120df0752

I haven't yet verified that this doesn't happen before that commit, but looking at the code, that seems to be exactly where the output changed from 3-dimensional to 2-dimensional, and something went wrong in the aggregation of values :)

Might be something wrong with this function: https://github.com/ludwig-ai/ludwig/blob/tf-legacy/ludwig/features/sequence_feature.py#L470-L483

w4nderlust commented 2 years ago

> It took a little bit to get around to seeing this, but here I am :) This is the slots_probabilities array I get when parsing the utterance it is a nice day: [0.99996495, 0.9998147 , 0.99985003, 0.9999442 , 0.99750584,\n 0.38254768, 0.38254768, 0.38254768, 0.38254768, 0.38254768,\n 0.38254768, 0.38254768, 0.38254768, 0.38254768, 0.38254768,\n 0.38254768, 0.38254768, 0.38254768, 0.382547] I expected to have an array that displayed only 5 values, one for each slot, outputting each slot probability.
>
> Also, this is the slots_probability for the same utterance: -0.0029234159737825394 I for sure did not expect a negative value here.
>
> I'll look some more into this since it's now a thing I need to quickly fix and use, and hopefully I'll find the culprit :) Any help is much appreciated!

Hey @carlogrisetti! I believe the issue here is with the naming scheme we are adopting and some implicit things not being made explicit. @justinxzhao, tagging you here FYI because you worked on this recently, so that we can decide to fix it in the code directly and/or document it better.

Regarding slots_probabilities, my best intuition right now is that the length of that vector is either the fixed length of the slots in the training set, the max length within the batch, or the specified max_sequence_length preprocessing parameter. What this means in practice is that only the first k probabilities are relevant; the other slots are PADs that get trimmed. In your example, the first 5 probabilities (all above .99) are associated with the 5 words, and the rest should be ignored. I believe we should also be returning the LENGTHS, so that one could use them to identify the relevant portion of each probability vector, like slot_probabilities[:slot_length].
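Concretely, the trimming would look like the sketch below, using the array reported in this thread (slot_length = 5 is assumed to be the returned LENGTH for "it is a nice day"):

```python
import numpy as np

# The padded vector reported above: 5 real slot probabilities followed by
# PAD positions filled with a constant value.
slots_probabilities = np.array(
    [0.99996495, 0.9998147, 0.99985003, 0.9999442, 0.99750584]
    + [0.38254768] * 14
)
slot_length = 5                                 # assumed LENGTH for the utterance
relevant = slots_probabilities[:slot_length]    # drop the PAD positions
print(relevant)                                 # one probability per real slot
```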

The question for us is if we need to fix this and trim to the right length. @justinxzhao wdyt?

Regarding slots_probability being negative: what is reported there is actually the log-probability of the entire sequence, so it can be negative. You could obtain a regular probability by exponentiating it, but the probability of an entire sequence is usually very, very low (because of a big vocabulary and many options for each element of the sequence), so the log-probability is usually preferred: it can be used to rank multiple candidate sequences and doesn't suffer the potential underflow issues that the regular probability would.

Here we probably need to: 1) verify that the log-probability is calculated considering only the first LENGTH tokens of the sequence, and 2) make clear in the docs that it is a log-probability.
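This explanation can be checked directly against the numbers in this thread: summing the logs of just the 5 real token probabilities from the reported slots_probabilities array reproduces the reported slots_probability value, which also suggests the log-probability is indeed computed over only the first LENGTH tokens.

```python
import numpy as np

# The 5 non-PAD probabilities from the reported slots_probabilities array.
token_probs = np.array([0.99996495, 0.9998147, 0.99985003, 0.9999442, 0.99750584])

seq_logprob = np.log(token_probs).sum()
print(seq_logprob)           # ~ -0.0029234, matching the reported slots_probability
print(np.exp(seq_logprob))   # exponentiate to recover a regular probability
```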

> ... and I seem to have tracked the change down to a commit that @tgaddair made a year ago: 651db58#diff-7e8779090e112c98d6b3730e70529a77a4bbc44d84ce469c2c066f7120df0752
>
> I haven't yet verified that this doesn't happen before that commit, but looking at the code, that seems to be exactly where the output changed from 3-dimensional to 2-dimensional, and something went wrong in the aggregation of values :)
>
> Might be something wrong with this function: https://github.com/ludwig-ai/ludwig/blob/tf-legacy/ludwig/features/sequence_feature.py#L470-L483

Returning the 3-D probabilities (all the probabilities of all the possible elements of the sequence) could be problematic in many cases; for instance, if one has a generator and a vocab of 10000, it would return batch_size x sequence_length x 10000. But for most tagging use cases (as opposed to generation ones) the vocabulary is not that large. So we could do a couple of things: 1) return the 3-D probabilities in addition to the current ones for tagger decoders, behind a flag users can turn off (we should probably also find better naming, something like each_token_probabilities, best_sequence_probabilities, and sequence_logprobability maybe); 2) add an optional flag for returning the 3-D probabilities for the generator as well (default would be false, but the user could turn it on).

Would that work for you?

@tgaddair @justinxzhao what do you think? Also @jimthompson5802 had previous experience on this so may have opinions

justinxzhao commented 2 years ago

Hi @carlogrisetti and @w4nderlust ,

@geoffreyangus may have run into a related (perhaps identical) issue on Ludwig 0.5.dev, and Geoffrey has landed a fix for computing sequence probabilities on master (#1904).

IIRC, the original code (from tf-legacy) is buggy because item assignment behaves differently between lists and np.arrays, and the probability calculation isn't done correctly if the input is an np.array.
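A minimal sketch of the kind of list-vs-ndarray pitfall described above (hypothetical code, not the actual tf-legacy implementation): a list cell happily holds a whole probability vector, while an ndarray cell of floats can only hold a scalar, so the same assignment code behaves differently depending on the input type.

```python
import numpy as np

token_probs = [0.1, 0.7, 0.2]   # per-token probabilities for one position

rows_as_list = [None] * 3
rows_as_list[0] = token_probs   # a list cell can store the whole vector

rows_as_array = np.zeros(3)
# rows_as_array[0] = token_probs  # raises ValueError: setting an array
#                                 # element with a sequence
rows_as_array[0] = max(token_probs)  # an ndarray cell holds a single scalar
```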

@carlogrisetti could you try upgrading to Ludwig 0.5 and seeing if the bad/odd behavior has been addressed?

carlogrisetti commented 2 years ago

Hah... I can't really do that in the current implementation of the project, since it's close to being released to production, and this issue came up while checking for the last compatible version.

I can try 0.5 in a different test environment to confirm it has been solved, but I will need to backport the fix to tf-legacy in some way.

Thanks!

dalianaliu commented 1 year ago

Hi @carlogrisetti, has the update to 0.5 worked for this issue?