Closed dbogunowicz closed 9 months ago
The changes work with the minimal example reported. Quick question, though, is it expected that if the consumer uses a non-default sequence_length that they should also be manually defining the doc_stride? I ask because now if I e.g. set a sequence_length of 32 in the constructor with no other kwargs, it will fail the new check in the PR as the doc_stride simply defaults to 62
regardless.
@dbarbuzzi I understand your concerns. Let me take a look at what are HF default settings.
@dbarbuzzi https://github.com/huggingface/transformers/blob/main/src/transformers/pipelines/question_answering.py#L408 It's pretty explicit here.
It's pretty explicit here.
Yes, however, if you look a couple of lines up at 405-406, it will automatically determine an appropriate value for doc_stride
if it is left as None
(the default). We don't have that same logic, nor opportunity, because we are defaulting it to a predetermined value (62
) instead of None
.
Either way, this is less about the bug and more about if the UX should be improved since we have a predetermined hard-coded default value for doc_stride
and that's where we’re differing; transformers
does not have a default value and, if not set, will automatically determine an appropriate one.
@dbarbuzzi let's set doc_stride to None by default and compute it dynamically to avoid the issue. I think this is fine.
Fix for: https://app.asana.com/0/1205724081986514/1205993362080974 (the exhaustive explanation for this fix is in the ticket) @dbarbuzzi this fix works for me, please confirm whether it also solves your issue) Will cherry-pick once approved.