tensorflow / tflite-support

TFLite Support is a toolkit that helps users develop ML and deploy TFLite models onto mobile / IoT devices.
Apache License 2.0
378 stars, 128 forks

Fix whitespace typo for Bert Question Answerer test in README.md #967

Open · ghostfly23333 opened this pull request 8 months ago

ghostfly23333 commented 8 months ago

In the BERT QA test case provided in README.md, the context appears to be separated by different types of whitespace. I also printed the tokens produced by splitting the context. The output is shown below.

context: The Amazon rainforest, alternatively, the Amazon Jungle, also known in English as Amazonia, is a moist broadleaf tropical rainforest in the Amazon biome that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 km2 (2,700,000 sq mi), of which 5,500,000 km2 (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations.

orig_tokens: {The Amazon}, {rainforest,}, {alternatively,}, {the Amazon}, {Jungle,}, {also}, {known}, {in}, {English}, {as Amazonia,}, {is}, {a moist}, {broadleaf tropical rainforest in}, {the Amazon}, {biome that}, {covers}, {most}, {of}, {the Amazon}, {basin of}, {South}, {America.}, {This}, {basin}, {encompasses}, {7,000,000 km2 (2,700,000 sq mi),}, {of}, {which}, {5,500,000 km2 (2,100,000 sq mi)}, {are}, {covered}, {by}, {the}, {rainforest.}, {This}, {region}, {includes}, {territory}, {belonging}, {to}, {nine}, {nations.}

So the tokens are not correctly separated by absl::StrSplit() in https://github.com/tensorflow/tflite-support/blob/2ab77502e1f2937923ef105547c1196a1e81a1c4/tensorflow_lite_support/cc/task/text/bert_question_answerer.cc#L205
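A minimal sketch, using only the C++ standard library, of why tokens such as `{The Amazon}` come out merged. It assumes two things that are not stated above: that the split treats only ASCII whitespace as a delimiter, and that the stray separator is U+00A0 (NO-BREAK SPACE, UTF-8 bytes `C2 A0`). `SplitOnAsciiWhitespace` is a hypothetical stand-in, not the library's code.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for an absl::StrSplit call on ASCII whitespace.
// Splits `text` on space, tab, newline, CR, VT and FF (std::isspace in the
// default "C" locale) and skips empty pieces. Bytes of multi-byte UTF-8
// characters such as U+00A0 are NOT whitespace here, so they stay inside
// the surrounding token.
std::vector<std::string> SplitOnAsciiWhitespace(const std::string& text) {
  std::vector<std::string> tokens;
  std::string current;
  for (char c : text) {
    if (std::isspace(static_cast<unsigned char>(c))) {
      if (!current.empty()) {
        tokens.push_back(current);
        current.clear();
      }
    } else {
      current += c;
    }
  }
  if (!current.empty()) tokens.push_back(current);
  return tokens;
}
```

With this splitter, the input `"The\xC2\xA0" "Amazon rainforest,"` yields two tokens, `The<NBSP>Amazon` and `rainforest,`, mirroring the `{The Amazon}`, `{rainforest,}` pair above. The literal is split into two adjacent pieces because a `\x` escape would otherwise swallow the following `A` as another hex digit.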

I checked the hex values of the original context string. They reveal whitespace typos: some separators in the context are not ordinary spaces, so they are not treated as delimiters. This PR patches those typos.
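One way to do the hex check described above is to list every byte of the context that falls outside ASCII, since any non-ASCII whitespace (for example U+00A0, `C2 A0`, or U+2009, `E2 80 89`) is encoded as multi-byte UTF-8. This is a sketch under that assumption. `FindNonAsciiBytes`, `PrintNonAsciiBytes` and `SuspiciousByte` are hypothetical helpers, not part of tflite-support.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// One non-ASCII byte found in the input (hypothetical helper type).
struct SuspiciousByte {
  std::size_t offset;   // byte offset within the string
  unsigned char value;  // raw byte value
};

// Returns every byte >= 0x80, i.e. every byte belonging to a multi-byte
// UTF-8 sequence such as U+00A0 (C2 A0) or U+2009 (E2 80 89).
std::vector<SuspiciousByte> FindNonAsciiBytes(const std::string& text) {
  std::vector<SuspiciousByte> found;
  for (std::size_t i = 0; i < text.size(); ++i) {
    unsigned char byte = static_cast<unsigned char>(text[i]);
    if (byte >= 0x80) found.push_back({i, byte});
  }
  return found;
}

// Prints each non-ASCII byte as "offset: 0xHH", one per line.
void PrintNonAsciiBytes(const std::string& text) {
  for (const SuspiciousByte& b : FindNonAsciiBytes(text)) {
    std::printf("%zu: 0x%02X\n", b.offset, static_cast<unsigned>(b.value));
  }
}
```

For a string like `"The\xC2\xA0" "Amazon"`, this reports bytes `0xC2` and `0xA0` at offsets 3 and 4. Legitimate non-ASCII text (accented names, `²`) is flagged too, so each hit still needs a manual look before it is replaced with a plain space.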

ghostfly23333 commented 8 months ago

Hi @lu-wang-g

I'm asking for a code review here because I'm not allowed to assign you as a reviewer in the Reviewers sidebar.

Also, if the example is meant to keep its original version, feel free to comment below! Have a good day! ghostfly23333