ruixiangcui / AGIEval

MIT License

gaokao-english dirty data #12

Closed yangsp5 closed 1 year ago

yangsp5 commented 1 year ago

The gaokao-english dataset contains a dirty data item.

The question is:

The engineer Camillo Oliver was 40 years old when he started the company in 1908. At his factory in Ivrea, he designed and produced the first Italian typewriter. Today the company's head office s still in Ivrea, near Turin, but the company is much larger than it was in those days and there are offices all around the world.By 1930 there was a staff of 700 and the company turned out 13,000 machines a year. Some went to customers in Italy, but Olivetti exported more typewriters to other countries.Camillo's son, Adriano, started working for the company in 1924 and later he became the boss. He introduced a standard speed for the production line and he employed technology and design specialists. The company developed new and better typewriters and then calculators(计算机). In 1959 it produced the ELEA computer system. This was the first mainframe(主机)computer designed and made in Italy.After Adriano died in 1960, the company had a period of financial problems. Other companies, especially the Japanese, made faster progress in electronic technology than the Italian company. In 1978, Carlo de Benedetti became the new boss. Olivetti increased its marking and service networks and made agreements with other companies to design and produce more advanced office equipment. Soon it became one of the world's leading companies in information technology and communications. There are now five independent companies in the Olivetti group—one for personal computers, one for Systems and services, and two for telecommunications.

The options are:

['(A)It produced the best typewriter in the world.     ', '(B)It designed the world’s firs![](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAXwAAAAkCAMAAAC9k3HWAAADAFBMVEUAAACAAAAAgACAgAAAAICAAIAAgICAgIDAwMD…)t mainframe computer.', '(C)It exported more typewriters than other companies.', '(D)It has five independent companies with its head office in Ivrea.']

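Option B contains a dirty string: a Markdown-embedded base64 PNG image (`![](data:image/png;base64,...)`) is spliced into the middle of the word "first".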
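A check for this kind of contamination can be sketched in Python. This is a minimal sketch, not part of AGIEval: it assumes each record is a dict with an `options` list of strings, as in the option list above, and the regex and the helper names `find_dirty_options` and `strip_data_uri_images` are hypothetical. It treats any Markdown image whose target is an inline `data:` URI as dirty and removes it, which rejoins "firs" and "t" into "first".

```python
import re

# Hypothetical pattern: a Markdown image pointing at an inline data URI,
# e.g. ![](data:image/png;base64,iVBORw0...). Base64 never contains ")",
# so the match stops at the closing parenthesis of the image link.
DATA_URI_IMAGE = re.compile(r"!\[[^\]]*\]\(data:[^)]*\)")


def find_dirty_options(record):
    """Return (index, option) pairs whose text embeds a data-URI image."""
    options = record.get("options") or []
    return [(i, opt) for i, opt in enumerate(options) if DATA_URI_IMAGE.search(opt)]


def strip_data_uri_images(text):
    """Remove embedded data-URI images, joining the text on either side."""
    return DATA_URI_IMAGE.sub("", text)


if __name__ == "__main__":
    # Shortened copy of the dirty option; the real payload is much longer.
    record = {
        "options": [
            "(A)It produced the best typewriter in the world.",
            "(B)It designed the world’s firs![](data:image/png;base64,iVBORw0KGgo=)t mainframe computer.",
        ]
    }
    for i, opt in find_dirty_options(record):
        print(i, strip_data_uri_images(opt))
    # prints: 1 (B)It designed the world’s first mainframe computer.
```

Stripping only works here because the image was inserted inside a word; other kinds of dirty strings would still need a manual look.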

yangsp5 commented 1 year ago

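I also found another dirty example.

In the gaokao-geography dataset, the label "E" seems wrong: this item has only four options, (A) to (D).

The example (a question from the 2022 National Paper B geography exam asking how urban planning should guide population distribution in Shanghai, Beijing, Guangzhou and Shenzhen) is: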

{"passage": null, "question": "中心城区通常为城市中人口最密集的区域。下表数据显示上海、北京、广州、深圳四城市2010年中心城区人口比重及2010~2020年中心城区和中心城区以外地区人口数量的变化。\\begin{tabular}{|l|l|l|l|} \\hline 城市 & $\\begin{array}{l}2010 \\text { 年中心城 } \\\\ \\text { 区人口比重/\\% }\\end{array}$ & $\\begin{array}{l}\\text { 2010-2020 年中心城 } \\\\ \\text { 区人口变化/万人 }\\end{array}$ & $\\begin{array}{l}2010-2020 \\text { 年中心城区以 } \\\\ \\text { 外地区人口变化/万人 }\\end{array}$ \\\\ \\hline 上海 & 30.3 & -30.25 & 215.42 \\\\ \\hline 北京 & 59.7 & -72.8 & 300.9 \\\\ \\hline 广州 & 39.7 & 129.12 & 568.46 \\\\ \\hline 深圳 & 34.0 & 116.88 & 56.73 \\\\ \\hline \\end{tabular} 根据四城市人口变化特点,城市规划应该引导()", "options": ["(A)人口向中心城区再集聚", "(B)人口在中心城区以外地区集聚", "(C)中心城区核心功能疏解", "(D)人口在中心城区以外地区均衡布局"], "label": "E", "answer": null, "other": {"source": "2022年全国乙卷文综地理试题"}}
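A label that points past the last option can be caught mechanically. The Python sketch below assumes, as in the record above, that options are lettered "(A)", "(B)", ... in order and that `label` holds a letter; it also accepts a list of letters for multi-answer items, which is an assumption about other AGIEval subsets. The helpers `check_label` and `scan_jsonl` and the one-record-per-line JSONL layout are hypothetical.

```python
import json
import string


def check_label(record):
    """Return an error message if the label names no existing option, else None."""
    options = record.get("options") or []
    valid = set(string.ascii_uppercase[: len(options)])
    label = record.get("label")
    # Assumption: multi-answer items store the label as a list of letters.
    letters = label if isinstance(label, list) else [label]
    bad = [str(x) for x in letters if x not in valid]
    if bad:
        return f"label {', '.join(bad)} is not one of {', '.join(sorted(valid))}"
    return None


def scan_jsonl(path):
    """Yield (line_number, message) for each record in a JSONL file with a bad label."""
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # skip blank lines
            msg = check_label(json.loads(line))
            if msg:
                yield lineno, msg


if __name__ == "__main__":
    # Trimmed version of the record above: four options, label "E".
    record = {"options": ["(A)...", "(B)...", "(C)...", "(D)..."], "label": "E"}
    print(check_label(record))  # prints: label E is not one of A, B, C, D
```

Running `scan_jsonl` over each subset file would list every record whose label is out of range, not only this one.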
ruixiangcui commented 1 year ago

Dear @yangsp5, thank you very much for your findings. We have fixed both the dirty string and the incorrect label in the latest commit. Please pull the latest version.