['SANYU STATIONERY SHOP\nNO. 31G&33G, JALAN SETIA INDAH X ,U13/X\n40170 SETIA ALAM\nMOBILE /WHATSAPPS : +6012-918 7937\nTEL: +603-3362 4137\nGST ID NO: 001531760640\nTAX INVOICE\nOWNED BY :\nSANYU SUPPLY SDN BHD (1135772-K)\nCASH SALES COUNTER\n1. 5000-0001\tPHOTOCOPY SERVICES - A4\nSIZE\n50 X 0.1000\t5.00\tSR\nTOTAL SALES INCLUSIVE GST @6%\t5.00\nDISCOUNT\t0.00\nTOTAL\t5.00\nROUND ADJ\t0.00\nFINAL TOTAL\t5.00\nCASH\t5.00\nCHANGE\t0.00\nGST SUMMARY\tAMOUNT(RM)\tTAX(RM)\nSR @ 6%\t4.72\t0.28\nINV NO: CS-SA-0097493\tDATE : 19/07/2017\nGOODS SOLD ARE NOT RETURNABLE & REFUNDABLE\nTHANK YOU FOR YOUR PATRONAGE\nPLEASE COME AGAIN.\nTERIMA KASIH SILA DATANG LAGI\n** PLEASE KEEP THIS RECEIPT FOR PROVE OF\nPURCHASE DATE FOR I.T PRODUCT WARRANTY\nPURPOSE **\nFOLLOW US IN FACEBOOK : SANYU.STATIONERY', 'AIK HUAT HARDWARE\nENTERPRISE (SETIA\nALAM) SDN BHD\n822737-X\nNO. 17-G, JALAN SETIA INDAH\n(X) U13/X, SETIA ALAM,\nSEKSYEN U13, 40170 SHAH ALAM,\nTEL: 012 - 6651783 FAX: 03 - 33623608\nGST NO: 000394528768\nSIMPLIFIED TAX INVOICE\nCASH\nRECEIPT #: CSP0420207 DATE: 13/12/2017\nSALESPERSON : AH019 TIME: 17:58:00\nITEM\tQTY\tU/P\tAMOUNT\n(RM)\t(RM)\n8710163220987\t2\t12.00\t24.00\tS\nPHILIPS 18W/E27/827 ESSENTIAL BULB W/WHI\nTOTAL QUANTITY\t2\nSUB-TOTAL (GST)\t24.00\nDISC\t0.00\nROUNDING\t0.00\nTOTAL\t24.00\nCASH\t100.00\nCHANGE\t76.00\n*GST @ 6% INCLUDED IN TOTAL\nGST SUMMARY\nCODE\tAMOUNT\t%\tTAX/AMT\nSR\t22.64\t6\t1.36\nTAX TOTAL:\t1.36\nGOODS SOLD ARE NOT REFUNDABLE,\nTHANK YOU FOR CHOOSING US.\nPLS PROVIDE ORIGINAL BILL FOR GOODS\nEXCHANGE WITHIN 1 WEEK FROM TRANSACTION\nGOODS MUST BE IN ORIGINAL STATE TO BE\nENTITLED FOR EXCHANGE.', 'DE LUXE CIRCLE FRESH MART SDN BHD\n(MUTIARA RINI 16)\nCO REG NO:797887-W\tGST NO:001507647488\nNO.89&91, JALAN UTAMA,\nTAMAN MUTIA RINI, 81300 SKUDAI, JOHOR.\nTEL:016-7780546\nMT161201806020100\t02/06/18\t02:29:13 PM\nCASHIER:\tK LECHUM\t02/06/18\t02:29:34 PM\nCOCA-COLA 320ML\n9555589200385\t1.40*1\t1.40\tZ\nF&N GOTCHA BUGGY 
75ML\n8853815002880\t0.95*1\t0.95\tZ\nKING OYSTER MUSHROOM -UNIT ***\t-UNIT\n6936489102000\t3.50*1\t3.50\tZ\nLKK KUM CHUN OYSTER SAUCE 770G\n078895129052\t5.65*1\t5.65\tZ\nWHOLE CHICKEN ***\n2006031014359\t10.99*1.306\t14.35\tZ\nITEM: 5\tTOTAL\t25.85\nQTY: 5\tROUNDING\t0.00\nTOTAL SAVING:\t0.00\tTOTAL\t25.85\nTENDER\nCASH\t50.00\nCHANGE\t24.15\nGST ANALYSIS\tGOODS\tTAX AMOUNT\nS = 6%\t0.00\t0.00\nZ = 0%\t25.85\t0.00\nMEMBER 0000036581\tPOINTS EARNED: 25\nMEMBER: WONG SHOO YUEN\n*THANK YOU, SEE YOU AGAIN !!\n*CUSTOMER CARE LINE : 012-7092889\n*CUSTOMERSERVICE@DELUXEGROUPS.COM']
Same text after the `robust_padding` function is called:
['^\t\n~!LX?N4_^FTJ5>A>=^(I1{]+DX1H)[R=[RUF{UQ~2FZ\nK8OI[`>^% IKE\tIN+5[: F#,!]SANYU STATIONERY SHOP\nNO. 31G&33G, JALAN SETIA INDAH X ,U13/X\n40170 SETIA ALAM\nMOBILE /WHATSAPPS : +6012-918 7937\nTEL: +603-3362 4137\nGST ID NO: 001531760640\nTAX INVOICE\nOWNED BY :\nSANYU SUPPLY SDN BHD (1135772-K)\nCASH SALES COUNTER\n1. 5000-0001\tPHOTOCOPY SERVICES - A4\nSIZE\n50 X 0.1000\t5.00\tSR\nTOTAL SALES INCLUSIVE GST @6%\t5.00\nDISCOUNT\t0.00\nTOTAL\t5.00\nROUND ADJ\t0.00\nFINAL TOTAL\t5.00\nCASH\t5.00\nCHANGE\t0.00\nGST SUMMARY\tAMOUNT(RM)\tTAX(RM)\nSR @ 6%\t4.72\t0.28\nINV NO: CS-SA-0097493\tDATE : 19/07/2017\nGOODS SOLD ARE NOT RETURNABLE & REFUNDABLE\nTHANK YOU FOR YOUR PATRONAGE\nPLEASE COME AGAIN.\nTERIMA KASIH SILA DATANG LAGI\n** PLEASE KEEP THIS RECEIPT FOR PROVE OF\nPURCHASE DATE FOR I.T PRODUCT WARRANTY\nPURPOSE **\nFOLLOW US IN FACEBOOK : SANYU.STATIONERY9 822340885 6887', '6360911208364\n1885 6\n8\n 628442\n20\t6\t4AIK HUAT HARDWARE\nENTERPRISE (SETIA\nALAM) SDN BHD\n822737-X\nNO. 17-G, JALAN SETIA INDAH\n(X) U13/X, SETIA ALAM,\nSEKSYEN U13, 40170 SHAH ALAM,\nTEL: 012 - 6651783 FAX: 03 - 33623608\nGST NO: 000394528768\nSIMPLIFIED TAX INVOICE\nCASH\nRECEIPT #: CSP0420207 DATE: 13/12/2017\nSALESPERSON : AH019 TIME: 17:58:00\nITEM\tQTY\tU/P\tAMOUNT\n(RM)\t(RM)\n8710163220987\t2\t12.00\t24.00\tS\nPHILIPS 18W/E27/827 ESSENTIAL BULB W/WHI\nTOTAL QUANTITY\t2\nSUB-TOTAL (GST)\t24.00\nDISC\t0.00\nROUNDING\t0.00\nTOTAL\t24.00\nCASH\t100.00\nCHANGE\t76.00\n*GST @ 6% INCLUDED IN TOTAL\nGST SUMMARY\nCODE\tAMOUNT\t%\tTAX/AMT\nSR\t22.64\t6\t1.36\nTAX TOTAL:\t1.36\nGOODS SOLD ARE NOT REFUNDABLE,\nTHANK YOU FOR CHOOSING US.\nPLS PROVIDE ORIGINAL BILL FOR GOODS\nEXCHANGE WITHIN 1 WEEK FROM TRANSACTION\nGOODS MUST BE IN ORIGINAL STATE TO BE\nENTITLED FOR EXCHANGE. 
', 'DE LUXE CIRCLE FRESH MART SDN BHD\n(MUTIARA RINI 16)\nCO REG NO:797887-W\tGST NO:001507647488\nNO.89&91, JALAN UTAMA,\nTAMAN MUTIA RINI, 81300 SKUDAI, JOHOR.\nTEL:016-7780546\nMT161201806020100\t02/06/18\t02:29:13 PM\nCASHIER:\tK LECHUM\t02/06/18\t02:29:34 PM\nCOCA-COLA 320ML\n9555589200385\t1.40*1\t1.40\tZ\nF&N GOTCHA BUGGY 75ML\n8853815002880\t0.95*1\t0.95\tZ\nKING OYSTER MUSHROOM -UNIT ***\t-UNIT\n6936489102000\t3.50*1\t3.50\tZ\nLKK KUM CHUN OYSTER SAUCE 770G\n078895129052\t5.65*1\t5.65\tZ\nWHOLE CHICKEN ***\n2006031014359\t10.99*1.306\t14.35\tZ\nITEM: 5\tTOTAL\t25.85\nQTY: 5\tROUNDING\t0.00\nTOTAL SAVING:\t0.00\tTOTAL\t25.85\nTENDER\nCASH\t50.00\nCHANGE\t24.15\nGST ANALYSIS\tGOODS\tTAX AMOUNT\nS = 6%\t0.00\t0.00\nZ = 0%\t25.85\t0.00\nMEMBER 0000036581\tPOINTS EARNED: 25\nMEMBER: WONG SHOO YUEN\n*THANK YOU, SEE YOU AGAIN !!\n*CUSTOMER CARE LINE : 012-7092889\n*CUSTOMERSERVICE@DELUXEGROUPS.COM']
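The first sequence of text now starts with this additional string: ``^\t\n~!LX?N4_^FTJ5>A>=^(I1{]+DX1H)[R=[RUF{UQ~2FZ\nK8OI[`>^% IKE\tIN+5[: F#,!]``. It also ends with a short run of extra digits (`9 822340885 6887`), and the second sequence gains a similar random prefix.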
The labels for the padded positions, however, are always a constant 0.
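To make the behaviour concrete, here is a minimal sketch of what a padding step like this might look like. It is not the repository's code: the function name `pad_with_noise`, the `max_pad` parameter, the noise alphabet, and the assumption that labels are per-character integers with 0 meaning "no field" are all my own guesses from the output shown above.

```python
import random
import string


def pad_with_noise(text, labels, max_pad=80, rng=None):
    """Hypothetical stand-in for robust_padding.

    Surrounds `text` with random characters and extends `labels`
    with 0 for every added character, so text and labels stay aligned.
    """
    rng = rng or random.Random()
    # Assumed noise alphabet: upper-case letters, digits, punctuation, whitespace.
    alphabet = string.ascii_uppercase + string.digits + string.punctuation + " \t\n"

    n_front = rng.randint(0, max_pad)
    n_back = rng.randint(0, max_pad)
    front = "".join(rng.choice(alphabet) for _ in range(n_front))
    back = "".join(rng.choice(alphabet) for _ in range(n_back))

    padded_text = front + text + back
    # Padding positions always get label 0 (assumed to be the "other" class).
    padded_labels = [0] * n_front + list(labels) + [0] * n_back
    return padded_text, padded_labels


# Example: label the amount "5.00" with class 4, everything else with 0.
text = "TOTAL\t5.00"
labels = [0] * 6 + [4] * 4
padded_text, padded_labels = pad_with_noise(text, labels, rng=random.Random(0))
```

If the real function works roughly like this, the absolute position of every labelled field shifts by a random amount on each call, while the noise itself is always labelled 0.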
Using the repository as is, the code reaches 78.31% on recall, precision and F1 score.
After removing the robust padding, performance on the test set falls to 45.61% on recall, precision and F1 score.
What is the scientific justification for this, and why does it give such a performance gain?
What is the justification for adding random strings to pad the text?
Code in question.