zizhong opened this issue 1 year ago (status: Open)
Hi @zizhong, thanks for reporting this. Would you mind adding the analyzer and anonymizer full results?
@omri374 My pleasure!
Original text:
May 5, 2023
Name: Carl John Smith
DOB: 04/18/1985
SSN: 999-99-9999
Dear DDS Examiner:
Introduction:
Mr. Carl Smith is a 31-year-old man who has been experiencing homelessness on and off for all
his adult life. Mr. Smith says he is about 5’5" and weighs approximately 129 lbs. He presents as
very thin, typically wearing a clean white undershirt and loose-fitting khaki shorts at interviews.
His brown hair is disheveled and dirty looking, and he constantly fidgets and shakes his hand or
knee during interviews. Despite his best efforts, Carl is a poor historian. In interviews with this
writer, he needed constant redirecting and prompting to provide information about his
personal and psychiatric history. Carl is diagnosed with Major Depressive Disorder; recurrent,
Anxiety Disorder, Attention Deficit Hyperactivity Disorder, Intermittent Explosive Disorder, and
a possible traumatic brain injury. Physically, he has degenerative disc disease, Lumbar
radiculopathy, Allergic Rhinitis, and a history of fainting since childhood. When asked why
working is difficult for him, Carl responded "I have a hard time controlling myself. When I get
stressed out, I immediately shut down."
My name is Gavin and I plan to go to San Francisco later today. While there I want to buy 5 apples for 4 dollars each, and 10 bananas for 3 dollars each. How much will this cost me?
Hi, Gavin,
Zizhong Ye and Gordon Liu are schoolmates at Chadbroune Elementry School.
Here are a few example sentences we currently support:
Hello, my name is David Johnson and I live in Maine.
My credit card number is 4095-2609-9393-4932 and my crypto wallet id is 16Yeky6GMjeNkAiNcBY7ZhrLoMSgg1BoyZ.
On September 18 I visited microsoft.com and sent an email to test@presidio.site, from the IP 192.168.0.1.
My passport: 191280342 and my phone number: (212) 555-1234.
This is a valid International Bank Account Number: IL150120690000003111111 . Can you please check the status on bank account 954567876544?
Kate's social security number is 078-05-1126. Her driver license? it is 1234567A.
John Smith called Sarah Jane at 321-456-7098 and told her to meet him at 1112 Market Street
During our recent meeting on February 23, 2023, at 10:30 AM, John Doe provided me with his personal details. His email is johndoe@example.com and his contact number is 650-456-7890. He lives in New York City, USA, and belongs to the American nationality with Christian beliefs and a leaning towards the Democratic party. He mentioned that he recently made a transaction using his credit card 4111 1111 1111 1111 and transferred bitcoins to the wallet address 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa. While discussing his European travels, he noted down his IBAN as GB29 NWBK 6016 1331 9268 19. Additionally, he provided his website as https://johndoeportfolio.com. John also discussed some of his US-specific details. He said his bank account number is 1234567890123456 and his drivers license is Y12345678. His ITIN is 987-65-4321, and he recently renewed his passport, the number for which is 123456789. He emphasized not to share his SSN, which is 669-45-6789. Furthermore, he mentioned that he accesses his work files remotely through the IP 192.168.1.1 and has a medical license number MED-123456.
key: 16charEncryptKey16charEncryptKey
Analysis results: [type: DATE_TIME, start: 1, end: 7, score: 1.0, type: PERSON, start: 19, end: 28, score: 1.0, type: PERSON, start: 29, end: 34, score: 1.0, type: PERSON, start: 105, end: 109, score: 1.0, type: PERSON, start: 110, end: 115, score: 1.0, type: AGE, start: 121, end: 123, score: 1.0, type: PERSON, start: 215, end: 220, score: 1.0, type: PERSON, start: 539, end: 543, score: 1.0, type: PERSON, start: 709, end: 713, score: 1.0, type: PERSON, start: 1077, end: 1081, score: 1.0, type: LOCATION, start: 1221, end: 1224, score: 1.0, type: LOCATION, start: 1225, end: 1234, score: 1.0, type: PERSON, start: 1371, end: 1377, score: 1.0, type: PERSON, start: 1387, end: 1389, score: 1.0, type: PERSON, start: 1394, end: 1400, score: 1.0, type: PERSON, start: 1401, end: 1404, score: 1.0, type: PERSON, start: 1528, end: 1533, score: 1.0, type: PERSON, start: 1534, end: 1541, score: 1.0, type: LOCATION, start: 1556, end: 1561, score: 1.0, type: CREDIT_CARD, start: 1588, end: 1607, score: 1.0, type: CRYPTO, start: 1635, end: 1669, score: 1.0, type: DATE_TIME, start: 1675, end: 1684, score: 1.0, type: DATE_TIME, start: 1685, end: 1687, score: 1.0, type: EMAIL_ADDRESS, start: 1733, end: 1751, score: 1.0, type: PHONE_NUMBER, start: 1824, end: 1829, score: 1.0, type: PHONE_NUMBER, start: 1830, end: 1838, score: 1.0, type: IBAN_CODE, start: 1892, end: 1915, score: 1.0, type: PERSON, start: 2066, end: 2070, score: 1.0, type: PERSON, start: 2071, end: 2076, score: 1.0, type: PERSON, start: 2084, end: 2089, score: 1.0, type: PERSON, start: 2090, end: 2094, score: 1.0, type: UK_NHS, start: 2098, end: 2110, score: 1.0, type: PHONE_NUMBER, start: 2098, end: 2101, score: 1.0, type: LOCATION, start: 2151, end: 2157, score: 1.0, type: DATE_TIME, start: 2188, end: 2200, score: 1.0, type: PERSON, start: 2220, end: 2224, score: 1.0, type: PERSON, start: 2225, end: 2228, score: 1.0, type: EMAIL_ADDRESS, start: 2281, end: 2300, score: 1.0, type: PHONE_NUMBER, start: 2327, end: 2330, 
score: 1.0, type: LOCATION, start: 2353, end: 2361, score: 1.0, type: LOCATION, start: 2362, end: 2366, score: 1.0, type: LOCATION, start: 2368, end: 2371, score: 1.0, type: CREDIT_CARD, start: 2551, end: 2570, score: 1.0, type: CRYPTO, start: 2618, end: 2652, score: 1.0, type: IBAN_CODE, start: 2719, end: 2746, score: 1.0, type: PERSON, start: 2819, end: 2823, score: 1.0, type: PHONE_NUMBER, start: 3200, end: 3203, score: 1.0, type: PERSON, start: 1195, end: 1200, score: 0.9900000095367432, type: PHONE_NUMBER, start: 1588, end: 1592, score: 0.9900000095367432, type: PHONE_NUMBER, start: 1766, end: 1769, score: 0.9900000095367432, type: LOCATION, start: 2139, end: 2150, score: 0.9900000095367432, type: DATE_TIME, start: 2201, end: 2205, score: 0.9900000095367432, type: ORGANIZATION, start: 2473, end: 2478, score: 0.9900000095367432, type: LOCATION, start: 2851, end: 2853, score: 0.9900000095367432, type: DATE_TIME, start: 8, end: 12, score: 0.9800000190734863, type: PHONE_NUMBER, start: 1774, end: 1775, score: 0.9800000190734863, type: PHONE_NUMBER, start: 2551, end: 2565, score: 0.9800000190734863, type: DATE_TIME, start: 40, end: 43, score: 0.9700000286102295, type: PHONE_NUMBER, start: 2101, end: 2105, score: 0.9700000286102295, type: ORGANIZATION, start: 1445, end: 1451, score: 0.9599999785423279, type: EMAIL, start: 2281, end: 2282, score: 0.9599999785423279, type: IP_ADDRESS, start: 1766, end: 1777, score: 0.95, type: URL, start: 2789, end: 2817, score: 0.95, type: IP_ADDRESS, start: 3200, end: 3211, score: 0.95, type: PHONE_NUMBER, start: 2974, end: 2977, score: 0.949999988079071, type: PHONE_NUMBER, start: 3108, end: 3116, score: 0.9399999976158142, type: PHONE_NUMBER, start: 2977, end: 2983, score: 0.9300000071525574, type: PHONE_NUMBER, start: 3105, end: 3108, score: 0.9100000262260437, type: ORGANIZATION, start: 2289, end: 2296, score: 0.8999999761581421, type: PHONE_NUMBER, start: 1770, end: 1773, score: 0.8899999856948853, type: PHONE_NUMBER, start: 
2330, end: 2339, score: 0.8799999952316284, type: PHONE_NUMBER, start: 1592, end: 1607, score: 0.8600000143051147, type: ORGANIZATION, start: 2462, end: 2472, score: 0.8600000143051147, type: ORGANIZATION, start: 1424, end: 1434, score: 0.8500000238418579, type: US_SSN, start: 2014, end: 2025, score: 0.85, type: US_ITIN, start: 2974, end: 2985, score: 0.85, type: US_SSN, start: 3105, end: 3116, score: 0.85, type: PHONE_NUMBER, start: 2106, end: 2110, score: 0.8199999928474426, type: ORGANIZATION, start: 3245, end: 3248, score: 0.8100000023841858, type: PHONE_NUMBER, start: 2566, end: 2569, score: 0.7900000214576721, type: PHONE_NUMBER, start: 2729, end: 2738, score: 0.7900000214576721, type: PHONE_NUMBER, start: 2015, end: 2025, score: 0.7699999809265137, type: PERSON, start: 1378, end: 1379, score: 0.7599999904632568, type: PERSON, start: 1377, end: 1378, score: 0.75, type: PHONE_NUMBER, start: 1824, end: 1838, score: 1.0, type: PHONE_NUMBER, start: 2014, end: 2025, score: 0.7699999809265137, type: PHONE_NUMBER, start: 2327, end: 2339, score: 1.0, type: PERSON, start: 2284, end: 2288, score: 0.7400000095367432, type: ORGANIZATION, start: 1435, end: 1444, score: 0.7200000286102295, type: LOCATION, start: 2392, end: 2400, score: 0.7200000286102295, type: DATE_TIME, start: 43, end: 50, score: 0.699999988079071, type: PERSON, start: 2282, end: 2284, score: 0.6899999976158142, type: ORGANIZATION, start: 1698, end: 1703, score: 0.6700000166893005, type: ORGANIZATION, start: 2800, end: 2801, score: 0.6700000166893005, type: US_DRIVER_LICENSE, start: 2054, end: 2062, score: 0.6499999999999999, type: US_DRIVER_LICENSE, start: 2951, end: 2960, score: 0.6499999999999999, type: PERSON, start: 1379, end: 1386, score: 0.6499999761581421, type: DATE_TIME, start: 40, end: 50, score: 0.9700000286102295, type: PHONE_NUMBER, start: 2569, end: 2570, score: 0.5799999833106995, type: OTHERPHI, start: 1703, end: 1707, score: 0.5099999904632568, type: PERSON, start: 1981, end: 1985, 
score: 0.5099999904632568, type: US_ITIN, start: 56, end: 67, score: 0.5, type: URL, start: 1698, end: 1711, score: 0.5, type: URL, start: 1738, end: 1749, score: 0.5, type: URL, start: 2289, end: 2300, score: 0.5, type: ID, start: 2744, end: 2746, score: 0.5, type: PHONE_NUMBER, start: 62, end: 63, score: 0.49000000953674316, type: ID, start: 1793, end: 1799, score: 0.49, type: ID, start: 1892, end: 1909, score: 0.49, type: ID, start: 2907, end: 2920, score: 0.49, type: ID, start: 56, end: 62, score: 0.48, type: ID, start: 2014, end: 2015, score: 0.48, type: ID, start: 1966, end: 1976, score: 0.46, type: ID, start: 3049, end: 3056, score: 0.46, type: ID, start: 2054, end: 2059, score: 0.45, type: ID, start: 1635, end: 1644, score: 0.44, type: ID, start: 2719, end: 2726, score: 0.44, type: ID, start: 2739, end: 2743, score: 0.43, type: US_PASSPORT, start: 1793, end: 1802, score: 0.4, type: US_BANK_NUMBER, start: 1966, end: 1978, score: 0.4, type: PHONE_NUMBER, start: 2098, end: 2110, score: 1.0, type: ID, start: 2727, end: 2728, score: 0.4, type: US_BANK_NUMBER, start: 2907, end: 2923, score: 0.4, type: ID, start: 2951, end: 2955, score: 0.4, type: US_PASSPORT, start: 3049, end: 3058, score: 0.4, type: US_DRIVER_LICENSE, start: 3249, end: 3255, score: 0.4, type: ID, start: 1650, end: 1655, score: 0.39, type: ID, start: 2955, end: 2960, score: 0.39, type: ID, start: 2650, end: 2652, score: 0.38, type: DATE_TIME, start: 2789, end: 2794, score: 0.36000001430511475, type: ID, start: 3056, end: 3058, score: 0.35, type: ID, start: 3248, end: 3252, score: 0.34, type: ID, start: 1913, end: 1915, score: 0.32, type: ID, start: 1976, end: 1978, score: 0.32, type: ID, start: 2626, end: 2628, score: 0.32, type: ID, start: 2983, end: 2985, score: 0.32, type: PERSON, start: 2798, end: 2800, score: 0.3100000023841858, type: ID, start: 2618, end: 2621, score: 0.31, type: ID, start: 2628, end: 2629, score: 0.31, type: ID, start: 2643, end: 2646, score: 0.31, type: PHONE_NUMBER, 
start: 1776, end: 1777, score: 0.30000001192092896, type: ORGANIZATION, start: 2797, end: 2798, score: 0.30000001192092896, type: ID, start: 2629, end: 2631, score: 0.3, type: ID, start: 2726, end: 2727, score: 0.3, type: ID, start: 2640, end: 2641, score: 0.28, type: ID, start: 2641, end: 2643, score: 0.28, type: ID, start: 1799, end: 1802, score: 0.27, type: ID, start: 2632, end: 2638, score: 0.26, type: OTHERPHI, start: 1708, end: 1711, score: 0.23000000417232513, type: ID, start: 2638, end: 2640, score: 0.21, type: ID, start: 63, end: 67, score: 0.2, type: ID, start: 2621, end: 2625, score: 0.19, type: ID, start: 2625, end: 2626, score: 0.16, type: ID, start: 2920, end: 2923, score: 0.16, type: US_PASSPORT, start: 2951, end: 2960, score: 0.1, type: US_BANK_NUMBER, start: 1793, end: 1802, score: 0.05, type: US_SSN, start: 1793, end: 1802, score: 0.05, type: US_BANK_NUMBER, start: 3049, end: 3058, score: 0.05, type: US_DRIVER_LICENSE, start: 1793, end: 1802, score: 0.01, type: US_DRIVER_LICENSE, start: 1966, end: 1978, score: 0.01, type: US_DRIVER_LICENSE, start: 2907, end: 2923, score: 0.01, type: US_DRIVER_LICENSE, start: 3049, end: 3058, score: 0.01]
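Many of the spans in the analysis results above overlap each other. A minimal sketch of how the overlapping pairs could be listed, assuming each result is reduced to a plain tuple; the `Span` type and `find_overlaps` helper are hypothetical illustrations and not part of Presidio:

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for one analyzer result (not a Presidio class)."""
    entity_type: str
    start: int
    end: int
    score: float


def find_overlaps(spans: list[Span]) -> list[tuple[Span, Span]]:
    """Return every pair of spans that share at least one character."""
    ordered = sorted(spans, key=lambda s: (s.start, s.end))
    pairs = []
    for i, left in enumerate(ordered):
        for right in ordered[i + 1:]:
            if right.start >= left.end:
                break  # sorted by start, so no later span can overlap `left`
            pairs.append((left, right))
    return pairs


# Three results copied from the analyzer output above (scores rounded).
sample = [
    Span("ORGANIZATION", 3245, 3248, 0.81),
    Span("ID", 3248, 3252, 0.34),
    Span("US_DRIVER_LICENSE", 3249, 3255, 0.4),
]

overlaps = find_overlaps(sample)
for left, right in overlaps:
    print(f"{left.entity_type} [{left.start}, {left.end}) overlaps "
          f"{right.entity_type} [{right.start}, {right.end})")
```

For this sample only the ID and US_DRIVER_LICENSE spans overlap; the ORGANIZATION span ends exactly where the ID span starts, so they are adjacent rather than overlapping.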
sanitized_results:
text:
/oSOg6iCSSvrWeZlXxu68BOeKmiTzcNzQsnJGhBuE14= BXBJ6eCU59a5nGvzGtXkVd5oOjJWZ3606NWi6vUgna8=
Name: N4/k/tVfrGcIMHuiEeB4tzn1OPnvfqItq2GsaYL6DzE= ph8f+GFGdzb0kJ7jtupBDHhQmRah/peKV/UgXXEJxxQ=
DOB: doSS1fEZlEXjD/4dpBBgX9AfDo1MBQ6a9LIQmuBM/Zs=
SSN: 2HhEMucehDL/N9PB25Give8hbskDdkX6PKRVbbmBy3c=
Dear DDS Examiner:
Introduction:
Mr. xqYsVNVNr18ennd01WUFwd7uN6H2VMU4ciOoEG0WctI= /VUZ38hgaW8oIOqXKO/V5rhRJapPgYksLqPWPYfsabI= is a D/VKKs+lEPKpi0u8sM4GrCgFl5iRa8DYA0X6gj2D5WA=-year-old man who has been experiencing homelessness on and off for all
his adult life. Mr. NliXY4ki34IfIbzZtjE3uNftlnT32WVvoyJNayCdekY= says he is about 5’5" and weighs approximately 129 lbs. He presents as
very thin, typically wearing a clean white undershirt and loose-fitting khaki shorts at interviews.
His brown hair is disheveled and dirty looking, and he constantly fidgets and shakes his hand or
knee during interviews. Despite his best efforts, VsABjcnQqmUm/j03n4MKg2DqpFCr4pqITtmMifENZeE= is a poor historian. In interviews with this
writer, he needed constant redirecting and prompting to provide information about his
personal and psychiatric history. lBruvd9kF+rvdor093uxwhDtSKL/UK55A3DI+oSywtE= is diagnosed with Major Depressive Disorder; recurrent,
Anxiety Disorder, Attention Deficit Hyperactivity Disorder, Intermittent Explosive Disorder, and
a possible traumatic brain injury. Physically, he has degenerative disc disease, Lumbar
radiculopathy, Allergic Rhinitis, and a history of fainting since childhood. When asked why
working is difficult for him, 9ck+Cm18StxyLGQyKNvC2jBXJmrMpWU4sB8ZrFU1kAM= responded "I have a hard time controlling myself. When I get
stressed out, I immediately shut down."
My name is BKa65ekjqE4WQItErmVhMA/2OOOJN22KHfjgvsCa8so= and I plan to go to s4EvJlsKlpLYKD0zpGUdfft9ShuIEhrPzDzH7jSYEts= jCm9dzARnqHI0iJKC5OMieNLge4kdoVGm8grvb3YlAI= later today. While there I want to buy 5 apples for 4 dollars each, and 10 bananas for 3 dollars each. How much will this cost me?
Hi, Sc9T66XdTiYZ67ZsDXtIt61RjH3Ix4bmDrQzlzHrMU0=WyuALmGVkddrBmBg1hT/y2A5j9xhPrNZ1Ej9CLwbIhg=vDR2T7oK/yvou0saRKzPv5lYKzglBLfi6X0eIFYBJJo=6O/l3Kvxs0LqR9MacXjsndvYIwJy0amzv0DXByXElw8= pjeruxF2mmDdAV0TBTRzKVln6mAyJmq0G/WmQCXY5X0= and oRIwSVmhzsIRUUQdiBq9EG1nNY3jBVaF/rzNY3CeohM= MkXL66jTGCSYWjiLw+SmxnwXt0KbnQqtPkEtDeHBaZ8= are schoolmates at IeA7j0GO0zQX8wQmJkzW76yvRON8t3RWOZDO9FAigYs= dcr2T5rcxmVFxJJb27qkCbuBOOOPLGj8okogyjFxvCE= S3gKzkLnRIFWv1KvmwjDjAs578Ss/P46Y5QNqLO+mJI=.
Here are a few example sentences we currently support:
Hello, my name is BaOX4t4zf6ifgm2ynleYWx2zqI4rAqZwRfVfd5mymg8= jL/Ow3JsVqej2de+JruDmXcVImHsw2h8KEXyAwntAaw= and I live in h5T1VzxIeESGFf0Vwb3TNh0+FvuQhurUu9OVzmgfp9M=.
My credit card number is NrC5Fm+X1XsvO4ni9B1efz3eGXBUpGIja5qUJs4eJKIzWUgvexzrLDkdn1c2h8Vq and my crypto wallet id is xkDFwVdeQ0TFnoW/5WVyKnLbYTesROeB/XBYRGyOQ4Mjz7l0qgpr3DNdUB3CNGsNnfSWC2AHJjSuGQ0V21X9zA==.
On tH4kBVVvfUtQX3YMoNzYyBtyBJIK9Sg+iyWs9kg5ogw= v/5HfMn1UGlK/AMUsbUJ70kwGKg4CA+WvT8MVX8p3rI= I visited 6jafevKfBhz1CVvui9Wvk8t3BFyF18TUsSlPaEaM/IU= and sent an email to hcVH55QTg18VacjpzPcpZ0aIPONprLSNhaYmeZ2IbEOP9/mg/vPTgt7/z5v821iV, from the IP NuUW3IpNC1Sg6HeluMAuVGa6u1Dsvfg0BZRUKm/l0l0=.
My passport: 9I/qaxhEhair6rHqgtFxlXMWz928SATTrdfJPr0fsmg= and my phone number: 9XyKeczSYzOLypCejq4vx2wb3Oac94XTodujyIyTM5E=.
This is a valid International Bank Account Number: dChBF5PuA8kcMX+ad/Hb/E57lFjvSUgvt/LwegwJKNtUxShlWKmp7vXMSVD3Ny2N . Can you please check the status on bank account qSApTPPKEfvyf9ttwBHxvR7Cwus/fnxLOY5okVAhSWg=?
jNTpQsZUHYTaFJsk8OsqQtIVhGkyw3f3IxRwgTabyKE='s social security number is v5b0SKTg7lFN2CC3BU+IDlNIQ6OD1RndYHbD4PkdweI=. Her driver license? it is z2Gd0dTlduKZgusZZrm+E2wCi6XddWWR96QwgJjr6Pc=.
+EYE4N6zxuhpgT9dAdnoOEo1ck6FKX3u0DjH+axfNvs= KheWSsytZm/hc1MLoumGJNBIpykcegMJy1OzRuo8t0g= called 1AY6x7lB+gEkEOhEDO38qlKA0ZBvjcJeDBEFoXbo5MA= zm6PpJ1hQwPXKL5+kJblxJxUsOxnoDvR5c2bhDIag9M= at xwpVKZjnLWV6hktpTRAiyinDAyRvOXfsW1Tg9mvV7HI= and told her to meet him at tGQgAzg7BsNc04azpaVfL6RBbe0mmcSL9/ThFmXXEi8= 1XuhIBu/IO9l08LiItzv+PweW5qQOfvZZO1iIc5EYpU=
During our recent meeting on hbi15cSCVRclpEAaJw3DLcNokTF3ay1VYCu7ybJOVhE= 8HlY+yBPE8vadGocrI38aGuJFw6FoOoj2QmlRi+3DtQ=, at 10:30 AM, 7tFViRtxe5BchoD4nEIVSpYuM5mU0lJQLzW6QXxyCq0= MWrvdw1m3gbR16/rp0JPHduUB5sOpng9uo2/6n1CuCA= provided me with his personal details. His email is oVi4cdXrSs26rjglrmsEOIILOsCYhAyIapd8By4ZLIuVf2BLazMvLNDVmSWfrjUU and his contact number is OIoGnYJJKjxN8RL6DW7vzc6oKn/X9z6c60iFX87uBaY=. He lives in F1ZO9fpP8Zkclxkriwy1+xDSWlrACdAM8SgvvR2lz8o= enmAbh33dzbXygPTLVWTeTTT0tEDZ6WsIhyzanx/iUs=, EiLAs79xWju6oeJoFKLs2eTbcxSzeVl615wK2sAs/nI=, and belongs to the FuXlfgMW3OMA02MUoP3n+kvSoMxRpq+RleU6+4iBNEA= nationality with Christian beliefs and a leaning towards the YOV6XhhFbx3ZqxVl/4vTUYd/tswrOCwmvxU4pdnSJm8= gUSPxSEM70gTyMgO/Y4BuhCVzmXP70wTHVrZohzIuY4=. He mentioned that he recently made a transaction using his credit card PfDnKDp8kXwQZBDHS9O6zh+JFSCT2lqIAK+H7A6m7q4WS3qZUy0ZfoVjUvQFzj6a and transferred bitcoins to the wallet address fvh22oXJ70PkniEc+lamum+NlRFA9N0sjb4+azxrOLRc2H/ZOCiA97/Uaazz4FOgETZNhe1CKpwQWG5QpgdHxA==. While discussing his European travels, he noted down his IBAN as tfMNvU/H0GNfpg+L4maWqNx4cNEBiUTj6OLneia1DFa/jERz88GZ9Fzx8GIGONLx. Additionally, he provided his website as adBZKm3695s64cOXz+ZTdV/idRt/ag9q323/Os9jRBYvMElZt024Ut8nTReueC2L. 35HBMp9dKm0gOCONK7+d88JBKWwTnNVFy+mJ9ImRKl8= also discussed some of his TpRc7LDRHcwMPRRXm2NNNUNUae1RL7p6vxFqQPrE8Ko=-specific details. He said his bank account number is xUlkzkuVhON7kJGLbXViSpoC9phr41g8tm93l/H1jHMfwp87lubfiz5Yzr7sKyLN and his drivers license is la252+9F+ZmzPPeJO8XHT+tQaiBl9ypeb+b7/qiA4fo=. His ITIN is /2uD/S1LvqhXJkjjieKNbIwcyB6gwRvpbNs1leIh6LI=, and he recently renewed his passport, the number for which is 9HdRadE311qIMDzFAE77QNrt1kZnniYb0NYRHpMoseI=. He emphasized not to share his SSN, which is pIsW+gc75pCKMJOrVK5c4+v0MrOfkjGeYeFQXaJlKto=. 
Furthermore, he mentioned that he accesses his work files remotely through the IP hyR4W4fanUhn3FOZgpWOwHr6EZibPlzU2jeAOesbhyI= and has a medical license number c0Ooguq0cTCKtNJYMe0y5tuU9GW7puSkbxugbu1pvKA=kyLs2EJZA9yV41kqJwUQZj4NcFgXE6SY533sIXlNBJ4=Jc6WwHcM3QUw9ZMPFRv6xae6OvQRoDytLls16zyOvwQ=.
items:
[
{'start': 6172, 'end': 6216, 'entity_type': 'US_DRIVER_LICENSE', 'text': 'Jc6WwHcM3QUw9ZMPFRv6xae6OvQRoDytLls16zyOvwQ=', 'operator': 'encrypt'},
{'start': 6128, 'end': 6172, 'entity_type': 'ID', 'text': 'kyLs2EJZA9yV41kqJwUQZj4NcFgXE6SY533sIXlNBJ4=', 'operator': 'encrypt'},
{'start': 6084, 'end': 6128, 'entity_type': 'ORGANIZATION', 'text': 'c0Ooguq0cTCKtNJYMe0y5tuU9GW7puSkbxugbu1pvKA=', 'operator': 'encrypt'},
{'start': 6006, 'end': 6050, 'entity_type': 'IP_ADDRESS', 'text': 'hyR4W4fanUhn3FOZgpWOwHr6EZibPlzU2jeAOesbhyI=', 'operator': 'encrypt'},
{'start': 5878, 'end': 5922, 'entity_type': 'US_SSN', 'text': 'pIsW+gc75pCKMJOrVK5c4+v0MrOfkjGeYeFQXaJlKto=', 'operator': 'encrypt'},
{'start': 5787, 'end': 5831, 'entity_type': 'US_PASSPORT', 'text': '9HdRadE311qIMDzFAE77QNrt1kZnniYb0NYRHpMoseI=', 'operator': 'encrypt'},
{'start': 5679, 'end': 5723, 'entity_type': 'US_ITIN', 'text': '/2uD/S1LvqhXJkjjieKNbIwcyB6gwRvpbNs1leIh6LI=', 'operator': 'encrypt'},
{'start': 5621, 'end': 5665, 'entity_type': 'US_DRIVER_LICENSE', 'text': 'la252+9F+ZmzPPeJO8XHT+tQaiBl9ypeb+b7/qiA4fo=', 'operator': 'encrypt'},
{'start': 5529, 'end': 5593, 'entity_type': 'US_BANK_NUMBER', 'text': 'xUlkzkuVhON7kJGLbXViSpoC9phr41g8tm93l/H1jHMfwp87lubfiz5Yzr7sKyLN', 'operator': 'encrypt'},
{'start': 5431, 'end': 5475, 'entity_type': 'LOCATION', 'text': 'TpRc7LDRHcwMPRRXm2NNNUNUae1RL7p6vxFqQPrE8Ko=', 'operator': 'encrypt'},
{'start': 5359, 'end': 5403, 'entity_type': 'PERSON', 'text': '35HBMp9dKm0gOCONK7+d88JBKWwTnNVFy+mJ9ImRKl8=', 'operator': 'encrypt'},
{'start': 5293, 'end': 5357, 'entity_type': 'URL', 'text': 'adBZKm3695s64cOXz+ZTdV/idRt/ag9q323/Os9jRBYvMElZt024Ut8nTReueC2L', 'operator': 'encrypt'},
{'start': 5186, 'end': 5250, 'entity_type': 'IBAN_CODE', 'text': 'tfMNvU/H0GNfpg+L4maWqNx4cNEBiUTj6OLneia1DFa/jERz88GZ9Fzx8GIGONLx', 'operator': 'encrypt'},
{'start': 5031, 'end': 5119, 'entity_type': 'CRYPTO', 'text': 'fvh22oXJ70PkniEc+lamum+NlRFA9N0sjb4+azxrOLRc2H/ZOCiA97/Uaazz4FOgETZNhe1CKpwQWG5QpgdHxA==', 'operator': 'encrypt'},
{'start': 4919, 'end': 4983, 'entity_type': 'CREDIT_CARD', 'text': 'PfDnKDp8kXwQZBDHS9O6zh+JFSCT2lqIAK+H7A6m7q4WS3qZUy0ZfoVjUvQFzj6a', 'operator': 'encrypt'},
{'start': 4802, 'end': 4846, 'entity_type': 'ORGANIZATION', 'text': 'gUSPxSEM70gTyMgO/Y4BuhCVzmXP70wTHVrZohzIuY4=', 'operator': 'encrypt'},
{'start': 4757, 'end': 4801, 'entity_type': 'ORGANIZATION', 'text': 'YOV6XhhFbx3ZqxVl/4vTUYd/tswrOCwmvxU4pdnSJm8=', 'operator': 'encrypt'},
{'start': 4651, 'end': 4695, 'entity_type': 'LOCATION', 'text': 'FuXlfgMW3OMA02MUoP3n+kvSoMxRpq+RleU6+4iBNEA=', 'operator': 'encrypt'},
{'start': 4586, 'end': 4630, 'entity_type': 'LOCATION', 'text': 'EiLAs79xWju6oeJoFKLs2eTbcxSzeVl615wK2sAs/nI=', 'operator': 'encrypt'},
{'start': 4540, 'end': 4584, 'entity_type': 'LOCATION', 'text': 'enmAbh33dzbXygPTLVWTeTTT0tEDZ6WsIhyzanx/iUs=', 'operator': 'encrypt'},
{'start': 4495, 'end': 4539, 'entity_type': 'LOCATION', 'text': 'F1ZO9fpP8Zkclxkriwy1+xDSWlrACdAM8SgvvR2lz8o=', 'operator': 'encrypt'},
{'start': 4437, 'end': 4481, 'entity_type': 'PHONE_NUMBER', 'text': 'OIoGnYJJKjxN8RL6DW7vzc6oKn/X9z6c60iFX87uBaY=', 'operator': 'encrypt'},
{'start': 4346, 'end': 4410, 'entity_type': 'EMAIL_ADDRESS', 'text': 'oVi4cdXrSs26rjglrmsEOIILOsCYhAyIapd8By4ZLIuVf2BLazMvLNDVmSWfrjUU', 'operator': 'encrypt'},
{'start': 4249, 'end': 4293, 'entity_type': 'PERSON', 'text': 'MWrvdw1m3gbR16/rp0JPHduUB5sOpng9uo2/6n1CuCA=', 'operator': 'encrypt'},
{'start': 4204, 'end': 4248, 'entity_type': 'PERSON', 'text': '7tFViRtxe5BchoD4nEIVSpYuM5mU0lJQLzW6QXxyCq0=', 'operator': 'encrypt'},
{'start': 4145, 'end': 4189, 'entity_type': 'DATE_TIME', 'text': '8HlY+yBPE8vadGocrI38aGuJFw6FoOoj2QmlRi+3DtQ=', 'operator': 'encrypt'},
{'start': 4100, 'end': 4144, 'entity_type': 'DATE_TIME', 'text': 'hbi15cSCVRclpEAaJw3DLcNokTF3ay1VYCu7ybJOVhE=', 'operator': 'encrypt'},
{'start': 4025, 'end': 4069, 'entity_type': 'LOCATION', 'text': '1XuhIBu/IO9l08LiItzv+PweW5qQOfvZZO1iIc5EYpU=', 'operator': 'encrypt'},
{'start': 3980, 'end': 4024, 'entity_type': 'LOCATION', 'text': 'tGQgAzg7BsNc04azpaVfL6RBbe0mmcSL9/ThFmXXEi8=', 'operator': 'encrypt'},
{'start': 3907, 'end': 3951, 'entity_type': 'PHONE_NUMBER', 'text': 'xwpVKZjnLWV6hktpTRAiyinDAyRvOXfsW1Tg9mvV7HI=', 'operator': 'encrypt'},
{'start': 3859, 'end': 3903, 'entity_type': 'PERSON', 'text': 'zm6PpJ1hQwPXKL5+kJblxJxUsOxnoDvR5c2bhDIag9M=', 'operator': 'encrypt'},
{'start': 3814, 'end': 3858, 'entity_type': 'PERSON', 'text': '1AY6x7lB+gEkEOhEDO38qlKA0ZBvjcJeDBEFoXbo5MA=', 'operator': 'encrypt'},
{'start': 3762, 'end': 3806, 'entity_type': 'PERSON', 'text': 'KheWSsytZm/hc1MLoumGJNBIpykcegMJy1OzRuo8t0g=', 'operator': 'encrypt'},
{'start': 3717, 'end': 3761, 'entity_type': 'PERSON', 'text': '+EYE4N6zxuhpgT9dAdnoOEo1ck6FKX3u0DjH+axfNvs=', 'operator': 'encrypt'},
{'start': 3669, 'end': 3713, 'entity_type': 'US_DRIVER_LICENSE', 'text': 'z2Gd0dTlduKZgusZZrm+E2wCi6XddWWR96QwgJjr6Pc=', 'operator': 'encrypt'},
{'start': 3596, 'end': 3640, 'entity_type': 'US_SSN', 'text': 'v5b0SKTg7lFN2CC3BU+IDlNIQ6OD1RndYHbD4PkdweI=', 'operator': 'encrypt'},
{'start': 3523, 'end': 3567, 'entity_type': 'PERSON', 'text': 'jNTpQsZUHYTaFJsk8OsqQtIVhGkyw3f3IxRwgTabyKE=', 'operator': 'encrypt'},
{'start': 3476, 'end': 3520, 'entity_type': 'US_BANK_NUMBER', 'text': 'qSApTPPKEfvyf9ttwBHxvR7Cwus/fnxLOY5okVAhSWg=', 'operator': 'encrypt'},
{'start': 3361, 'end': 3425, 'entity_type': 'IBAN_CODE', 'text': 'dChBF5PuA8kcMX+ad/Hb/E57lFjvSUgvt/LwegwJKNtUxShlWKmp7vXMSVD3Ny2N', 'operator': 'encrypt'},
{'start': 3263, 'end': 3307, 'entity_type': 'PHONE_NUMBER', 'text': '9XyKeczSYzOLypCejq4vx2wb3Oac94XTodujyIyTM5E=', 'operator': 'encrypt'},
{'start': 3197, 'end': 3241, 'entity_type': 'US_PASSPORT', 'text': '9I/qaxhEhair6rHqgtFxlXMWz928SATTrdfJPr0fsmg=', 'operator': 'encrypt'},
{'start': 3137, 'end': 3181, 'entity_type': 'IP_ADDRESS', 'text': 'NuUW3IpNC1Sg6HeluMAuVGa6u1Dsvfg0BZRUKm/l0l0=', 'operator': 'encrypt'},
{'start': 3058, 'end': 3122, 'entity_type': 'EMAIL_ADDRESS', 'text': 'hcVH55QTg18VacjpzPcpZ0aIPONprLSNhaYmeZ2IbEOP9/mg/vPTgt7/z5v821iV', 'operator': 'encrypt'},
{'start': 2992, 'end': 3036, 'entity_type': 'URL', 'text': '6jafevKfBhz1CVvui9Wvk8t3BFyF18TUsSlPaEaM/IU=', 'operator': 'encrypt'},
{'start': 2937, 'end': 2981, 'entity_type': 'DATE_TIME', 'text': 'v/5HfMn1UGlK/AMUsbUJ70kwGKg4CA+WvT8MVX8p3rI=', 'operator': 'encrypt'},
{'start': 2892, 'end': 2936, 'entity_type': 'DATE_TIME', 'text': 'tH4kBVVvfUtQX3YMoNzYyBtyBJIK9Sg+iyWs9kg5ogw=', 'operator': 'encrypt'},
{'start': 2798, 'end': 2886, 'entity_type': 'CRYPTO', 'text': 'xkDFwVdeQ0TFnoW/5WVyKnLbYTesROeB/XBYRGyOQ4Mjz7l0qgpr3DNdUB3CNGsNnfSWC2AHJjSuGQ0V21X9zA==', 'operator': 'encrypt'},
{'start': 2706, 'end': 2770, 'entity_type': 'CREDIT_CARD', 'text': 'NrC5Fm+X1XsvO4ni9B1efz3eGXBUpGIja5qUJs4eJKIzWUgvexzrLDkdn1c2h8Vq', 'operator': 'encrypt'},
{'start': 2635, 'end': 2679, 'entity_type': 'LOCATION', 'text': 'h5T1VzxIeESGFf0Vwb3TNh0+FvuQhurUu9OVzmgfp9M=', 'operator': 'encrypt'},
{'start': 2576, 'end': 2620, 'entity_type': 'PERSON', 'text': 'jL/Ow3JsVqej2de+JruDmXcVImHsw2h8KEXyAwntAaw=', 'operator': 'encrypt'},
{'start': 2531, 'end': 2575, 'entity_type': 'PERSON', 'text': 'BaOX4t4zf6ifgm2ynleYWx2zqI4rAqZwRfVfd5mymg8=', 'operator': 'encrypt'},
{'start': 2410, 'end': 2454, 'entity_type': 'ORGANIZATION', 'text': 'S3gKzkLnRIFWv1KvmwjDjAs578Ss/P46Y5QNqLO+mJI=', 'operator': 'encrypt'},
{'start': 2365, 'end': 2409, 'entity_type': 'ORGANIZATION', 'text': 'dcr2T5rcxmVFxJJb27qkCbuBOOOPLGj8okogyjFxvCE=', 'operator': 'encrypt'},
{'start': 2320, 'end': 2364, 'entity_type': 'ORGANIZATION', 'text': 'IeA7j0GO0zQX8wQmJkzW76yvRON8t3RWOZDO9FAigYs=', 'operator': 'encrypt'},
{'start': 2256, 'end': 2300, 'entity_type': 'PERSON', 'text': 'MkXL66jTGCSYWjiLw+SmxnwXt0KbnQqtPkEtDeHBaZ8=', 'operator': 'encrypt'},
{'start': 2211, 'end': 2255, 'entity_type': 'PERSON', 'text': 'oRIwSVmhzsIRUUQdiBq9EG1nNY3jBVaF/rzNY3CeohM=', 'operator': 'encrypt'},
{'start': 2162, 'end': 2206, 'entity_type': 'PERSON', 'text': 'pjeruxF2mmDdAV0TBTRzKVln6mAyJmq0G/WmQCXY5X0=', 'operator': 'encrypt'},
{'start': 2117, 'end': 2161, 'entity_type': 'PERSON', 'text': '6O/l3Kvxs0LqR9MacXjsndvYIwJy0amzv0DXByXElw8=', 'operator': 'encrypt'},
{'start': 2073, 'end': 2117, 'entity_type': 'PERSON', 'text': 'vDR2T7oK/yvou0saRKzPv5lYKzglBLfi6X0eIFYBJJo=', 'operator': 'encrypt'},
{'start': 2029, 'end': 2073, 'entity_type': 'PERSON', 'text': 'WyuALmGVkddrBmBg1hT/y2A5j9xhPrNZ1Ej9CLwbIhg=', 'operator': 'encrypt'},
{'start': 1985, 'end': 2029, 'entity_type': 'PERSON', 'text': 'Sc9T66XdTiYZ67ZsDXtIt61RjH3Ix4bmDrQzlzHrMU0=', 'operator': 'encrypt'},
{'start': 1804, 'end': 1848, 'entity_type': 'LOCATION', 'text': 'jCm9dzARnqHI0iJKC5OMieNLge4kdoVGm8grvb3YlAI=', 'operator': 'encrypt'},
{'start': 1759, 'end': 1803, 'entity_type': 'LOCATION', 'text': 's4EvJlsKlpLYKD0zpGUdfft9ShuIEhrPzDzH7jSYEts=', 'operator': 'encrypt'},
{'start': 1694, 'end': 1738, 'entity_type': 'PERSON', 'text': 'BKa65ekjqE4WQItErmVhMA/2OOOJN22KHfjgvsCa8so=', 'operator': 'encrypt'},
{'start': 1536, 'end': 1580, 'entity_type': 'PERSON', 'text': '9ck+Cm18StxyLGQyKNvC2jBXJmrMpWU4sB8ZrFU1kAM=', 'operator': 'encrypt'},
{'start': 1128, 'end': 1172, 'entity_type': 'PERSON', 'text': 'lBruvd9kF+rvdor093uxwhDtSKL/UK55A3DI+oSywtE=', 'operator': 'encrypt'},
{'start': 918, 'end': 962, 'entity_type': 'PERSON', 'text': 'VsABjcnQqmUm/j03n4MKg2DqpFCr4pqITtmMifENZeE=', 'operator': 'encrypt'},
{'start': 555, 'end': 599, 'entity_type': 'PERSON', 'text': 'NliXY4ki34IfIbzZtjE3uNftlnT32WVvoyJNayCdekY=', 'operator': 'encrypt'},
{'start': 419, 'end': 463, 'entity_type': 'AGE', 'text': 'D/VKKs+lEPKpi0u8sM4GrCgFl5iRa8DYA0X6gj2D5WA=', 'operator': 'encrypt'},
{'start': 369, 'end': 413, 'entity_type': 'PERSON', 'text': '/VUZ38hgaW8oIOqXKO/V5rhRJapPgYksLqPWPYfsabI=', 'operator': 'encrypt'},
{'start': 324, 'end': 368, 'entity_type': 'PERSON', 'text': 'xqYsVNVNr18ennd01WUFwd7uN6H2VMU4ciOoEG0WctI=', 'operator': 'encrypt'},
{'start': 242, 'end': 286, 'entity_type': 'US_ITIN', 'text': '2HhEMucehDL/N9PB25Give8hbskDdkX6PKRVbbmBy3c=', 'operator': 'encrypt'},
{'start': 192, 'end': 236, 'entity_type': 'DATE_TIME', 'text': 'doSS1fEZlEXjD/4dpBBgX9AfDo1MBQ6a9LIQmuBM/Zs=', 'operator': 'encrypt'},
{'start': 142, 'end': 186, 'entity_type': 'PERSON', 'text': 'ph8f+GFGdzb0kJ7jtupBDHhQmRah/peKV/UgXXEJxxQ=', 'operator': 'encrypt'},
{'start': 97, 'end': 141, 'entity_type': 'PERSON', 'text': 'N4/k/tVfrGcIMHuiEeB4tzn1OPnvfqItq2GsaYL6DzE=', 'operator': 'encrypt'},
{'start': 46, 'end': 90, 'entity_type': 'DATE_TIME', 'text': 'BXBJ6eCU59a5nGvzGtXkVd5oOjJWZ3606NWi6vUgna8=', 'operator': 'encrypt'},
{'start': 1, 'end': 45, 'entity_type': 'DATE_TIME', 'text': '/oSOg6iCSSvrWeZlXxu68BOeKmiTzcNzQsnJGhBuE14=', 'operator': 'encrypt'}
]
desanitized_results:
text:
May 5, 2023
Name: Carl John Smith
DOB: 04/18/1985
SSN: 999-99-9999
Dear DDS Examiner:
Introduction:
Mr. Carl Smith is a 31-year-old man who has been experiencing homelessness on and off for all
his adult life. Mr. Smith says he is about 5’5" and weighs approximately 129 lbs. He presents as
very thin, typically wearing a clean white undershirt and loose-fitting khaki shorts at interviews.
His brown hair is disheveled and dirty looking, and he constantly fidgets and shakes his hand or
knee during interviews. Despite his best efforts, Carl is a poor historian. In interviews with this
writer, he needed constant redirecting and prompting to provide information about his
personal and psychiatric history. Carl is diagnosed with Major Depressive Disorder; recurrent,
Anxiety Disorder, Attention Deficit Hyperactivity Disorder, Intermittent Explosive Disorder, and
a possible traumatic brain injury. Physically, he has degenerative disc disease, Lumbar
radiculopathy, Allergic Rhinitis, and a history of fainting since childhood. When asked why
working is difficult for him, Carl responded "I have a hard time controlling myself. When I get
stressed out, I immediately shut down."
My name is Gavin and I plan to go to San Francisco later today. While there I want to buy 5 apples for 4 dollars each, and 10 bananas for 3 dollars each. How much will this cost me?
Hi, Gavin,
Zizhong Ye and Gordon Liu are schoolmates at Chadbroune Elementry School.
Here are a few example sentences we currently support:
Hello, my name is David Johnson and I live in Maine.
My credit card number is 4095-2609-9393-4932 and my crypto wallet id is 16Yeky6GMjeNkAiNcBY7ZhrLoMSgg1BoyZ.
On September 18 I visited microsoft.com and sent an email to test@presidio.site, from the IP 192.168.0.1.
My passport: 191280342 and my phone number: (212) 555-1234.
This is a valid International Bank Account Number: IL150120690000003111111 . Can you please check the status on bank account 954567876544?
Kate's social security number is 078-05-1126. Her driver license? it is 1234567A.
John Smith called Sarah Jane at 321-456-7098 and told her to meet him at 1112 Market Street
During our recent meeting on February 23, 2023, at 10:30 AM, John Doe provided me with his personal details. His email is johndoe@example.com and his contact number is 650-456-7890. He lives in New York City, USA, and belongs to the American nationality with Christian beliefs and a leaning towards the Democratic party. He mentioned that he recently made a transaction using his credit card 4111 1111 1111 1111 and transferred bitcoins to the wallet address 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa. While discussing his European travels, he noted down his IBAN as GB29 NWBK 6016 1331 9268 19. Additionally, he provided his website as https://johndoeportfolio.com. John also discussed some of his US-specific details. He said his bank account number is 1234567890123456 and his drivers license is Y12345678. His ITIN is 987-65-4321, and he recently renewed his passport, the number for which is 123456789. He emphasized not to share his SSN, which is 669-45-6789. Furthermore, he mentioned that he accesses his work files remotely through the IP 192.168.1.1 and has a medical license number MED-123123456.
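The desanitized text ends with `MED-123123456` where the original had `MED-123456`. A minimal sketch of one mechanism that would produce exactly this duplication, assuming each overlapping span is transformed and restored independently without trimming; `naive_round_trip` is a hypothetical model for illustration, not Presidio's actual implementation:

```python
def naive_round_trip(text: str, spans: list[tuple[int, int]]) -> str:
    """Rebuild `text` by emitting each span's full slice in start order.

    Gaps between spans are copied once. Characters covered by two spans are
    emitted twice, because neither span is trimmed.
    """
    out = []
    cursor = 0
    for start, end in sorted(spans):
        if start > cursor:
            out.append(text[cursor:start])  # untouched text between spans
        out.append(text[start:end])  # stands in for decrypt(encrypt(slice))
        cursor = max(cursor, end)
    out.append(text[cursor:])
    return "".join(out)


value = "MED-123456"
# Offsets relative to `value`; in the analysis results these spans are
# 3245-3248 (ORGANIZATION), 3248-3252 (ID) and 3249-3255 (US_DRIVER_LICENSE).
spans = [(0, 3), (3, 7), (4, 10)]
result = naive_round_trip(value, spans)
print(result)  # MED-123123456
```

The three characters shared by the ID and US_DRIVER_LICENSE spans appear twice in the output, which matches the decrypted items `'-123'` and `'123456'` listed below.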
items:
[
{'start': 3252, 'end': 3258, 'entity_type': 'US_DRIVER_LICENSE', 'text': '123456', 'operator': 'decrypt'},
{'start': 3248, 'end': 3252, 'entity_type': 'ID', 'text': '-123', 'operator': 'decrypt'},
{'start': 3245, 'end': 3248, 'entity_type': 'ORGANIZATION', 'text': 'MED', 'operator': 'decrypt'},
{'start': 3200, 'end': 3211, 'entity_type': 'IP_ADDRESS', 'text': '192.168.1.1', 'operator': 'decrypt'},
{'start': 3105, 'end': 3116, 'entity_type': 'US_SSN', 'text': '669-45-6789', 'operator': 'decrypt'},
{'start': 3049, 'end': 3058, 'entity_type': 'US_PASSPORT', 'text': '123456789', 'operator': 'decrypt'},
{'start': 2974, 'end': 2985, 'entity_type': 'US_ITIN', 'text': '987-65-4321', 'operator': 'decrypt'},
{'start': 2951, 'end': 2960, 'entity_type': 'US_DRIVER_LICENSE', 'text': 'Y12345678', 'operator': 'decrypt'},
{'start': 2907, 'end': 2923, 'entity_type': 'US_BANK_NUMBER', 'text': '1234567890123456', 'operator': 'decrypt'},
{'start': 2851, 'end': 2853, 'entity_type': 'LOCATION', 'text': 'US', 'operator': 'decrypt'},
{'start': 2819, 'end': 2823, 'entity_type': 'PERSON', 'text': 'John', 'operator': 'decrypt'},
{'start': 2789, 'end': 2817, 'entity_type': 'URL', 'text': 'https://johndoeportfolio.com', 'operator': 'decrypt'},
{'start': 2719, 'end': 2746, 'entity_type': 'IBAN_CODE', 'text': 'GB29 NWBK 6016 1331 9268 19', 'operator': 'decrypt'},
{'start': 2618, 'end': 2652, 'entity_type': 'CRYPTO', 'text': '1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa', 'operator': 'decrypt'},
{'start': 2551, 'end': 2570, 'entity_type': 'CREDIT_CARD', 'text': '4111 1111 1111 1111', 'operator': 'decrypt'},
{'start': 2473, 'end': 2478, 'entity_type': 'ORGANIZATION', 'text': 'party', 'operator': 'decrypt'},
{'start': 2462, 'end': 2472, 'entity_type': 'ORGANIZATION', 'text': 'Democratic', 'operator': 'decrypt'},
{'start': 2392, 'end': 2400, 'entity_type': 'LOCATION', 'text': 'American', 'operator': 'decrypt'},
{'start': 2368, 'end': 2371, 'entity_type': 'LOCATION', 'text': 'USA', 'operator': 'decrypt'},
{'start': 2362, 'end': 2366, 'entity_type': 'LOCATION', 'text': 'City', 'operator': 'decrypt'},
{'start': 2353, 'end': 2361, 'entity_type': 'LOCATION', 'text': 'New York', 'operator': 'decrypt'},
{'start': 2327, 'end': 2339, 'entity_type': 'PHONE_NUMBER', 'text': '650-456-7890', 'operator': 'decrypt'},
{'start': 2281, 'end': 2300, 'entity_type': 'EMAIL_ADDRESS', 'text': 'johndoe@example.com', 'operator': 'decrypt'},
{'start': 2225, 'end': 2228, 'entity_type': 'PERSON', 'text': 'Doe', 'operator': 'decrypt'},
{'start': 2220, 'end': 2224, 'entity_type': 'PERSON', 'text': 'John', 'operator': 'decrypt'},
{'start': 2201, 'end': 2205, 'entity_type': 'DATE_TIME', 'text': '2023', 'operator': 'decrypt'},
{'start': 2188, 'end': 2200, 'entity_type': 'DATE_TIME', 'text': 'February 23,', 'operator': 'decrypt'},
{'start': 2151, 'end': 2157, 'entity_type': 'LOCATION', 'text': 'Street', 'operator': 'decrypt'},
{'start': 2139, 'end': 2150, 'entity_type': 'LOCATION', 'text': '1112 Market', 'operator': 'decrypt'},
{'start': 2098, 'end': 2110, 'entity_type': 'PHONE_NUMBER', 'text': '321-456-7098', 'operator': 'decrypt'},
{'start': 2090, 'end': 2094, 'entity_type': 'PERSON', 'text': 'Jane', 'operator': 'decrypt'},
{'start': 2084, 'end': 2089, 'entity_type': 'PERSON', 'text': 'Sarah', 'operator': 'decrypt'},
{'start': 2071, 'end': 2076, 'entity_type': 'PERSON', 'text': 'Smith', 'operator': 'decrypt'},
{'start': 2066, 'end': 2070, 'entity_type': 'PERSON', 'text': 'John', 'operator': 'decrypt'},
{'start': 2054, 'end': 2062, 'entity_type': 'US_DRIVER_LICENSE', 'text': '1234567A', 'operator': 'decrypt'},
{'start': 2014, 'end': 2025, 'entity_type': 'US_SSN', 'text': '078-05-1126', 'operator': 'decrypt'},
{'start': 1981, 'end': 1985, 'entity_type': 'PERSON', 'text': 'Kate', 'operator': 'decrypt'},
{'start': 1966, 'end': 1978, 'entity_type': 'US_BANK_NUMBER', 'text': '954567876544', 'operator': 'decrypt'},
{'start': 1892, 'end': 1915, 'entity_type': 'IBAN_CODE', 'text': 'IL150120690000003111111', 'operator': 'decrypt'},
{'start': 1824, 'end': 1838, 'entity_type': 'PHONE_NUMBER', 'text': '(212) 555-1234', 'operator': 'decrypt'},
{'start': 1793, 'end': 1802, 'entity_type': 'US_PASSPORT', 'text': '191280342', 'operator': 'decrypt'},
{'start': 1766, 'end': 1777, 'entity_type': 'IP_ADDRESS', 'text': '192.168.0.1', 'operator': 'decrypt'},
{'start': 1733, 'end': 1751, 'entity_type': 'EMAIL_ADDRESS', 'text': 'test@presidio.site', 'operator': 'decrypt'},
{'start': 1698, 'end': 1711, 'entity_type': 'URL', 'text': 'microsoft.com', 'operator': 'decrypt'},
{'start': 1685, 'end': 1687, 'entity_type': 'DATE_TIME', 'text': '18', 'operator': 'decrypt'},
{'start': 1675, 'end': 1684, 'entity_type': 'DATE_TIME', 'text': 'September', 'operator': 'decrypt'},
{'start': 1635, 'end': 1669, 'entity_type': 'CRYPTO', 'text': '16Yeky6GMjeNkAiNcBY7ZhrLoMSgg1BoyZ', 'operator': 'decrypt'},
{'start': 1588, 'end': 1607, 'entity_type': 'CREDIT_CARD', 'text': '4095-2609-9393-4932', 'operator': 'decrypt'},
{'start': 1556, 'end': 1561, 'entity_type': 'LOCATION', 'text': 'Maine', 'operator': 'decrypt'},
{'start': 1534, 'end': 1541, 'entity_type': 'PERSON', 'text': 'Johnson', 'operator': 'decrypt'},
{'start': 1528, 'end': 1533, 'entity_type': 'PERSON', 'text': 'David', 'operator': 'decrypt'},
{'start': 1445, 'end': 1451, 'entity_type': 'ORGANIZATION', 'text': 'School', 'operator': 'decrypt'},
{'start': 1435, 'end': 1444, 'entity_type': 'ORGANIZATION', 'text': 'Elementry', 'operator': 'decrypt'},
{'start': 1424, 'end': 1434, 'entity_type': 'ORGANIZATION', 'text': 'Chadbroune', 'operator': 'decrypt'},
{'start': 1401, 'end': 1404, 'entity_type': 'PERSON', 'text': 'Liu', 'operator': 'decrypt'},
{'start': 1394, 'end': 1400, 'entity_type': 'PERSON', 'text': 'Gordon', 'operator': 'decrypt'},
{'start': 1387, 'end': 1389, 'entity_type': 'PERSON', 'text': 'Ye', 'operator': 'decrypt'},
{'start': 1379, 'end': 1386, 'entity_type': 'PERSON', 'text': 'Zizhong', 'operator': 'decrypt'},
{'start': 1378, 'end': 1379, 'entity_type': 'PERSON', 'text': '\n', 'operator': 'decrypt'},
{'start': 1377, 'end': 1378, 'entity_type': 'PERSON', 'text': '\n', 'operator': 'decrypt'},
{'start': 1371, 'end': 1377, 'entity_type': 'PERSON', 'text': 'Gavin,', 'operator': 'decrypt'},
{'start': 1225, 'end': 1234, 'entity_type': 'LOCATION', 'text': 'Francisco', 'operator': 'decrypt'},
{'start': 1221, 'end': 1224, 'entity_type': 'LOCATION', 'text': 'San', 'operator': 'decrypt'},
{'start': 1195, 'end': 1200, 'entity_type': 'PERSON', 'text': 'Gavin', 'operator': 'decrypt'},
{'start': 1077, 'end': 1081, 'entity_type': 'PERSON', 'text': 'Carl', 'operator': 'decrypt'},
{'start': 709, 'end': 713, 'entity_type': 'PERSON', 'text': 'Carl', 'operator': 'decrypt'},
{'start': 539, 'end': 543, 'entity_type': 'PERSON', 'text': 'Carl', 'operator': 'decrypt'},
{'start': 215, 'end': 220, 'entity_type': 'PERSON', 'text': 'Smith', 'operator': 'decrypt'},
{'start': 121, 'end': 123, 'entity_type': 'AGE', 'text': '31', 'operator': 'decrypt'},
{'start': 110, 'end': 115, 'entity_type': 'PERSON', 'text': 'Smith', 'operator': 'decrypt'},
{'start': 105, 'end': 109, 'entity_type': 'PERSON', 'text': 'Carl', 'operator': 'decrypt'},
{'start': 56, 'end': 67, 'entity_type': 'US_ITIN', 'text': '999-99-9999', 'operator': 'decrypt'},
{'start': 40, 'end': 50, 'entity_type': 'DATE_TIME', 'text': '04/18/1985', 'operator': 'decrypt'},
{'start': 29, 'end': 34, 'entity_type': 'PERSON', 'text': 'Smith', 'operator': 'decrypt'},
{'start': 19, 'end': 28, 'entity_type': 'PERSON', 'text': 'Carl John', 'operator': 'decrypt'},
{'start': 8, 'end': 12, 'entity_type': 'DATE_TIME', 'text': '2023', 'operator': 'decrypt'},
{'start': 1, 'end': 7, 'entity_type': 'DATE_TIME', 'text': 'May 5,', 'operator': 'decrypt'}
]
Result:
Traceback (most recent call last):
File "/home/zzz/workspace/example/pg.py", line 929, in <module>
_test()
File "/home/zzz/workspace/example/pg.py", line 914, in _test
response_j = sanitize_text(j.encode())
File "/home/zzz/workspace/example/pg.py", line 597, in sanitize_text
assert desanitized_results.text == text
AssertionError
To add on to this, I'm running into the following error
File "/home/os/.venv/lib/python3.10/site-packages/presidio_analyzer/analyzer_engine.py", line 189, in analyze
nlp_artifacts = self.nlp_engine.process_text(text, language)
File "/home/os/.venv/lib/python3.10/site-packages/presidio_analyzer/nlp_engine/spacy_nlp_engine.py", line 44, in process_text
doc = self.nlp[language](text)
File "/home/os/.venv/lib/python3.10/site-packages/spacy/language.py", line 1047, in __call__
error_handler(name, proc, [doc], e)
File "/home/os/.venv/lib/python3.10/site-packages/spacy/util.py", line 1724, in raise_error
raise e
File "/home/os/.venv/lib/python3.10/site-packages/spacy/language.py", line 1042, in __call__
doc = proc(doc, **component_cfg.get(name, {})) # type: ignore[call-arg]
File "/home/os/.venv/lib/python3.10/site-packages/presidio_analyzer/nlp_engine/transformers_nlp_engine.py", line 71, in __call__
doc.ents = ents
File "spacy/tokens/doc.pyx", line 796, in spacy.tokens.doc.Doc.ents.__set__
File "spacy/tokens/doc.pyx", line 833, in spacy.tokens.doc.Doc.set_ents
ValueError: [E1010] Unable to set entity information for token 28 which is included in more than one span in entities, blocked, missing or outside.
with the following code sample:
import transformers
from huggingface_hub import snapshot_download
from presidio_analyzer import AnalyzerEngine
from presidio_analyzer.nlp_engine import NlpEngineProvider
from presidio_anonymizer import AnonymizerEngine, DeanonymizeEngine
transformers_model = "obi/deid_roberta_i2b2"
snapshot_download(repo_id=transformers_model)
# Instantiate to make sure it's downloaded during installation and not runtime
transformers.AutoTokenizer.from_pretrained(transformers_model)
transformers.AutoModelForTokenClassification.from_pretrained(transformers_model)
# Create configuration containing engine name and models
configuration = {
    "nlp_engine_name": "transformers",
    "models": [
        {
            "lang_code": "en",
            "model_name": {
                "spacy": "en_core_web_sm",
                "transformers": transformers_model,
            },
        }
    ],
}
# Create NLP engine based on configuration
provider = NlpEngineProvider(nlp_configuration=configuration)
nlp_engine = provider.create_engine()
# Pass the created NLP engine and supported_languages to the AnalyzerEngine
analyzer = AnalyzerEngine(nlp_engine=nlp_engine, supported_languages=["en"])
# Initialize the anonymizer and deanonymizer engines
# Possibly put these into a server to avoid reinitialization
anonymizer = AnonymizerEngine()
deanonymizer = DeanonymizeEngine()
text = """
During our recent meeting on February 23, 2023, at 10:30 AM, John Doe provided
me with his personal details. His email is johndoe@example.com and his contact
number is 650-456-7890. He lives in New York City, USA, and belongs to the
American nationality with Christian beliefs and a leaning towards the Democratic party.
He mentioned that he recently made a transaction using his credit card 4111 1111 1111 1111
and transferred bitcoins to the wallet address 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa.
While discussing his European travels, he noted down his IBAN as GB29 NWBK 6016 1331 9268 19.
Additionally, he provided his website as https://johndoeportfolio.com. John also discussed some
of his US-specific details. He said his bank account number is 1234567890123456 and his drivers license
is Y12345678. His ITIN is 987-65-4321, and he recently renewed his passport, the number for
which is 123456789. He emphasized not to share his SSN, which is 669-45-6789.
Furthermore, he mentioned that he accesses his work files remotely through the IP 192.168.1.1
and has a medical license number MED-123456.
"""
analysis_results = analyzer.analyze(text=text, language="en")
I believe this should be related
Hi @octaviansima, we are aware of this issue. Until we fix it (WIP), we recommend using the `TransformersRecognizer` approach rather than the `TransformersNlpEngine`. This should help with your issue, but please let us know if it doesn't.
Hi @zizhong, I attempted to reproduce this but wasn't able to. Steps I've taken:
1. Set up the `TransformersRecognizer` and its configuration.
2. Ran the analyzer using this sample:
from presidio_analyzer import AnalyzerEngine, RecognizerRegistry
from presidio_analyzer.nlp_engine import NlpEngineProvider
import spacy

# TransformersRecognizer and BERT_DEID_CONFIGURATION are not part of presidio_analyzer;
# this import assumes they come from the presidio demo's transformers_rec module.
from transformers_rec import BERT_DEID_CONFIGURATION, TransformersRecognizer

model_path = "obi/deid_roberta_i2b2"
supported_entities = BERT_DEID_CONFIGURATION.get("PRESIDIO_SUPPORTED_ENTITIES")
transformers_recognizer = TransformersRecognizer(
    model_path=model_path, supported_entities=supported_entities
)
transformers_recognizer.load_transformer(**BERT_DEID_CONFIGURATION)

registry = RecognizerRegistry()
registry.add_recognizer(transformers_recognizer)
registry.remove_recognizer("SpacyRecognizer")

# Download the small spaCy model if it is not installed yet
if not spacy.util.is_package("en_core_web_sm"):
    spacy.cli.download("en_core_web_sm")

nlp_configuration = {
    "nlp_engine_name": "spacy",
    "models": [{"lang_code": "en", "model_name": "en_core_web_sm"}],
}
nlp_engine = NlpEngineProvider(nlp_configuration=nlp_configuration).create_engine()

analyzer = AnalyzerEngine(registry=registry, nlp_engine=nlp_engine)
results = analyzer.analyze(text, language="en", return_decision_process=True)
Where text = the text you provided
3. Encrypt:
```python
from presidio_anonymizer import AnonymizerEngine, DeanonymizeEngine
from presidio_anonymizer.entities import RecognizerResult, OperatorResult, OperatorConfig
from presidio_anonymizer.operators import Decrypt

key = "16charEncryptKey16charEncryptKey"
engine = AnonymizerEngine()

# Invoke the anonymize function with the text,
# analyzer results (potentially coming from presidio-analyzer)
# and an 'encrypt' operator to get an encrypted anonymization output:
anonymize_result = engine.anonymize(
    text=text,
    analyzer_results=results,
    operators={"DEFAULT": OperatorConfig("encrypt", {"key": key})},
)

# Fetch the anonymized text from the result.
anonymized_text = anonymize_result.text

# Fetch the anonymized entities from the result.
anonymized_entities = anonymize_result.items
```
4. Decrypt:
```python
# Initialize the engine:
engine = DeanonymizeEngine()
deanonymized_result = engine.deanonymize(
    text=anonymized_text,
    entities=anonymized_entities,
    operators={"DEFAULT": OperatorConfig("decrypt", {"key": key})},
)
deanonymized_result.text
```
5. Compare: `assert text == deanonymized_result.text`
We had a few contributions to the `presidio-anonymizer` package which aren't released to PyPI yet. It could be that one of them (like #1092 or #1078) is the source of the difference.
Thanks! The issue is with the code from the 🤗 presidio-demo. It was caused by chunking overlap. I added a check that filters out overlapping predictions, and now the issue is resolved.
Thanks! If you let us know what the issue was, that would be very helpful!
@omri374 sure thing. https://huggingface.co/spaces/presidio/presidio_demo/blob/main/transformers_rec/transformers_recognizer.py#L267 Here the predictions can overlap, because chunking uses a `text_overlap_length`. https://huggingface.co/spaces/presidio/presidio_demo/blob/main/transformers_rec/transformers_recognizer.py#L248
I think that is intended for the case where only `anonymize()` is used. However, it becomes a problem once `deanonymize()` is applied.
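The exact check added to the demo code was not shared in this thread, so the following is only a minimal sketch of one way to filter overlapping predictions before they reach the anonymizer. The `Span` type, the `drop_overlaps` helper, and the scores are hypothetical and are not part of presidio or the demo; the offsets mirror the `MED-123456` example from the report.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for a predicted entity span."""
    start: int
    end: int
    entity_type: str
    score: float


def drop_overlaps(spans: List[Span]) -> List[Span]:
    """Keep a non-overlapping subset, preferring higher score, then longer spans."""
    kept: List[Span] = []
    ranked = sorted(spans, key=lambda s: (-s.score, -(s.end - s.start), s.start))
    for span in ranked:
        # A span is accepted only if it is disjoint from every span kept so far.
        if all(span.end <= k.start or span.start >= k.end for k in kept):
            kept.append(span)
    return sorted(kept, key=lambda s: s.start)


# Offsets are relative to "MED-123456"; the scores are made up for illustration.
predictions = [
    Span(0, 3, "ORGANIZATION", 0.90),
    Span(3, 7, "ID", 0.60),
    Span(4, 10, "US_DRIVER_LICENSE", 0.80),
]
filtered = drop_overlaps(predictions)
```

With these made-up scores the `ID` span is dropped because it shares the characters `123` with the higher-scoring `US_DRIVER_LICENSE` span. Preferring score over length is a design choice of this sketch, not something the thread specifies.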
Describe the bug
deanonymize(anonymize(text)) != text
To Reproduce
Steps to reproduce the behavior:
1. Use `obi/deid_roberta_i2b2` as the analyzer.
2. Original text: `a medical license number MED-123456`
3. Anonymized text: `a medical license number <ORGANIZATION><ID><US_DRIVER_LICENSE>.` Each \<item> is the base64-encoded encrypted item.
4. Deanonymized text: `a medical license number MED-123123456`
Expected behavior
deanonymize(anonymize(text)) == text
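To see where the extra `123` comes from, here is a toy model of the round trip. It is not presidio's implementation: it only assumes that each detected item carries the full text of its own span (as the decrypted items reported above do) and that restoring simply re-emits each item in order. The span offsets are inferred from the reported items `MED`, `-123`, and `123456`.

```python
# Toy model only; this does not call presidio.
original = "MED-123456"

# (start, end, entity_type). The ID and US_DRIVER_LICENSE spans share "123".
spans = [
    (0, 3, "ORGANIZATION"),
    (3, 7, "ID"),
    (4, 10, "US_DRIVER_LICENSE"),
]

# Each item stores the complete text of its span, like an encrypted payload would.
payloads = [original[start:end] for start, end, _ in spans]

# Re-emitting every payload repeats the characters shared by overlapping spans.
restored = "".join(payloads)
print(restored)  # MED-123123456
```

Because the overlapping characters are stored in two items, restoring every item yields `MED-123123456` instead of `MED-123456`, which matches the reported output.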