Some Arabic strings have an unencoded character in them, e.g. \u200f, the Unicode RIGHT-TO-LEFT MARK (U+200F). It is a non-printing character: it is present in the original data but is not visible when rendered. We didn't have an issue with this before, but with the new Airflow process this character isn't getting encoded. The issue is likely in intake. We need this character to render properly, or we need to remove it. I'm not sure of the implications of removing it, but here is one way to do that: https://stackoverflow.com/questions/46897952/remove-right-to-left-character-u200f-in-python-hebrew
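A minimal Python sketch of both options (keep the mark or drop it), written in the spirit of the linked answer rather than copied from it. It assumes the strings are Python `str` values and that "not encoded" means the six-character literal text `\u200f` (backslash, `u`, `200f`) shows up in the data instead of the real U+200F code point. The helper names `restore_rlm` and `strip_rlm` are hypothetical.

```python
import re

# U+200F RIGHT-TO-LEFT MARK as a real code point (invisible when rendered).
RLM = "\u200f"

# Assumption: an "unencoded" RLM appears as the literal escape text
# backslash + "u200f" (accepting either case for the final hex digit).
LITERAL_RLM_ESCAPE = re.compile(r"\\u200[fF]")


def restore_rlm(text: str) -> str:
    """Turn literal '\\u200f' escape text back into the real RLM character."""
    return LITERAL_RLM_ESCAPE.sub(RLM, text)


def strip_rlm(text: str) -> str:
    """Remove RLM marks, both literal escape text and real U+200F characters."""
    return LITERAL_RLM_ESCAPE.sub("", text).replace(RLM, "")


if __name__ == "__main__":
    # Hypothetical value as it might arrive from intake.
    raw = "مرحبا\\u200f"
    print(ascii(restore_rlm(raw)))  # shows the real mark as an escape
    print(strip_rlm(raw))
```

`restore_rlm` keeps the mark so it renders invisibly as intended; `strip_rlm` drops it. In pure Arabic text removing the RLM usually has no visible effect, but in mixed-direction text (Arabic next to digits, punctuation, or Latin script) the mark can change where those neutral characters are displayed, so it is worth checking a sample before stripping it everywhere. Fixing the encoding in intake would make both helpers unnecessary.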