minimaxir / facebook-page-post-scraper

Data scraper for Facebook Pages, and code accompanying the blog post "How to Scrape Data From Facebook Page Posts for Statistical Analysis".

New gTLDs, IDNs, EAI and Linkification issues #128

Open UA2018 opened 4 years ago

UA2018 commented 4 years ago

Many top-level domains (TLDs), such as .technology, .family and .gay, are missing from Facebook's library.

We run into Universal Acceptance issues when trying to create an account with a Unicode email address (email address internationalization), such as 测试5@普遍接受-测试.世界.

Facebook also does not recognize new gTLDs typed in a post: it converts .com domains into clickable hyperlinks, but it does not create a link for a domain such as example.technology.

More and more people from countries whose languages use non-Latin scripts are using Facebook. Yet new gTLDs and Internationalized Domain Names (IDNs) are not correctly recognized by Facebook.

You could update the TLD list manually, or automate the update by pulling the content from the DNS root zone or from https://www.iana.org/domains/root/db.
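One way to automate the TLD update, sketched under assumptions: IANA also publishes the root-zone TLDs as a plain-text file (tlds-alpha-by-domain.txt on data.iana.org), assumed here to contain a `#` version comment followed by one upper-case TLD per line, with IDN TLDs in their `xn--` A-label form. The hypothetical helpers `parse_tld_list` and `fetch_tld_set` turn that file into a set of lower-case TLDs and also add the Unicode form of each IDN TLD, so that both `xn--rhqv96g` and `世界` are recognized. This is an illustration only, not Facebook's code.

```python
import urllib.request

# Assumed source: IANA's plain-text list of root-zone TLDs.
IANA_TLD_URL = "https://data.iana.org/TLD/tlds-alpha-by-domain.txt"


def parse_tld_list(text: str) -> set[str]:
    """Hypothetical helper: turn the list's text into a set of lower-case TLDs.

    IDN TLDs appear as A-labels (xn--...); their Unicode U-label form is
    added too, so a TLD typed natively (e.g. 世界) also matches.
    """
    tlds: set[str] = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and the version comment
            continue
        label = line.lower()
        tlds.add(label)
        if label.startswith("xn--"):
            try:
                # Python's built-in "idna" codec implements IDNA 2003.
                tlds.add(label.encode("ascii").decode("idna"))
            except UnicodeError:
                pass  # keep only the A-label if IDNA 2003 cannot decode it
    return tlds


def fetch_tld_set(url: str = IANA_TLD_URL) -> set[str]:
    """Hypothetical helper: download the list and parse it.

    Meant to run periodically, since new TLDs are delegated over time.
    """
    with urllib.request.urlopen(url) as resp:
        return parse_tld_list(resp.read().decode("ascii"))
```

For example, `parse_tld_list("# Version 1\nCOM\nTECHNOLOGY\nXN--RHQV96G\n")` yields a set containing `com`, `technology`, `xn--rhqv96g` and `世界`. Rebuilding the set on a schedule, rather than hard-coding it, keeps newly delegated TLDs covered.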

Email address internationalization (EAI) support can be tested at https://uasg.tech/eai-check/.
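A rough sketch of what accepting an address like the one above involves, assuming only Python's standard library: under SMTPUTF8 (RFC 6531) the local part may stay in UTF-8, while the domain has to be converted to its ASCII A-label form for DNS lookups. `split_eai_address` and `accepts_eai` are hypothetical names; this is a structural sanity check, not a complete validator, and it does not replace the UASG checker.

```python
def split_eai_address(address: str) -> tuple[str, str]:
    """Hypothetical helper: split an address into (local_part, ascii_domain).

    Structural sanity check only, not a full RFC 6531/6532 validator.
    """
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    # SMTPUTF8 (RFC 6531) lets the local part stay in UTF-8, so it is not
    # converted to ASCII; RFC 5321 still caps it at 64 octets.
    if len(local.encode("utf-8")) > 64:
        raise ValueError("local part longer than 64 octets")
    # The domain is converted to its A-label (xn--) form, the form DNS uses.
    # Python's built-in "idna" codec implements IDNA 2003; the third-party
    # "idna" package implements IDNA 2008.
    ascii_domain = domain.encode("idna").decode("ascii")
    return local, ascii_domain


def accepts_eai(address: str) -> bool:
    """Hypothetical helper: True if the address passes the checks above."""
    try:
        split_eai_address(address)
    except ValueError:  # UnicodeError is a subclass of ValueError
        return False
    return True
```

With the address from this issue, `split_eai_address("测试5@普遍接受-测试.世界")` returns the local part unchanged and a domain that starts with `xn--` and ends with `.xn--rhqv96g` (the A-label for `.世界`).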

Quick guide for linkification: https://uasg.tech/wp-content/uploads/documents/UASG010-en-digital.pdf
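A minimal sketch of TLD-aware linkification, assuming Python and a TLD set such as one built from IANA's list: candidate host names are found with a Unicode-aware regular expression, and only those whose last label is a known TLD become links. `HOST_RE`, `find_linkable_hosts` and `linkify` are hypothetical names; this is neither Facebook's implementation nor the procedure from the UASG guide, and the `http://` scheme in generated links is an assumption.

```python
import re

# One DNS label: starts and ends with a letter or digit (any script), with
# letters, digits or hyphens in between, up to 63 characters in total.
LABEL = r"[^\W_](?:(?:[^\W_]|-){0,61}[^\W_])?"

# Two or more labels joined by dots, not glued to a preceding word, dot,
# hyphen or "@" (so the domain part of an email address is skipped).
HOST_RE = re.compile(rf"(?<![\w@.-])(?:{LABEL}\.)+{LABEL}(?![\w-])")


def find_linkable_hosts(text: str, tlds: set[str]) -> list[str]:
    """Return candidate host names whose last label is a known TLD."""
    return [
        m.group(0)
        for m in HOST_RE.finditer(text)
        if m.group(0).rsplit(".", 1)[1].lower() in tlds
    ]


def linkify(text: str, tlds: set[str]) -> str:
    """Wrap every host with a known TLD in an <a> tag; leave other text as-is."""
    def repl(m: re.Match) -> str:
        host = m.group(0)
        if host.rsplit(".", 1)[1].lower() in tlds:
            return f'<a href="http://{host}">{host}</a>'
        return host
    return HOST_RE.sub(repl, text)
```

`linkify("Visit example.technology today", {"technology"})` returns `'Visit <a href="http://example.technology">example.technology</a> today'`. Because the boundaries rely on spaces and punctuation, text in scripts written without spaces between words (such as Chinese or Thai) would need smarter boundary detection.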

Regards

SpangleLabs commented 4 years ago

This GitHub project is not an official Facebook project and, as such, cannot fix this issue on Facebook.

UA2018 commented 4 years ago

Could you kindly recommend which Facebook GitHub page is suitable for raising this issue? Twitter appears to be able to fix this kind of issue through GitHub (example: https://github.com/twitter/twitter-text/blob/master/conformance/tld_lib.yml).