Open irff opened 9 years ago
There are often three occurrences of the same article:
http://128.199.81.117:5601/#/doc/langgar/langgar/news/?id=24776b50040aac9dfa11bc1f4559c8343b11070f
http://128.199.81.117:5601/#/doc/langgar/langgar/news/?id=2c0afa1806b738f35154024f207de91b32781df5
http://128.199.81.117:5601/#/doc/langgar/langgar/news/?id=ab36174c6b1a08b9bbda38db5d31788abd1d0b36
And sometimes four:
http://128.199.81.117:5601/#/doc/langgar/langgar/news?id=84f6cd599d724f2176582054a8cb1c99920a3780&_g=()
http://128.199.81.117:5601/#/doc/langgar/langgar/news?id=0a84ea3d2a402105a24d4e2161e3af0f800b8550&_g=()
http://128.199.81.117:5601/#/doc/langgar/langgar/news?id=7ac241fc9d03d1f850a73cd9c8bc81ab0d4b0c89&_g=()
http://128.199.81.117:5601/#/doc/langgar/langgar/news?id=63080f3edd3a430987111919d7437fbb6e5a2102&_g=()
There are many duplicate articles from antaranews, for example:
http://128.199.81.117:5601/#/doc/langgar/langgar/news?id=d0ca76853cfdfe8ad36d6c96f039d2344745f0d0&_g=()
and
http://128.199.81.117:5601/#/doc/langgar/langgar/news?id=d0ca76853cfdfe8ad36d6c96f039d2344745f0d0&_g=()
The article source is the same, but the duplicate check can't handle the difference in the URL postfix (the query string):
http://www.antaranews.com/berita/495849/komisi-vi-dpr-akan-pertanyakan-peningkatan-listrik-kepada-angkasa-pura-ii?utm_campaign=news&utm_medium=populer&utm_source=populer_home
http://www.antaranews.com/berita/495849/komisi-vi-dpr-akan-pertanyakan-peningkatan-listrik-kepada-angkasa-pura-ii?utm_campaign=news&utm_medium=related&utm_source=fly
I think it's best to ignore / trim everything after the '?' character in the URL before checking for duplicates.
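
To make the suggestion concrete, here is a rough sketch of the trimming step in Python using only the standard library. The function names `canonical_url` and `doc_id` are made up for illustration, and hashing the URL with SHA-1 is only my guess at how the document ids are produced (they look like 40-character hex digests); the real crawler may work differently.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the URL without its query string ('?...') and fragment ('#...')."""
    parts = urlsplit(url)
    # Keep scheme, host and path; drop the query and fragment components.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def doc_id(url: str) -> str:
    """Hypothetical id: SHA-1 of the canonical URL (an assumption, not the project's known scheme)."""
    return hashlib.sha1(canonical_url(url).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    base = ("http://www.antaranews.com/berita/495849/"
            "komisi-vi-dpr-akan-pertanyakan-peningkatan-listrik-kepada-angkasa-pura-ii")
    a = base + "?utm_campaign=news&utm_medium=populer&utm_source=populer_home"
    b = base + "?utm_campaign=news&utm_medium=related&utm_source=fly"
    # Both variants collapse to the same canonical URL, so they get the same id.
    print(canonical_url(a) == canonical_url(b))
    print(doc_id(a) == doc_id(b))
```

One caveat: some sites put the article identifier itself in the query string, so dropping everything after '?' could merge distinct articles there. If that turns out to matter for any source, removing only the `utm_*` parameters would be a more conservative variant.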