tattle-made / Uli

Software and Resources for Mitigating Online Gender Based Violence in India
https://uli.tattle.co.in
GNU General Public License v3.0

Rewrite Existing Parser #179

Closed: dennyabrain closed this issue 10 months ago

dennyabrain commented 2 years ago

We have a parser that walks the DOM and extracts structured content from it: tweet_text, author, timestamp, etc. It can be seen here - https://github.com/tattle-made/OGBV/blob/main/browser-extension/plugin/src/twitter/parser.js

The current way of specifying the path to each component is unwieldy and hard to debug 👍🏽

const TWEET_PATH_CLICKED = new RegExp(
    'DIV\\(0\\):DIV\\(0\\):DIV\\(0\\):DIV\\(0\\):ARTICLE\\(0\\):DIV\\(0\\):DIV\\(2\\):DIV\\(1\\):DIV\\(0\\):DIV\\(0\\):DIV\\(0\\):DIV\\(0\\):SPAN'
);

Let's come up with a cleaner implementation for this and add tests.
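One possible direction, as a rough sketch only: describe the path as a list of steps and generate the regular expression from it, so nobody has to hand-write the escaping. `buildPathRegExp` and `escapeRegExp` are hypothetical helpers that do not exist in parser.js, and the sketch assumes the parser keeps matching against `TAG(index)` segments joined by `:`, as the pattern above suggests.

```javascript
// Hypothetical helper (not part of parser.js): escape characters that are
// special inside a regular expression, such as the parentheses in "DIV(0)".
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: turn a readable list of steps into a path RegExp.
// A step is either [tag, childIndex] or a bare tag name with no index.
function buildPathRegExp(steps) {
  const source = steps
    .map((step) => {
      if (typeof step === 'string') {
        return step; // e.g. 'SPAN'
      }
      const [tag, index] = step;
      return `${tag}(${index})`; // e.g. 'DIV(0)'
    })
    .map(escapeRegExp)
    .join(':');
  return new RegExp(source);
}

// The same path as the hand-written pattern, one step per entry.
const TWEET_PATH_CLICKED = buildPathRegExp([
  ['DIV', 0], ['DIV', 0], ['DIV', 0], ['DIV', 0],
  ['ARTICLE', 0],
  ['DIV', 0], ['DIV', 2], ['DIV', 1],
  ['DIV', 0], ['DIV', 0], ['DIV', 0], ['DIV', 0],
  'SPAN',
]);
```

With the steps kept as data, a test can compare the generated pattern against the existing hand-written one before swapping it in, and a change in Twitter's markup becomes a one-entry edit instead of a rewrite of an escaped string.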

tarunima commented 1 year ago

@dennyabrain can we close this?

dennyabrain commented 1 year ago

@Bhargav-Dave I am closing this since it's out of scope for now. Let's reopen or reference this when we work on fixing the hyperlink issue on Twitter.

Bhargav-Dave commented 1 year ago

Noted, that makes sense. We can create the relevant issues with specific information about problems with the new parser.