twindle-co / twindle

Twindle - an open-source project for beginners that converts Twitter threads to PDF, EPUB, and MOBI formats for reading on a Kindle.
https://www.twindle.co
MIT License
134 stars, 133 forks

Some edge cases and possible fixes which must be tested #768

Closed: tr0mbl3y closed this issue 3 years ago

tr0mbl3y commented 3 years ago

Some cases that I noted:

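  1. In transformation/helper.js. Problem: the regex /https?:\/\/twitter.com\/[a-zA-Z_]{1,20}\/status\/([0-9]*)/g does not match a URL whose username contains digits, e.g. https://twitter.com/9898abcxyz/status/999999999, so we get an empty result and no ID. Possible solutions: a) add digits to the username character class: /https?:\/\/twitter.com\/[a-zA-Z_0-9]{1,20}\/status\/([0-9]*)/g; or b) extract the ID directly with a regex like /\d{10,}/g, which also handles cases like https://twitter.com/6/status/999999999 (a username made only of digits). Note that b) only finds IDs with at least 10 digits (real tweet IDs are normally far longer, but the 9-digit placeholder IDs in these examples would not match), and it would also match a username containing 10 or more consecutive digits.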

  2. In Validation/tweet_endpoint.js: if the difference between the current time and the tweet's creation time evaluates to exactly 7 days, we might get an unnecessary "tweet older than 7 days" error. The problem is here: return differenceInDays > 7. Possible solutions: a) return differenceInDays >= 7; b) use Math.floor, since I noticed that in many cases the time difference evaluates to a floating-point number. Math.floor turns values like 6.9 into 6, but a value such as 6.9999999999999999 (sixteen 9s) comes out as 7, so the check would evaluate to true. See:

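    console.log(Math.floor(6.999999999999999));  // 6
    console.log(Math.floor(6.9999999999999999)); // 7

(Tested in the Mozilla developer IDE.) This does not depend on the IDE: JavaScript numbers are IEEE 754 double-precision floats, and 6.9999999999999999 is closer to 7 than to any other representable double, so the literal is already exactly 7 before Math.floor runs. 6.999999999999999 (fifteen 9s) is still a distinct value just below 7, so it floors to 6.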
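For point 1, here is a minimal sketch of both options. The names getTweetIdStrict, getTweetIdLoose, TWEET_URL_REGEX and TWEET_ID_REGEX are hypothetical and not the ones in transformation/helper.js. The patterns drop the g flag so that String.prototype.match returns the capture group, escape the dot in twitter.com, and use + instead of * so an empty ID is rejected. IDs stay as strings because modern tweet IDs are larger than Number.MAX_SAFE_INTEGER.

```javascript
// Hypothetical helpers illustrating options a) and b); not the code in transformation/helper.js.

// Option a): allow digits in the username; "\." matches a literal dot only.
const TWEET_URL_REGEX = /https?:\/\/twitter\.com\/[a-zA-Z0-9_]{1,20}\/status\/([0-9]+)/;

// Option b): take the first run of 10 or more digits anywhere in the URL.
// Assumes the username never contains 10+ consecutive digits.
const TWEET_ID_REGEX = /\d{10,}/;

function getTweetIdStrict(url) {
  const match = url.match(TWEET_URL_REGEX);
  return match ? match[1] : null; // capture group 1 is the status ID
}

function getTweetIdLoose(url) {
  const match = url.match(TWEET_ID_REGEX);
  return match ? match[0] : null; // the whole match is the ID
}

// IDs are kept as strings: modern tweet IDs exceed Number.MAX_SAFE_INTEGER.
for (const url of [
  "https://twitter.com/9898abcxyz/status/1370000000000000000",
  "https://twitter.com/6/status/1370000000000000000",
]) {
  console.log(getTweetIdStrict(url), getTweetIdLoose(url));
}
```

Option a) is stricter because it only accepts a full status URL; option b) is more forgiving about the URL shape but relies on the assumptions noted in the comments.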
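For point 2, one way to avoid fractional day counts altogether is to compare milliseconds directly. This is only a sketch: isOlderThanMaxAge, MS_PER_DAY and MAX_AGE_DAYS are hypothetical names, it assumes created_at is an ISO 8601 string that new Date can parse, and it uses >= as in option a), so a tweet exactly 7 days old counts as too old.

```javascript
// Hypothetical sketch of the age check from Validation/tweet_endpoint.js.
const MS_PER_DAY = 24 * 60 * 60 * 1000;
const MAX_AGE_DAYS = 7;

// Compares whole milliseconds, so no fractional day count is ever computed.
// Uses >= (option a): a tweet exactly 7 days old is treated as too old.
function isOlderThanMaxAge(createdAt, now = new Date()) {
  const ageMs = now.getTime() - new Date(createdAt).getTime();
  return ageMs >= MAX_AGE_DAYS * MS_PER_DAY;
}

const now = new Date("2021-05-08T00:00:00.000Z");
console.log(isOlderThanMaxAge("2021-05-01T00:00:00.000Z", now)); // true (exactly 7 days)
console.log(isOlderThanMaxAge("2021-05-01T00:00:01.000Z", now)); // false

// The surprise with Math.floor comes from parsing: this literal is already 7.
console.log(6.9999999999999999 === 7); // true
```

Millisecond differences between dates are integers well below 2^53, so the comparison is exact and never needs Math.floor.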

  3. In transformation/rich_rendering.js, this code:

    for (let x of mediaKeys) {
      const mediaInfo = expandedMediaIncludes.find(({ media_key }) => media_key === x);
      // ...
    }

Should it be { media_keys }? Please let me know in the comments.

  4. In Scraping/index.js, this code:

    const showRepliesButton = [...document.querySelectorAll('div[dir="auto"]')]
      .filter((node) => node.children[0] && node.children[0].tagName === "SPAN")
      .find((node) => node.children[0].innerHTML === "Show replies");
    
    if (showRepliesButton) {
      showRepliesButton.click();
    
      await waitFor(2000);
    }

This essentially searches for the "Show replies" button and clicks it. I am assuming it only searches for "Show replies" once (correct me if I am wrong), which might be why we are unable to fetch longer threads with 100+ tweets. @Mira-Alf mentioned in issue #728 that she is not receiving all the tweets.

Possible solution: a loop, so that if the "Show replies" button appears multiple times it keeps clicking and fetching results, as you mentioned in the meeting.
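For point 3: in a Twitter API v2 response, a tweet lists its attachments under attachments.media_keys (an array), while each object in includes.media has a single media_key field, so the singular name is the one to destructure. The sketch below uses a made-up, abridged response and a hypothetical resolveMedia helper; it builds a Map once instead of calling find for every key.

```javascript
// Abridged, made-up Twitter API v2 response with one tweet and two media objects.
const response = {
  data: { id: "100", text: "two photos", attachments: { media_keys: ["3_1", "3_2"] } },
  includes: {
    media: [
      { media_key: "3_1", type: "photo", url: "https://example.com/1.jpg" },
      { media_key: "3_2", type: "photo", url: "https://example.com/2.jpg" },
    ],
  },
};

// Hypothetical helper: media_keys (plural) lives on the tweet,
// media_key (singular) on each media object in includes.media.
function resolveMedia(tweet, mediaIncludes) {
  const mediaKeys = (tweet.attachments && tweet.attachments.media_keys) || [];
  const byKey = new Map(mediaIncludes.map((media) => [media.media_key, media]));
  return mediaKeys.map((key) => byKey.get(key)).filter(Boolean);
}

console.log(resolveMedia(response.data, response.includes.media).map((m) => m.url));

// Destructuring the plural name on a media object yields undefined, so nothing matches.
console.log(response.includes.media.find(({ media_keys }) => media_keys === "3_1")); // undefined
```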
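The loop suggested above for point 4 could look roughly like this. It is a sketch, not the scraper's actual code: expandAllReplies and findShowRepliesButton are hypothetical names, the finder reuses the selector logic from Scraping/index.js, and the click cap (maxClicks, defaulting to 3 here, in line with the 2 to 3 limit suggested later in the thread) stops the loop from running forever if the button never disappears. The finder and the wait function are passed in so the loop can be tested outside the browser.

```javascript
// Hypothetical sketch: keep clicking "Show replies" until it is gone or a cap is reached.

// Same selector logic as Scraping/index.js; returns undefined when no button is found.
function findShowRepliesButton(doc) {
  return [...doc.querySelectorAll('div[dir="auto"]')]
    .filter((node) => node.children[0] && node.children[0].tagName === "SPAN")
    .find((node) => node.children[0].innerHTML === "Show replies");
}

// findButton and wait are injected so this loop can also run outside the browser.
async function expandAllReplies(findButton, wait, maxClicks = 3, delayMs = 2000) {
  let clicks = 0;
  let button = findButton();
  while (button && clicks < maxClicks) {
    button.click();
    clicks += 1;
    // Give the page time to render the newly loaded replies before searching again.
    await wait(delayMs);
    button = findButton();
  }
  return clicks; // number of times the button was clicked
}
```

Inside the page context it would be called as await expandAllReplies(() => findShowRepliesButton(document), waitFor), reusing the existing waitFor helper.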

NOTE: please test these

johnjacobkenny commented 3 years ago

@PuruVJ @Mira-Alf, please identify the issues and create a separate issue for each, so someone else can pitch in. If nobody steps up in 2 or 3 days, either of you can take it up.

tr0mbl3y commented 3 years ago

Update

Issues 1 and 2 are fixed, thanks @twindle-co/developer. I was completely wrong about point 3: it is indeed { media_key }. I am still looking into point 4.

PuruVJ commented 3 years ago

I think we could fix the 4th one (Show replies) by clicking in a while loop, but we'd also need to set a limit of 2 to 3 clicks (as we simply can't put in more than 100 IDs for checking, even though some of those IDs may not belong to the thread itself).
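If the 100-ID limit here is the per-request cap of the Twitter API v2 tweet lookup endpoint, one option not discussed in the thread is to split the collected IDs into batches of at most 100 and send one request per batch. This is only a sketch; chunkIds and MAX_IDS_PER_REQUEST are hypothetical names, and the request itself is left out.

```javascript
// Hypothetical helper; 100 is assumed to be the per-request ID limit of the tweet lookup endpoint.
const MAX_IDS_PER_REQUEST = 100;

// Splits an array of IDs into consecutive batches of at most `size` items.
function chunkIds(ids, size = MAX_IDS_PER_REQUEST) {
  const batches = [];
  for (let i = 0; i < ids.length; i += size) {
    batches.push(ids.slice(i, i + size));
  }
  return batches;
}

const ids = Array.from({ length: 250 }, (_, i) => String(i + 1));
console.log(chunkIds(ids).map((batch) => batch.length)); // lengths 100, 100, 50
```

Batching trades the click cap for more API requests, so rate limits would still need to be considered.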

You know it better @Mira-Alf. What do you think?