juyalpriyank / scrape_whatsapp

Python script for scraping WhatsApp Web.

Separation of sender and receiver #1

Open Shell1500 opened 4 years ago

Shell1500 commented 4 years ago

There is no way to distinguish between the sender and the receiver.

Shell1500 commented 4 years ago

So I kind of found a way to do it myself.

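from bs4 import BeautifulSoup

def reload_soup(driver):
    # Parse the page currently loaded in the driver.
    source = driver.page_source
    soup = BeautifulSoup(source, 'html.parser')
    # Narrow down to the 'main' div, then to its copyable area.
    all_soup = soup.find('div', {"id" : "main"})
    soup = BeautifulSoup(str(all_soup), 'html.parser')
    filtered_soup = soup.find('div', {"class" : "copyable-area"})
    # Take the third child of the copyable area (index 2).
    filtered_soup = list(filtered_soup)[2]
    soup = BeautifulSoup(str(filtered_soup), 'html.parser')
    # Collect every div with the 'copyable-text' class.
    final_soup = soup.find_all('div',{"class" : "copyable-text"})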

    info = []

    for i in final_soup:
        if i.has_attr('data-pre-plain-text'):
            info.append(i['data-pre-plain-text'])

    dates = []
    names = []

    for i in info:
        i = i.strip()
        date = i.split(']')[0][1:].strip()
        name = i.split(']')[1].strip()
        dates.append(date)
        names.append(name.replace(':', ''))

    final_soup = [text_div.text.replace('\n', ' ') for text_div in final_soup]
    #print(len(dates), len(names), len(final_soup))

    ss = []

    for key, i in enumerate(final_soup):
        # Pad names and dates with the last known value so that every
        # message div has an entry.
        if len(names) < len(final_soup):
            names.append(names[-1])
            dates.append(dates[-1])
        ss.append(names[key] + ',' + i + ',' + dates[key] + '\n')

    return ss

This is the modified reload_soup function. It uses the same extracted HTML to find the name and date of each message. A slight modification is still needed in the print_to_console function, because this modified function returns a list, but that shouldn't be a major issue.
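For anyone consuming the output, here is a rough sketch of one way to turn the header strings into sender/receiver labels. It is not part of the repository: parse_header, rows_to_csv and own_name are hypothetical names, and the header shape "[<time>, <date>] <name>: " is an assumption based on the parsing in the function above. Since the timestamp and the message text can both contain commas, the sketch lets the csv module handle quoting instead of joining fields with ','.

```python
import csv
import io
import re

# Assumed header shape: "[<time>, <date>] <name>: ".
# This mirrors the split on ']' used above; it is not a documented format.
HEADER_RE = re.compile(r'^\[(?P<stamp>[^\]]*)\]\s*(?P<name>.*?):?\s*$')

def parse_header(header):
    """Split a data-pre-plain-text value into (timestamp, name), or None."""
    match = HEADER_RE.match(header.strip())
    if match is None:
        return None
    return match.group('stamp').strip(), match.group('name').strip()

def rows_to_csv(messages, own_name):
    """Build CSV text from (header, text) pairs.

    own_name is hypothetical: the name that appears in the headers of
    your own messages. Rows with that name are labelled 'sent'.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(['direction', 'name', 'timestamp', 'text'])
    for header, text in messages:
        parsed = parse_header(header)
        if parsed is None:
            continue  # skip entries whose header does not match the assumed shape
        stamp, name = parsed
        direction = 'sent' if name == own_name else 'received'
        writer.writerow([direction, name, stamp, text])
    return buffer.getvalue()
```

Each pair would be built from a div's data-pre-plain-text value and its text, so the name always belongs to the same message and no padding is needed.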