maciejskorski opened this issue 9 months ago
I'm afraid there's more going on here :) When I run your code, this is the text I get at that index:
import pandas as pd
from pathlib import Path

# repo_path: repository root as a pathlib.Path (defined elsewhere in the notebook)

def open_fn(f):
    # Read one CSV; return an empty frame for files that fail to parse
    try:
        return pd.read_csv(f, engine='python')
    except Exception:
        return pd.DataFrame()

tweets2 = pd.concat([
    pd.concat(map(open_fn, Path(repo_path/'data/futurists_kol/data').rglob('*csv'))),
    pd.concat(map(open_fn, Path(repo_path/'data/futurists_rossdawson/data').rglob('*csv')))
])
tweets2.columns = ['index', 'user', 'timestamp', 'url', 'txt']
tweets2 = tweets2.drop_duplicates(subset=['txt'])
tweets2.reset_index(inplace=True, drop=True)
print(tweets2.loc[3, 'txt'])
@sendavidperdue But you supported Trump... and his lies and behavior. Biiiiiiiig mistake. But, nice try..loser
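Why the same index can point at different tweets: `drop_duplicates` keeps the first occurrence and `reset_index(drop=True)` renumbers rows by position, so the row at a given label depends on the order in which the per-file frames were concatenated. A toy sketch with two hypothetical frames `a` and `b` (not the real data) shows the effect:

```python
import pandas as pd

# Two hypothetical per-file frames; only the concatenation order differs below
a = pd.DataFrame({'txt': ['x', 'y']})
b = pd.DataFrame({'txt': ['y', 'z']})

def build(frames):
    # Mirrors the pipeline above: concat, dedupe on text, renumber rows 0..n-1
    df = pd.concat(frames)
    df = df.drop_duplicates(subset=['txt'])
    return df.reset_index(drop=True)

print(build([a, b]).loc[1, 'txt'])  # y
print(build([b, a]).loc[1, 'txt'])  # z
```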
The potential differences may also come from preprocessing: if we strip the leading user mention (as I do), the results are slightly different than when we keep it:
But you supported Trump... and his lies and behavior. Biiiiiiiig mistake. But, nice try..loser
[{'label': 'anger', 'score': 0.9814966917037964},
{'label': 'disgust', 'score': 0.9556466341018677},
{'label': 'sadness', 'score': 0.3905079960823059},
{'label': 'pessimism', 'score': 0.07687253504991531},
{'label': 'fear', 'score': 0.029073450714349747},
{'label': 'anticipation', 'score': 0.0288297887891531},
{'label': 'joy', 'score': 0.020227165892720222},
{'label': 'surprise', 'score': 0.016065144911408424},
{'label': 'optimism', 'score': 0.011874457821249962},
{'label': 'trust', 'score': 0.005419053602963686},
{'label': 'love', 'score': 0.005367077421396971}]
vs
@sendavidperdue But you supported Trump... and his lies and behavior. Biiiiiiiig mistake. But, nice try..loser
[{'label': 'anger', 'score': 0.9818358421325684},
{'label': 'disgust', 'score': 0.9528217911720276},
{'label': 'sadness', 'score': 0.2592710554599762},
{'label': 'pessimism', 'score': 0.06370002031326294},
{'label': 'anticipation', 'score': 0.03384344279766083},
{'label': 'fear', 'score': 0.026571445167064667},
{'label': 'joy', 'score': 0.026189832016825676},
{'label': 'surprise', 'score': 0.016805190593004227},
{'label': 'optimism', 'score': 0.01543356291949749},
{'label': 'trust', 'score': 0.005702258553355932},
{'label': 'love', 'score': 0.004971153102815151}]
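The exact preprocessing isn't shown in the thread. As a minimal sketch, assuming "removing the user" means stripping any @mentions at the very start of the tweet, a single regex is enough (`strip_leading_mentions` is a hypothetical helper, not part of either codebase):

```python
import re

# Hypothetical helper: removes one or more @mentions (and the whitespace after them)
# at the start of a tweet; mentions later in the text are left untouched.
LEADING_MENTIONS = re.compile(r'^(?:@\w+\s*)+')

def strip_leading_mentions(text: str) -> str:
    return LEADING_MENTIONS.sub('', text)

tweet = ("@sendavidperdue But you supported Trump... and his lies and behavior. "
         "Biiiiiiiig mistake. But, nice try..loser")
print(strip_leading_mentions(tweet))
```

Whichever variant is used, both sides should apply the same one before comparing scores.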
So it's the order in which the files are merged that differs, and it depends on the OS. Let's compare the order in which each of us reads the files:
import itertools
from pathlib import Path

# Record the order in which rglob yields the CSV files on this machine
files1 = Path('../data/futurists_kol/data').rglob('*csv')
files2 = Path('../data/futurists_rossdawson/data').rglob('*csv')
files = itertools.chain(files1, files2)
with open('account_list.txt', 'wt') as f:
    for fpath in files:
        f.write(fpath.name + '\n')
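Once both lists exist, a quick check is whether they contain the same files and, if so, where the order first diverges. A minimal sketch; `compare_orders` is a hypothetical helper:

```python
from pathlib import Path

def compare_orders(path_a, path_b):
    # Hypothetical helper: returns (same_files, first_diff), where same_files says
    # whether both lists hold the same names regardless of order, and first_diff is
    # the 0-based line index of the first mismatch (None if the common prefix matches).
    a = Path(path_a).read_text().splitlines()
    b = Path(path_b).read_text().splitlines()
    same_files = sorted(a) == sorted(b)
    first_diff = next((i for i, (x, y) in enumerate(zip(a, b)) if x != y), None)
    return same_files, first_diff
```

A result like `(True, k)` would confirm that only the ordering differs.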
Here is my account_list.txt
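A possible way to make the dataset independent of the OS, assuming any fixed order is acceptable, is to sort the matched paths before concatenating. `ordered_csv_files` is a hypothetical helper; it sorts by the path components relative to each root, which is a plain string comparison on every platform (Windows `Path` ordering would otherwise be case-insensitive):

```python
from pathlib import Path

def ordered_csv_files(*roots):
    # Hypothetical helper: collect '*csv' matches under each root in a deterministic
    # order, instead of the filesystem-dependent order that rglob yields them in.
    files = []
    for root in map(Path, roots):
        files.extend(sorted(root.rglob('*csv'),
                            key=lambda p: p.relative_to(root).parts))
    return files
```

It could then replace the two `rglob` calls, e.g. `pd.concat(map(open_fn, ordered_csv_files(repo_path/'data/futurists_kol/data', repo_path/'data/futurists_rossdawson/data')))`, so that `drop_duplicates` keeps the same rows on every machine.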
When creating the dataset by:
and comparing the emotions for the chosen text:
we find two different results.
Namely,
gives:
and from:
we obtain: