tracking-exposed / dashboard

dashboard is the toolkit for data scientists who use tracking.exposed services
https://facebook.tracking.exposed

DataFrame columns mismatch between commits in FB_topics.ipynb #19

Closed 0rC0 closed 5 years ago

0rC0 commented 5 years ago

In commit 4e1448115bdaa1be6611fce5b9e3c850fe5eb267, the pandas.DataFrame columns that the code in FB_topics.ipynb refers to are no longer present. They were probably in electiondata_topics.csv, which is not publicly available and is referenced in commit ab179b15756fecddfcb174182ede1104ce2d7794: https://github.com/tracking-exposed/dashboard/blob/ab179b15756fecddfcb174182ede1104ce2d7794/FB_topics.ipynb#L116

For example, the data used in https://github.com/tracking-exposed/dashboard/blob/4e1448115bdaa1be6611fce5b9e3c850fe5eb267/FB_topics.ipynb#L119 (probably from the eu19 dataset) has no concatenatedText column: https://github.com/tracking-exposed/dashboard/blob/4e1448115bdaa1be6611fce5b9e3c850fe5eb267/FB_topics.ipynb#L144

There are other issues too, e.g. the language in user_a.csv is Spanish, not English: https://github.com/tracking-exposed/dashboard/blob/4e1448115bdaa1be6611fce5b9e3c850fe5eb267/FB_topics.ipynb#L78

berli0z commented 5 years ago

I'm going to address the issue by writing a script that generates a CSV with the same characteristics as the dataset you used, and by reverting the format to the original one.

Could you please describe the exact format (column names and types) of the dataset you used?

also, sorry for messing up :/

0rC0 commented 5 years ago

The code in the notebook refers to the file in the first commit. Maybe @forloopkilla (the author of the first commit) has it?

In any case, I can try over the next few days to adapt the notebook code to the format of the published dataset.

berli0z commented 5 years ago

Also, @vecna might be able to clarify what the format of the dataset was. @0rC0, I'm afraid the format of the published dataset would need some modification to work (for example, if the texts need to be tokenized, they can't be a list, and that's why they're concatenated).

forloopkilla commented 5 years ago

> In commit 4e14481 FB_topics.ipynb the pandas.DataFrame columns referred in code are not anymore present, probably they where in the not publicy available electiondata_topics.csv referred in commit ab179b1. https://github.com/tracking-exposed/dashboard/blob/ab179b15756fecddfcb174182ede1104ce2d7794/FB_topics.ipynb#L116
>
> For example, in https://github.com/tracking-exposed/dashboard/blob/4e1448115bdaa1be6611fce5b9e3c850fe5eb267/FB_topics.ipynb#L119 probably from eu19 dataset there is no column concatenatedText https://github.com/tracking-exposed/dashboard/blob/4e1448115bdaa1be6611fce5b9e3c850fe5eb267/FB_topics.ipynb#L144
>
> There are also other issues, i.e. the language in user_a.csv is Spanish and not English: https://github.com/tracking-exposed/dashboard/blob/4e1448115bdaa1be6611fce5b9e3c850fe5eb267/FB_topics.ipynb#L78

Hey guys, sorry for the late reply. The data I have subsetted is all in English. I can upload the data 'codes/fb/electiondata_topics.csv' if you guys find that useful. Due to some privacy issues, is it best to email you the data or upload it here?

berli0z commented 5 years ago

> Hey guys, sorry for the late reply. The data I have subsetted is all in English. I can upload the data 'codes/fb/electiondata_topics.csv' if you guys find that useful. Due to some privacy issues, is it best to email you the data or upload it here?

@forloopkilla I think you could just paste the column names here, and I will figure out how to convert the current CSV outputs into one that is readable by your script :)

vecna commented 5 years ago

Correct, concatenatedText was something made specifically for the datathon. Normally the text is not concatenated but given as a list (see https://eu19.tracking.exposed/page/api/ and look at the texts returned by summary).

vecna commented 5 years ago

@forloopkilla

> Hey guys, sorry for the late reply. The data I have subsetted is all in English. I can upload the data 'codes/fb/electiondata_topics.csv' if you guys find that useful. Due to some privacy issues, is it best to email you the data or upload it here?

Hey, no! You can't share that dataset (you should have deleted it after the datathon). The solution is to align the code to the actual format.

forloopkilla commented 5 years ago

> > Hey guys, sorry for the late reply. The data I have subsetted is all in English. I can upload the data 'codes/fb/electiondata_topics.csv' if you guys find that useful. Due to some privacy issues, is it best to email you the data or upload it here?
>
> @forloopkilla I think you could just paste the column names here and i will figure out how to convert the current csv outputs to one that is readable by your script :)

COLUMN NAMES: ['ANGRY', 'HAHA', 'LIKE', 'LOVE', 'SAD', 'WOW', 'displaySource', 'fblinktype', 'id', 'images.count', 'impressionOrder', 'impressionTime', 'nature', 'permaLink', 'postId', 'publicationTime', 'source', 'sourceLink', 'timeline', 'user', 'concatenatedText', 'concatLanguage']
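
To see which of these columns a given export lacks, a minimal sketch along these lines might help. It uses plain Python and assumes only that you have the header of the file at hand (for a pandas DataFrame, `list(df.columns)`). The `published` header below is hypothetical, built for illustration: it drops the two datathon-only columns and adds a `texts` column, which is an assumption about the published format and not a confirmed schema.

```python
# Columns the notebook expects, copied from the list above.
EXPECTED_COLUMNS = [
    'ANGRY', 'HAHA', 'LIKE', 'LOVE', 'SAD', 'WOW', 'displaySource',
    'fblinktype', 'id', 'images.count', 'impressionOrder', 'impressionTime',
    'nature', 'permaLink', 'postId', 'publicationTime', 'source',
    'sourceLink', 'timeline', 'user', 'concatenatedText', 'concatLanguage',
]


def missing_columns(available, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from `available`,
    preserving the order of `expected`."""
    present = set(available)
    return [name for name in expected if name not in present]


# Hypothetical header of a published CSV (an assumption, not the real schema):
# everything except the two datathon-only columns, plus a `texts` column.
published = [
    name for name in EXPECTED_COLUMNS
    if name not in ('concatenatedText', 'concatLanguage')
] + ['texts']

print(missing_columns(published))
# → ['concatenatedText', 'concatLanguage']
```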

0rC0 commented 5 years ago

> @0rC0 I'm afraid the format of the published dataset would need some modification to work (for example if it needs to tokenize the texts, they cant be a list and that's why they're concatenated).

Is this concatenatedText something like ' '.join(texts), or am I missing something?

I wanted to get hands-on with the code :P, so I'm trying to rework the DataFrame columns into the old format. In case anyone is interested: https://github.com/0rC0/dashboard/commit/a66ed302566c09b40fe8c76705324cec64bf169f

vecna commented 5 years ago

@0rC0 yes, that's it.
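
For reference, the conversion confirmed here could be sketched roughly as follows. This is an illustration, not the code from the linked commit: the helper name `concatenate_texts` and the sample rows are hypothetical, and it assumes each element of `texts` is a plain string.

```python
def concatenate_texts(texts, sep=' '):
    """Join a list of text fragments into a single string.

    Missing or empty lists give an empty string; empty fragments are skipped.
    """
    if not texts:
        return ''
    return sep.join(str(fragment).strip() for fragment in texts if fragment)


# Hypothetical rows shaped like the published format, where `texts` is a list.
rows = [
    {'postId': '1', 'texts': ['Hello', 'world']},
    {'postId': '2', 'texts': []},
]

# Rebuild the datathon-style column on each row.
for row in rows:
    row['concatenatedText'] = concatenate_texts(row['texts'])

print(rows[0]['concatenatedText'])
# → Hello world
```

With a pandas DataFrame whose `texts` column already holds Python lists, the same function could be passed to `Series.apply` to build the `concatenatedText` column.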

berli0z commented 5 years ago

merged! thanks!