Closed berli0z closed 5 years ago
This has higher priority due to the upcoming eu19 initiative
This could be output hourly or daily (for time series), or as a total aggregate (for pie charts and the like).
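A minimal sketch of the three output granularities, using only the standard library. The field names (`postId`, `publicationTime`) and the sample rows are assumptions for illustration, not the real dataset schema.

```python
from collections import Counter
from datetime import datetime

# Hypothetical rows: the field names are assumptions, not the real schema.
posts = [
    {"postId": "1", "publicationTime": "2019-05-01T09:15:00"},
    {"postId": "2", "publicationTime": "2019-05-01T09:45:00"},
    {"postId": "3", "publicationTime": "2019-05-02T18:05:00"},
]

def aggregate(rows, granularity):
    """Count posts per bucket; granularity is 'hourly', 'daily' or 'total'."""
    if granularity == "total":
        return {"total": len(rows)}
    # Truncate each timestamp to the start of its hour or day.
    formats = {"hourly": "%Y-%m-%d %H:00", "daily": "%Y-%m-%d"}
    fmt = formats[granularity]
    counts = Counter(
        datetime.fromisoformat(row["publicationTime"]).strftime(fmt)
        for row in rows
    )
    return dict(sorted(counts.items()))
```

The hourly and daily results can feed a time series directly, while the total is the kind of single number a pie chart slice would use.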
Thanks for the progress, four questions :)
* This aggregation is unique by postId, correct?
Exactly, no postId is repeated in the dataset. Sometimes the same post, if changed slightly, gets a different postId. Those entries stay as they are; further aggregation on that front could be done later.
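The uniqueness claim is easy to check before plotting. A small sketch, assuming each row is a dict with a `postId` key (the helper name is hypothetical):

```python
from collections import Counter

def duplicated_post_ids(rows):
    """Return the postId values that appear more than once, sorted."""
    counts = Counter(row["postId"] for row in rows)
    return sorted(pid for pid, n in counts.items() if n > 1)
```

An empty result means the dataset is unique by postId. Slightly edited posts with a different postId would not be caught by this check.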
* I'm afraid reactions can be misleading: do they make sense only if associated with the time of acquisition?
I get the reactions at the time of acquisition in fbtrex. They can be a bit misleading, but by taking the last value collected by the fbcrawler (which is always collected at 12:00 the day after), we can at least see the final reaction count (not its development over time).
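Picking the last collected value per post could look roughly like this. The `collectedAt` and `reactions` fields and the sample observations are assumptions made for the example, not the actual crawler output format.

```python
from datetime import datetime

# Hypothetical crawler observations: field names are assumptions.
observations = [
    {"postId": "1", "collectedAt": "2019-05-02T12:00:00", "reactions": 40},
    {"postId": "1", "collectedAt": "2019-05-03T12:00:00", "reactions": 55},
    {"postId": "2", "collectedAt": "2019-05-02T12:00:00", "reactions": 7},
]

def latest_reactions(rows):
    """Keep, for each postId, the reaction count from the most recent collection."""
    latest = {}
    for row in rows:
        when = datetime.fromisoformat(row["collectedAt"])
        current = latest.get(row["postId"])
        if current is None or when > current[0]:
            latest[row["postId"]] = (when, row["reactions"])
    # Drop the timestamps and return only the counts.
    return {pid: count for pid, (_, count) in latest.items()}
```

This gives the outcome only; the earlier observations are discarded, so the development over time is not recoverable from the result.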
* Is the text part of the CSV? Why strip it?
We can also keep the text. The way I am trying to design the datasets is "as clean as possible". There is little chance of visualizing texts as they are now, unless we are making a "summary explorer", which is another thing I have in mind. If we want to visualize texts, we can build a separate dataset optimized for that, which would be cleaner and faster to process in a visualization. That was my guess. Less is more when preparing datasets for visualization; keeping something extra just because it's there is not the most popular approach (as far as I've read around).
* How can you plot hourly? As far as I remember, the time returned by fbcrawl is only by day.
fbcrawl returns the values up to a specific day, but every post in the dataset has a datetime value (basically publicationTime).
Furthermore, to prepare for using this tool, the README should say that you need two folders: one with all the fbtrex summaries and one with the fbcrawl source outputs as CSV files.
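A sketch of how the tool could validate that two-folder layout up front. The function name and folder arguments are hypothetical, and the summary files are listed without assuming a file extension, since the thread does not specify one.

```python
from pathlib import Path

def list_inputs(summary_dir, crawl_dir):
    """Return (fbtrex summary files, fbcrawl CSV files), failing early if a folder is missing."""
    summary_dir, crawl_dir = Path(summary_dir), Path(crawl_dir)
    for folder in (summary_dir, crawl_dir):
        if not folder.is_dir():
            raise FileNotFoundError(f"missing input folder: {folder}")
    # Summary format is not specified in the thread, so take every regular file.
    summaries = sorted(p for p in summary_dir.iterdir() if p.is_file())
    # fbcrawl outputs are expected as CSV files.
    crawls = sorted(crawl_dir.glob("*.csv"))
    return summaries, crawls
```

Failing with a clear message when a folder is absent is probably friendlier than a later error deep inside the aggregation.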