haganbt / PYLON-exporter

Utility for exporting data from a PYLON index

Tableau workbook returning volumes that are too high #50

Closed samaybar closed 9 years ago

samaybar commented 9 years ago

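The bubble chart in the Tableau workbook (on the Share of Voice, URLs and Domains, and Topic Exploration workbooks) appears to show volumes that are much higher than the underlying volumes: [image: image] https://cloud.githubusercontent.com/assets/8261091/9609290/f18d9eb4-50a0-11e5-9c63-a7e9772b9b08.png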

key,interactions,unique_authors
"Republican Party",403300,333500
"Climate change",291900,239900
"Barack Obama",101600,88800
"Donald Trump",85700,79000
"Jeb Bush",74300,68700
"Dr. Ben Carson",72400,66800
"Senator Marco Rubio",71700,66100
"Ronald Reagan",71100,66300
"Liberal",70900,65400
"Senator Ted Cruz",70800,64800

scosden commented 9 years ago

Hey Sam,

Just out of curiosity, did you use the Politics index in CS_3, or a different political index for this analysis?

Scott Cosden Sales Engineer | DataSift

e: scott.cosden@datasift.com p: (347) 404-1995 t: @scottcosden https://twitter.com/scottcosden Learn more about DataSift https://datasift.com/

This email contains confidential information and is for the exclusive use of the addressee/s. If you are not the addressee, then any distribution, copying or use of this email is prohibited. If received in error, please advise the sender and delete it immediately.

DataSift, Inc | Office: DataSift, 157 Columbus Avenue, Suite 503, New York, NY 10023

samaybar commented 9 years ago

It was a climate change index that ran during the Republican debate.

samaybar commented 9 years ago

@haganbt I think the issue is in the Entity Volumes worksheet: because you have joined tables and are using the Sum value for each key, each key is counted multiple times. I think using Max instead of Sum, as shown in the attached screenshot, fixes the issue, but I'm not familiar enough with what you've done with the data to know whether that would create other issues.

[image: screenshot 2015-09-02 09 24 58]
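To illustrate the suspected cause outside Tableau, here is a minimal Python sketch of a join that repeats each key's volume once per matching row. The two interaction counts come from the CSV above; the link rows, the field names, and the `inner_join` and `aggregate` helpers are hypothetical and do not reflect the workbook's actual data model.

```python
# Volumes per key (values taken from the CSV in this issue).
entity_volumes = [
    {"key": "Republican Party", "interactions": 403300},
    {"key": "Climate change", "interactions": 291900},
]

# Hypothetical link rows: several links can belong to the same key.
links_by_entity = [
    {"key": "Republican Party", "link": "a.example/1"},
    {"key": "Republican Party", "link": "b.example/2"},
    {"key": "Republican Party", "link": "c.example/3"},
    {"key": "Climate change", "link": "a.example/1"},
    {"key": "Climate change", "link": "d.example/4"},
]


def inner_join(left, right, on):
    """Return one merged row per matching (left, right) pair."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]


def aggregate(rows, key, value, func):
    """Group rows by `key` and apply `func` to each group's `value` list."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row[value])
    return {k: func(v) for k, v in groups.items()}


joined = inner_join(entity_volumes, links_by_entity, on="key")

# SUM adds the same volume once per joined link row, inflating it.
sum_by_key = aggregate(joined, "key", "interactions", sum)
# MAX collapses the repeated rows back to the original volume.
max_by_key = aggregate(joined, "key", "interactions", max)

print(sum_by_key)
print(max_by_key)
```

In this sketch the summed value for each key is the original volume multiplied by the number of joined link rows, while the maximum matches the underlying volume.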

haganbt commented 9 years ago

Thanks @samaybar for the pointers. I will investigate.

samaybar commented 9 years ago

I confirmed that each category's unique author value is being multiplied by the number of links associated with that category, because the worksheet uses standard_tableau_linksbyentity joined with entity_volumes.

You can also use Average or Minimum in place of Sum, and you get the same (correct) results as with Max.
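A short Python check of why Minimum, Average, and Max agree here: after the join, every row for a given key carries the same value, so any aggregate that returns a single representative value recovers it, while Sum scales with the row count. The unique author value is taken from the CSV above; the link count of 4 is a made-up assumption for illustration.

```python
from statistics import mean

# unique_authors for "Republican Party" (from the CSV in this issue).
unique_authors = 333500
# Hypothetical number of link rows joined to that key.
link_count = 4

# After the join, the same value appears once per link row.
joined_values = [unique_authors] * link_count

# SUM grows with the number of joined rows.
total = sum(joined_values)

# MIN, MAX and AVG all return the single repeated value.
collapsed = {f.__name__: f(joined_values) for f in (min, max, mean)}

print(total)
print(collapsed)
```

This equivalence holds only while every joined row for a key repeats the same value; if the values differed per row, the three aggregates would no longer agree.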

haganbt commented 9 years ago

Fixed. @samaybar @scosden @deannomuraatdatasift