poorna-kumar / MSE231-Project

Final project
General Comments/Tracking #1

Open jessica-writes-code opened 7 years ago

jessica-writes-code commented 7 years ago

Gender Analysis:

Literature:

shazad commented 7 years ago

Literature:

jessica-writes-code commented 7 years ago

Word2Vec

Possible Uses of Word2Vec

Hesitations Re: Word2Vec

Other Stuff: Word Count

Industries

poorna-kumar commented 7 years ago

Minutes of meeting 11/10:

shazad commented 7 years ago

Topics: Art Business Crime Elections Entertainment Fashion Politics Religion Science Sports Technology Weather Other

jessica-writes-code commented 7 years ago

@poorna-kumar

Hi Poorna -

Are the gender lists for each year basically stable? I was going to go ahead and start making author-gender-specific word vectors.

poorna-kumar commented 7 years ago

Hi Jessica,

I think they are stable, except in the case of multiple writers, because those cases are not being handled correctly. I'm not sure, however, that we should be creating author-gender word vectors. Different authors write about different topics, so the effect of the topic being covered will be very pronounced, and the results will say more about the topic than about the author. It may also be hard to find authors of different genders who write about exactly the same thing, which would make it hard to compare our results meaningfully. Aggregation, the way you've already done it, seems preferable to me.

jessica-writes-code commented 7 years ago

That seems reasonable to me. I suppose I didn't realize the extent to which different genders of author cover different topics.

I'll remove/comment out the author-gender analysis.

poorna-kumar commented 7 years ago

Hmm, it's possible I didn't understand what you meant. We can conference in class. I thought that by author-gender you meant some cross product of author and gender. If you're picking all articles written by male authors and all articles written by female authors, and finding word vectors for each set, that should be fine, I think.

jessica-writes-code commented 7 years ago

Ah. No. Sorry. I obviously was not sufficiently clear.

I meant that I was taking articles from all male authors (in a year) and creating one set of word vectors and then taking articles from all female authors (in a year) and creating another set of word vectors.
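For concreteness, here is a minimal sketch of the grouping step described above: one corpus per (year, author gender) pair. The field names (`year`, `author_gender`, `text`) and the naive sentence splitting are assumptions for illustration, not the project's actual data format. Skipping bylines that are not labeled plainly male or female is also an assumption, made because multi-writer cases are not handled yet. Each resulting corpus could then be passed to a word-vector trainer (for example gensim's `Word2Vec`), which is not shown here.

```python
from collections import defaultdict


def group_sentences_by_year_and_gender(articles):
    """Return {(year, gender): [tokenized sentences]}.

    `articles` is an iterable of dicts with hypothetical keys
    'year', 'author_gender', and 'text'.
    """
    corpora = defaultdict(list)
    for article in articles:
        gender = article["author_gender"]
        if gender not in ("male", "female"):
            # Assumption: skip unknown or mixed bylines rather than guess.
            continue
        # Naive sentence split on periods; a real tokenizer would do better.
        for sentence in article["text"].split("."):
            tokens = sentence.lower().split()
            if tokens:
                corpora[(article["year"], gender)].append(tokens)
    return dict(corpora)


# Toy input, made up for illustration only.
articles = [
    {"year": 2006, "author_gender": "male", "text": "The senator spoke. Markets fell."},
    {"year": 2006, "author_gender": "female", "text": "The team won the final."},
    {"year": 2006, "author_gender": "unknown", "text": "Rain is expected."},
]
corpora = group_sentences_by_year_and_gender(articles)
```

Keeping the year in the key means the male and female vectors for a given year are trained on disjoint sets of articles, and the two sets can be compared year by year.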

As you said, we can conference in class. But I think that we're on the same page.

jessica-writes-code commented 7 years ago

@all

Hey all -

I did a (reasonably brief) search for historical female employment statistics. Based on what I found, our best bet seems to be something like this, from the BLS: http://www.bls.gov/cps/aa2006/aat9.txt. The idea would be to collect data for the years where it's available and then map the BLS occupation categories to ours. That should be a reasonably straightforward process (albeit a bit time-consuming and labor-intensive).
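A rough sketch of what that mapping step might look like, assuming the BLS rows have already been parsed into (occupation, men employed, women employed) tuples. The mapping table, the project category names ("Law", "Medicine"), and all of the counts below are made-up placeholders, not real BLS figures or our actual categories.

```python
from collections import defaultdict

# Hypothetical mapping from BLS occupation labels to project categories.
BLS_TO_PROJECT = {
    "Legal occupations": "Law",
    "Healthcare practitioner and technical occupations": "Medicine",
    "Healthcare support occupations": "Medicine",
}


def female_share_by_category(rows, mapping):
    """Aggregate employment counts by project category and return the
    fraction of workers in each category who are women.

    `rows` is an iterable of (bls_occupation, men_employed, women_employed).
    Occupations missing from `mapping` are ignored.
    """
    men = defaultdict(float)
    women = defaultdict(float)
    for occupation, men_count, women_count in rows:
        category = mapping.get(occupation)
        if category is None:
            continue
        men[category] += men_count
        women[category] += women_count
    return {
        category: women[category] / (men[category] + women[category])
        for category in men
        if men[category] + women[category] > 0
    }


# Placeholder counts for illustration only (not taken from the BLS file).
rows = [
    ("Legal occupations", 60, 40),
    ("Healthcare practitioner and technical occupations", 30, 70),
    ("Healthcare support occupations", 10, 90),
    ("Unmapped occupation", 5, 5),
]
shares = female_share_by_category(rows, BLS_TO_PROJECT)
```

Summing counts before dividing (rather than averaging per-occupation shares) weights each BLS occupation by its size when several of them map to one of our categories.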

The major issues with this are:

Thoughts?

Jessica