Closed by murrayds 4 years ago
So if you're at IU Kokomo, you'll only appear as "IU UNIV system"? I think the second option makes the most sense. At a minimum, we should do some robustness analysis.
So if you're at IU Kokomo, you'll only appear as "IU UNIV system"?
Correct—smaller regional universities are just disambiguated into the "UNIV system" category.
We can try re-embedding after implementing option 2 to see if anything fundamentally changes. I doubt much will change, simply because these pairs of organizations account for a small number of total pairs.
This issue has been resolved by pull request #60. Precedence rules are now standard in our data.
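The thread does not spell out what the precedence rules are, so the following is only an illustrative sketch of one possible rule: when a person carries both a campus affiliation and that campus's parent system, keep the campus and drop the system. The `PARENT_SYSTEM` mapping and the `apply_precedence` function are hypothetical names, not taken from the pull request.

```python
# Hypothetical campus -> parent system mapping (an assumption for illustration,
# not the project's actual data).
PARENT_SYSTEM = {"IU Bloomington": "IU UNIV SYSTEM"}


def apply_precedence(orgs, parent_system=PARENT_SYSTEM):
    """Drop a system-level affiliation when one of its campuses is also present.

    This is one assumed form of a precedence rule; the real rules may differ.
    """
    # Collect the parent systems of every campus in this person's affiliations.
    redundant = {parent_system[org] for org in orgs if org in parent_system}
    return set(orgs) - redundant


campus_person = apply_precedence({"IU Bloomington", "IU UNIV SYSTEM"})
system_only_person = apply_precedence({"IU UNIV SYSTEM"})
```

Under this assumed rule, a Bloomington researcher is left with a single affiliation, while someone who only resolves to the system level (for example at IU Kokomo) keeps the system label.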
There are many major organizations that we classify as having 100% mobility, even though they consist of many thousands of individuals. For example, Indiana University, University of Michigan, and UC Berkeley.
The reason is that every individual affiliated with IU Bloomington is also classified as affiliated with the IU UNIV SYSTEM. However, not every IU UNIV SYSTEM individual is classified as affiliated with IU Bloomington. For example, someone from IU Kokomo would not be marked as affiliated with IU Bloomington. Similar issues are also apparent for French and possibly Italian organizations.
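A toy calculation may make the effect concrete. The sketch below assumes that a person counts as "mobile" whenever they have more than one disambiguated affiliation; that definition, the sample records, and the `mobility_share` function are assumptions made for illustration, not the project's actual pipeline.

```python
from collections import defaultdict

# Hypothetical toy records: person id -> set of disambiguated affiliations.
affiliations = {
    "a": {"IU Bloomington", "IU UNIV SYSTEM"},
    "b": {"IU Bloomington", "IU UNIV SYSTEM"},
    "c": {"IU UNIV SYSTEM"},  # e.g. someone at IU Kokomo
}


def mobility_share(affiliations):
    """Return, per organization, the share of its people with >1 affiliation.

    Assumes 'mobile' means 'has more than one affiliation'.
    """
    total = defaultdict(int)
    mobile = defaultdict(int)
    for orgs in affiliations.values():
        for org in orgs:
            total[org] += 1
            if len(orgs) > 1:
                mobile[org] += 1
    return {org: mobile[org] / total[org] for org in total}


shares = mobility_share(affiliations)
```

In this toy data nobody actually moved, yet every IU Bloomington person is counted as mobile because the system-level label is always attached alongside the campus label.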
For the embeddings, this theoretically shouldn't cause any major issues—thanks to negative sampling, common co-occurrences will appear less often in the training set. However, this does lead to confusion when displaying descriptive statistics because it over-represents organizational mobility. Given this, I propose the following solutions for the descriptive results
@yy and @jisungyoon, thoughts?