tevgeniou / FoundationsML

MIT License

Startup Data collection - HELP needed #3

Open minh-vo opened 5 years ago

minh-vo commented 5 years ago

PROJECT CONTEXT: A research partner has agreed to let me collect monthly data on 150 companies in their investment portfolio over 18-24 months, in the following categories.

  1. Product development (What new feature? What can it do?)
  2. Social Capital (Who did you get help from? How did you get help? What were the results?)
  3. Human Capital (Who did you hire? How did you get them? What for?) [and so on for another 5 categories]

In each category, the company will describe

I will also have initial 'endowment' data on the companies (e.g., initial founding team, business model elements, etc.).

I NEED FEEDBACK ON THE FOLLOWING: I think that an ML algorithm can do these analyses for me...

...and the physics of the data is the following,...

...and therefore I consider the algorithms published in medical science journals, which predict disease events from patients' past (sparse, irregular, incomplete) medical histories, to be appropriate and relevant.

Am I on the right path? What would you have done differently? What am I not thinking about? Thanks for the help!

gricer01 commented 5 years ago

Hey Minh, what's the research question or objective of the analysis you've described? If I understand correctly, you will have high-dimensional data for 150 units, and you want to either cluster them into groups with similar company histories/initial conditions or make inferences about particular events in their company histories using the rest of the sample. I'm not sure how feasible this will be with a small sample and many dimensions, but maybe matrix completion methods are also worth looking into.
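For readers unfamiliar with matrix completion, here is a minimal sketch of one simple variant: iterative imputation with a truncated SVD. It is not a method anyone in this thread has chosen. The function name `complete_low_rank`, the rank, and the synthetic 150 x 6 matrix are all assumptions made up for illustration, and real startup data may not be close to low rank.

```python
import numpy as np

def complete_low_rank(X, rank=2, n_iters=200, tol=1e-6):
    """Fill NaN entries of X by repeatedly projecting onto rank-`rank` matrices.

    Hypothetical helper for illustration; observed entries are never changed.
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    # Initial guess for missing cells: the mean of each column's observed values.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(observed, X, col_means)
    for _ in range(n_iters):
        # Best rank-`rank` approximation of the current filled matrix.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep observed cells, overwrite only the missing ones.
        updated = np.where(observed, X, low_rank)
        change = np.linalg.norm(updated - filled)
        filled = updated
        if change <= tol * max(1.0, np.linalg.norm(filled)):
            break
    return filled

# Synthetic example (made-up data): 150 "companies" x 6 "features", rank 1,
# with roughly 30% of the cells hidden.
rng = np.random.default_rng(0)
true = np.outer(rng.uniform(1, 2, size=150), rng.uniform(1, 2, size=6))
mask = rng.random(true.shape) < 0.3
observed_data = np.where(mask, np.nan, true)
completed = complete_low_rank(observed_data, rank=1)
```

The design choice here is to leave observed values untouched and only estimate the gaps, which matches the setting of sparse, incomplete monthly reports.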

minh-vo commented 5 years ago

Hi Richard,

Thanks for your idea.

You're correct: the sample size is sadly small. While I am interested in creating an algorithm that could predict failure and success in a more generalizable sample, the current sample may only produce ~150 outcomes, with an expected failure rate of 85% (the normal startup failure rate). With only 15% successes, it would still be hard to identify which event histories (prior to startup success, of course) are predictive of success. I'm still wrapping my head around this. One way out is possibly to define multiple 'risky events/actions' (e.g., co-founder turnover, a long period of flat revenue, etc.), so that one startup may have many outcomes (rather than a single one: dead or scaled-up).
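To make the sample-size arithmetic concrete, here is a rough sketch using only the figures quoted above (150 companies, 85% failure, 18-24 monthly reports). It assumes, optimistically, that every company reports every month; the event records at the end are invented purely to show the counting.

```python
from collections import Counter

n_companies = 150
failure_rate = 0.85
months_low, months_high = 18, 24

# One terminal outcome per company: only about 22 expected successes.
expected_successes = n_companies * (1 - failure_rate)

# One observation per company per month (assumes no missing reports).
company_months_low = n_companies * months_low
company_months_high = n_companies * months_high

# Hypothetical risky-event log: (company_id, month, event_type).
events = [
    (1, 3, "cofounder_turnover"),
    (1, 9, "flat_revenue"),
    (2, 5, "flat_revenue"),
]
# A single startup can contribute several outcomes instead of just one.
events_per_company = Counter(company_id for company_id, _, _ in events)
```

Under these assumptions the single-outcome framing gives about 22 positive cases, while the monthly framing gives 2,700 to 3,600 company-month observations in which risky events can be recorded.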

Just curious: how did the matrix completion method come to mind?