This issue proposes a storyboard to help attendees understand where they might find data and what they may need to consider before using it for their GenAI MVP. They will need to consider data sources, how to acquire and transform the data in a way that supports training goals, and what kind of ML models are required to obtain their desired outcomes.
Assumption: data lives in different databases across the firm and will need to be identified, consolidated, cleaned, and reshaped to suit training purposes.
Talking points:
identify sources of data in the firm and build partnerships with project/data owners
get safe (e.g. read-only) access to the data (follow compliance/PII rules, avoid impacting operations)
make sure you have enough data: some is needed for training, some for testing
cleaning it is the hard part: it can be hacked initially, but expect to spend time developing a production-quality process later
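The last two talking points can be sketched as a quick first pass: normalize the raw records, drop empty and duplicate ones, then hold out a fraction for testing. This is a minimal sketch rather than a production process; the record shape (`id`, `skills`), the helper names `clean_records` and `train_test_split`, the sample data, and the 80/20 ratio are all assumptions made for illustration.

```python
import random

def clean_records(records):
    """Lowercase and strip skill strings; drop records with no skills or a repeated id."""
    seen_ids = set()
    cleaned = []
    for rec in records:
        # Normalize and de-duplicate skills within one record.
        skills = sorted({s.strip().lower() for s in rec.get("skills", []) if s and s.strip()})
        if not skills or rec["id"] in seen_ids:
            continue
        seen_ids.add(rec["id"])
        cleaned.append({"id": rec["id"], "skills": skills})
    return cleaned

def train_test_split(records, test_fraction=0.2, seed=42):
    """Shuffle reproducibly and hold out a fraction of the records for testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical sample records, not real HR data.
raw = [
    {"id": 1, "skills": [" Python", "SQL "]},
    {"id": 2, "skills": []},                      # no skills: dropped
    {"id": 1, "skills": ["python"]},              # repeated id: dropped
    {"id": 3, "skills": ["Layout", "layout", "Print"]},
    {"id": 4, "skills": ["Go"]},
    {"id": 5, "skills": ["Java", ""]},
    {"id": 6, "skills": ["Rust"]},
]

cleaned = clean_records(raw)
train, test = train_test_split(cleaned)
print(len(cleaned), len(train), len(test))
```

A fixed seed keeps the split reproducible between runs, which helps when comparing results during the MVP.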
Proposed problem statement:
As a team member doing an MVP for the job recommender application
I need to identify a high-level workflow to compare a person's set of skills to the set of skills attached to open roles in HR and identify the 5 closest candidates
so I can ensure the best alignment of skills and roles inside the firm
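As a baseline for this comparison, before any embeddings are involved, the overlap between two skill sets can be scored directly. The sketch below uses Jaccard similarity (size of the intersection divided by size of the union). That is a technique swapped in here for illustration, not something this issue prescribes, and the names `jaccard` and `closest_candidates` and the sample people are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two skill collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def closest_candidates(role_skills, people, k=5):
    """Return the k people whose skills overlap most with the role, best first."""
    scored = [(jaccard(role_skills, skills), name) for name, skills in people.items()]
    # Highest score first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored[:k]]

# Hypothetical role and people, for illustration only.
role = {"python", "sql", "etl"}
people = {
    "ana": {"python", "sql", "etl"},
    "ben": {"python", "sql"},
    "cy": {"layout", "print"},
    "dee": {"sql", "etl", "spark"},
    "eli": {"python"},
    "fay": {"java"},
}

top_five = closest_candidates(role, people)
for name, score in top_five:
    print(f"{name}: {score:.2f}")
```

Exact set overlap only rewards identical skill labels, so it cannot surface the "hidden" or adjacent skills described in this issue; it is useful mainly as a sanity check against an embedding-based approach.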
This can take several forms:
initiated by the candidate looking for their next career challenge in the firm
triggered by reorg activities shifting valuable resources from one part of the firm to another part
triggered by innovation cycles looking to pull hidden valuable skills to participate in new projects that require "new" skills (e.g. web v.01 drew from print/layout talent pool)
One possible approach:
data is pulled from external sample database(s) and consolidated in a staging database (move sample data from GCP Postgres / CockroachDB Cloud to BigQuery)
feed data from BigQuery to Vertex to create embeddings and store them in a vector database
query the vector database to identify unexpected matches
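The final query step can be illustrated without any cloud services. The sketch below ranks stored vectors by cosine similarity to a query vector, which is the kind of nearest-neighbor lookup a vector database performs. It assumes the embeddings already exist: the three-dimensional vectors here are made-up toy values, not output from Vertex, and `cosine_similarity`, `top_k`, and the sample names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def top_k(query_vec, index, k=5):
    """Brute-force nearest neighbors: score every stored vector, return the k best."""
    scored = [(cosine_similarity(query_vec, vec), key) for key, vec in index.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(key, score) for score, key in scored[:k]]

# Toy stand-ins for candidate embeddings (assumed, not real model output).
candidate_vectors = {
    "ana": [0.9, 0.1, 0.0],
    "ben": [0.7, 0.3, 0.1],
    "cy": [0.0, 0.2, 0.9],
    "dee": [0.5, 0.5, 0.0],
}
role_vector = [1.0, 0.0, 0.0]  # toy embedding of an open role

matches = top_k(role_vector, candidate_vectors, k=3)
for name, score in matches:
    print(f"{name}: {score:.3f}")
```

A brute-force scan like this is fine for a small sample dataset; a vector database replaces it with an index once the number of stored embeddings grows.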
Resources: