kgoel59 / BigData


A2 Task 1: Design a big data analytics project by following the Big Data Analytics Lifecycle. #6

Open kgoel59 opened 1 week ago

kgoel59 commented 1 week ago

Assignment 2 aims to detect misinformation on social networks, i.e., to identify profiles that are mistakenly recorded as human or non-human.

kgoel59 commented 1 week ago

Big Data Lifecycle

  1. Discovery

- Learn Business Domain: Understand the industry and context.
- Interview Sponsor & Identify Stakeholders: Engage with key individuals to gather insights.
- Define Resources & Goals: Establish objectives and available resources.
- Identify Potential Data Sources: Locate relevant data sources.
- Frame the Problem & Develop Initial Hypotheses: Formulate hypotheses, including the Null Hypothesis (H0) and the Alternative Hypothesis (HA or H1).
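To make the H0/HA framing concrete, here is a minimal sketch of a permutation test on one profile feature. The feature (posts per day), the sample values, and the function name `permutation_p_value` are all hypothetical and only illustrate how a hypothesis for this assignment might be checked.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of group means.

    H0: both groups come from the same distribution (means do not differ).
    HA: the group means differ.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the statistic.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical posts-per-day values for profiles labelled human / non-human.
human_posts = [2, 3, 1, 4, 2, 3, 2, 5]
non_human_posts = [40, 55, 38, 60, 47, 52, 45, 58]

p_value = permutation_p_value(human_posts, non_human_posts)
reject_h0 = p_value < 0.05  # assumed significance level of 5%
```

A small p-value here would only suggest that the feature separates the two labels in this made-up sample; the real hypotheses depend on the dataset chosen during Discovery.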

  2. Data Preparation

- Prepare Sandbox: Set up an environment for data preparation.
- Perform ETLT (Extract, Transform, Load, Transform): Process data for analysis.
- Understand Data Details: Examine the data’s structure and quality.
- Data Conditioning: Address issues like missing values and outliers.
- Format Data: Prepare data for analysis.
- Visualize Data: Use plots to explore data patterns.
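The data conditioning step could look roughly like the sketch below, which fills missing values with the median and clips outliers using Tukey's fences. The `followers` column and both helper names are hypothetical; the actual conditioning rules should follow from inspecting the real data.

```python
from statistics import median, quantiles

def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def clip_outliers(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]

# Hypothetical follower counts with two missing entries and one extreme value.
followers = [120, 95, None, 110, 130, 10_000, 105, None, 98]

imputed = impute_missing(followers)
conditioned = clip_outliers(imputed)
```

Clipping is one option among several; dropping or flagging extreme profiles may be more appropriate if outliers are themselves a signal of non-human accounts.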

  3. Model Planning

- Select Variables: Based on relationships (e.g., correlation matrix) and domain knowledge.
- Identify Candidate Models: Refer to hypotheses, translate into machine learning models, review literature, and document assumptions.
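As an illustration of correlation-based variable selection, the following sketch computes a Pearson correlation matrix and keeps features whose absolute correlation with the target passes a threshold. The column names, the sample values, and the 0.5 threshold are assumptions made for the example, not part of the assignment.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Nested dict with the pairwise correlation of every column pair."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

def select_features(columns, target, threshold=0.5):
    """Keep features whose |correlation| with the target meets the threshold."""
    matrix = correlation_matrix(columns)
    return [name for name in columns
            if name != target and abs(matrix[name][target]) >= threshold]

# Hypothetical profile features; is_bot is the assumed target label.
profiles = {
    "posts_per_day": [2, 3, 50, 45, 1, 60],
    "profile_age_days": [900, 1200, 30, 45, 1500, 20],
    "name_length": [8, 12, 9, 11, 10, 10],
    "is_bot": [0, 0, 1, 1, 0, 1],
}
selected = select_features(profiles, target="is_bot")
```

Correlation only captures linear relationships, so domain knowledge should still decide whether a weakly correlated feature is worth keeping.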

  4. Model Building

- Create Datasets: Prepare training, validation, and testing datasets.
- Train and Test Models: Evaluate model performance.
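One way to create the three datasets is a seeded shuffle followed by slicing, sketched below. The 70/15/15 proportions, the seed, and the function name `train_val_test_split` are assumptions for illustration.

```python
import random

def train_val_test_split(records, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle a copy of the records and slice it into three disjoint sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Hypothetical profile identifiers standing in for full records.
profile_ids = list(range(100))
train, val, test = train_val_test_split(profile_ids)
```

If the human and non-human classes are imbalanced, a stratified split that preserves the class ratio in each set would be a safer choice than this plain shuffle.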

  5. Communicating Results

- Compare Results: Assess against criteria.
- Articulate Findings: Clearly present results.
- Discuss Limitations & Recommendations: Provide insights on limitations and suggest improvements.
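Comparing results against criteria can be as simple as computing a few classification metrics and checking each against a minimum agreed during Discovery. The sketch below assumes binary labels (1 = non-human, 0 = human) and hypothetical thresholds; the function names are illustrative only.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

def meets_criteria(metrics, criteria):
    """Map each criterion name to whether the metric reaches its minimum."""
    return {name: metrics[name] >= minimum for name, minimum in criteria.items()}

# Hypothetical test-set labels and predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
# Assumed success criteria; real values would come from the sponsor.
report = meets_criteria(metrics, {"accuracy": 0.7, "recall": 0.8})
```

A per-criterion report like this makes it easy to state in the findings which goals were met and which limitations remain.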

  6. Operationalize

- Deliverables: Finalize and deliver the project.
- Pilot Project: Test the model in a real-world scenario.
- Performance & Constraints: Monitor and address any constraints.
- Training: Educate new users as needed.

kgoel59 commented 1 week ago

1. Learn Business Domain


2. Define Resources & Goals


3. Frame the Problem & Develop Initial Hypotheses


4. Data Preparation - @kgoel59


5. Visualize Data


6. Model Building


7. Training and Testing


8. Deliverables