Overall: Scripts for processing large datasets have been added.
The additions and updates are:
- "main.py": add new CLI commands "learn_split", "gen_cluster", and "infer_projects"
- "vectorize.py": update the datapoint-generation function to process data in batches
- add "learn_split.py" for training the model separately
- add "gen_cluster.py" for generating clusters from the trained model separately
- add new dataset-loading functions to "data_loadeds.py"
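For reviewers, here is a minimal sketch of what batched datapoint generation and the new subcommand wiring could look like. It assumes argparse-style subcommands and a generator-based batching helper. The names `iter_batches`, `generate_datapoints`, `vectorize_batch`, and `build_parser` are hypothetical and are not taken from the actual "main.py" or "vectorize.py".

```python
import argparse
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def iter_batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(items)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def generate_datapoints(
    records: Iterable[T],
    vectorize_batch: Callable[[List[T]], R],  # hypothetical per-batch vectorizer
    batch_size: int = 1000,
) -> Iterator[R]:
    """Vectorize records one batch at a time so the full dataset never sits in memory."""
    for batch in iter_batches(records, batch_size):
        yield vectorize_batch(batch)


def build_parser() -> argparse.ArgumentParser:
    """Register the three new subcommands (hypothetical wiring, no options shown)."""
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("learn_split", "gen_cluster", "infer_projects"):
        sub.add_parser(name)
    return parser
```

Because the helper uses generators, each batch is built lazily. Peak memory then depends on `batch_size` rather than on the dataset size, which is the point of batching for large inputs.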