I am using Python multiprocessing together with vaex to save text as embeddings. Everything works normally early in the run, but after a while the saved HDF5 file ends up like this:
Everything is lost. Here is my code:
`import gzip
import hashlib
import json
import logging
import os
import warnings
from multiprocessing import Pool
import numpy as np
import vaex
from sentence_transformers import SentenceTransformer
warnings.filterwarnings("ignore")

matching_files = ["x1.json.gz", "x2.json.gz", "x3.json.gz", ...]
print("TOTAL # JOBS:", len(matching_files))
print(matching_files)


def save_embedding(file_path):
    cuda_num = int(file_path.split(".")[0][-4:]) % 8
    save_name = file_path.split("/")[-1].split(".")[0]
    save_path = "xxx"


with Pool(8) as p:
    p.map(save_embedding, matching_files)
`
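For reference, here is a minimal sketch of how the two values at the top of `save_embedding` come out. The paths in it are hypothetical, since the real file names are not shown above, and the helper name `derive_names` exists only for this illustration. It also counts repeated `save_name` values. That check only matters if `save_path` is built from `save_name`, which the code above does not show.

```python
from collections import Counter


def derive_names(file_path):
    """Apply the same two expressions used at the top of save_embedding."""
    cuda_num = int(file_path.split(".")[0][-4:]) % 8
    save_name = file_path.split("/")[-1].split(".")[0]
    return cuda_num, save_name


# Hypothetical input paths, for illustration only.
paths = ["data/part0012.json.gz", "data/part0013.json.gz"]
derived = [derive_names(p) for p in paths]

# Any save_name that occurs more than once would be shared by several jobs.
name_counts = Counter(save_name for _, save_name in derived)
duplicates = [name for name, count in name_counts.items() if count > 1]

print(derived)
print(duplicates)
```

With these two hypothetical paths, each job gets a distinct `save_name`, so `duplicates` is empty. A non-empty list on the real file names would mean several workers derive the same name.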