qzhu2017 / PyXtal_ml

a Python3 library for ML modeling materials properties
MIT License

[Plan] add Multiprocessing #31

Closed qzhu2017 closed 6 years ago

qzhu2017 commented 6 years ago

To speed up the calculation, I think the most important thing is to add multiprocessing to our descriptor calculation. We need to run the following code in parallel:

    def convert_data_1D(self):
        """
        convert the structures to descriptors in the format of 1D array
        """
        start = time()
        print('Calculating the descriptors for {:} materials'.format(len(self.strucs)))
        pbar = ProgressBar()
        feas = []
        for struc in pbar(self.strucs):
            try:
                feas.append(descriptor(struc, self.feature0))
            except Exception:  # skip structures whose descriptor fails, but let Ctrl-C through
                feas.append([])
                print('Problem occurs in {}'.format(struc.formula))
        end = time()
        self.time['convert_data'] = end-start

        self.features = feas

The best way is to wrap the loop with Python's multiprocessing module: https://docs.python.org/3.6/library/multiprocessing.html
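As a rough illustration of that idea, here is a minimal sketch that maps a per-structure function over a list with `multiprocessing.Pool`. The names `safe_call` and `parallel_features` are hypothetical and not part of PyXtal_ml; `func` stands in for a module-level wrapper around the descriptor call, which is an assumption about how the loop body would be factored out.

```python
from functools import partial
from multiprocessing import Pool


def safe_call(func, item):
    """Run func(item); return [] on failure, mirroring the serial loop."""
    try:
        return func(item)
    except Exception:
        return []


def parallel_features(items, func, processes=2):
    """Map func over items using a pool of worker processes.

    Hypothetical helper: func must be picklable (for example a
    module-level function). Pool.map returns results in input order,
    so the output lines up with items.
    """
    with Pool(processes=processes) as pool:
        return pool.map(partial(safe_call, func), items)
```

On platforms that start workers with spawn (Windows, recent macOS), the call to `parallel_features` should sit under an `if __name__ == "__main__":` guard in the calling script.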

@David-Zagaceta @yanxon Can you look into this?

David-Zagaceta commented 6 years ago

@qzhu2017

I think we can implement the parallelization using numba, specifically its prange function, along with precompiling parts of the main routine.

qzhu2017 commented 6 years ago

@David-Zagaceta The best way is to use multiprocessing. It should be very easy, I guess.

If you do it with numba, I guess the most important part would be rewriting the Voronoi function yourself instead of using the one in pymatgen. I don't recommend it.

David-Zagaceta commented 6 years ago

@qzhu2017

Alright. I will use the multiprocessing implementation. I will start working on that tomorrow morning.

qzhu2017 commented 6 years ago

Done in run.py.

runner.convert_data_1D(parallel=2)  # set the number of CPUs to activate parallel mode
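
The thread does not show the code in run.py, so the following is only a sketch of how an opt-in `parallel` keyword like this could switch between a serial loop and a process pool. `convert_items` and `_try_feature` are hypothetical names, and treating `parallel=None` as "run serially" is an assumption.

```python
from multiprocessing import Pool


def _try_feature(job):
    """Worker: unpack (func, item) and return [] if func raises."""
    func, item = job
    try:
        return func(item)
    except Exception:
        return []


def convert_items(items, func, parallel=None):
    """Compute func(item) for every item, serially or in parallel.

    parallel=None (or <= 1) keeps the plain loop; any larger integer
    is used as the number of worker processes. func must be picklable
    when the parallel path is taken.
    """
    jobs = [(func, item) for item in items]
    if not parallel or parallel <= 1:
        return [_try_feature(job) for job in jobs]
    with Pool(processes=parallel) as pool:
        return pool.map(_try_feature, jobs)
```

Keeping the serial path as the default means existing callers are unaffected, and the pool is only created when a CPU count is passed explicitly.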