Optimize `reports.daily()`

hubmapconsortium / py-hubmapbags

Python package used to build the submission for the CFDE Search Portal


Optimize `reports.daily()` #30

Closed icaoberg closed 3 months ago

icaoberg commented 1 year ago

@gesinaphillips pointed out my code is bad :-1:

It can be optimized by making a single call to `hubmapbags.apis.get_dataset_info`, but I will need to find an alternative to `parallel_apply` (most likely something like `multiprocessing`).
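A rough sketch of the "single call, then look up" idea, in case it helps. Everything here is an assumption for illustration: `fetch_all_datasets` is a hypothetical stand-in for one bulk request (it is not the real `hubmapbags.apis.get_dataset_info` signature), and the `hubmap_id` and `status` fields and the IDs are made up.

```python
from typing import Dict, List, Optional


def fetch_all_datasets() -> List[dict]:
    """Hypothetical stand-in for ONE bulk API call returning a record per dataset."""
    return [
        {"hubmap_id": "HBM111", "status": "Published"},
        {"hubmap_id": "HBM222", "status": "QA"},
    ]


def build_lookup(records: List[dict], key: str = "hubmap_id") -> Dict[str, dict]:
    """Index the bulk response once so each row becomes a dictionary lookup."""
    return {record[key]: record for record in records}


def annotate(ids: List[str], lookup: Dict[str, dict]) -> List[Optional[str]]:
    """Return the status for each ID, or None when the ID is unknown."""
    return [lookup.get(dataset_id, {}).get("status") for dataset_id in ids]


# One request up front instead of one request per row.
lookup = build_lookup(fetch_all_datasets())
statuses = annotate(["HBM222", "HBM111", "HBM999"], lookup)
print(statuses)
```

If the per-row work really is just a lookup after that single call, the parallel step may not be needed at all.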

icaoberg commented 1 year ago

@fshormin imagine a loop like this:

    for index, row in df.iterrows():
        ...

Can I process these rows in parallel? Please investigate `multiprocessing` or any other out-of-the-box solution.

For columns, I can use `pandarallel`.
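One possible out-of-the-box option for the row loop is the standard library's `concurrent.futures`. The sketch below uses threads rather than `multiprocessing`, on the assumption that the per-row work is mostly waiting on an API call. `process_row` and the sample rows are hypothetical placeholders. With a real DataFrame, `df.to_dict("records")` would produce a list of row dictionaries like the one used here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def process_row(row: dict) -> dict:
    """Hypothetical per-row work; replace with the real computation or API call."""
    return {**row, "name_length": len(row["name"])}


def process_rows(rows: List[dict], max_workers: int = 4) -> List[dict]:
    """Apply process_row to every row concurrently, keeping the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the same order as the inputs.
        return list(pool.map(process_row, rows))


# Stand-in for df.to_dict("records").
rows = [{"name": "alpha"}, {"name": "be"}, {"name": "gamma"}]
results = process_rows(rows)
print(results)
```

If the per-row work turns out to be CPU-bound, `ProcessPoolExecutor` has the same interface. In that case `process_row` must be a picklable top-level function, and on platforms that spawn new processes the driver code should sit under an `if __name__ == "__main__":` guard.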