graph-genome / component_segmentation

Read in ODGI Bin output and identify co-linear components
Apache License 2.0

parallel json parsing #33

Closed: dimatr closed this 4 years ago

dimatr commented 4 years ago

On a 2.2 GB test input, this cuts the JSON parser's run time from 3 minutes to 30 seconds.
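The PR's actual code is not reproduced in this thread. The following is a minimal sketch of the general technique, assuming the ODGI bin output is newline-delimited JSON (one record per line). It uses the standard-library multiprocessing.Pool as a stand-in backend, which is not necessarily what this PR used. The lines are split into contiguous chunks, and each worker process decodes one chunk. All function names here are hypothetical.

```python
import json
import os
from multiprocessing import Pool
from typing import Any, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def split_into_chunks(items: Sequence[T], n_chunks: int) -> List[Sequence[T]]:
    """Split items into at most n_chunks contiguous slices of near-equal size."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)  # first `extra` chunks get one more item
        chunks.append(items[start:end])
        start = end
    return chunks


def parse_chunk(lines: Sequence[str]) -> List[Any]:
    """Decode one chunk of JSON lines, skipping blank lines. Runs inside a worker."""
    return [json.loads(line) for line in lines if line.strip()]


def parse_json_lines_parallel(lines: Sequence[str], processes: Optional[int] = None) -> List[Any]:
    """Decode newline-delimited JSON across worker processes, preserving record order."""
    processes = processes or os.cpu_count() or 1
    # Several chunks per worker, so one slow chunk does not leave the other cores idle.
    chunks = split_into_chunks(lines, processes * 4)
    if processes == 1:
        parts = [parse_chunk(c) for c in chunks]  # skip pool overhead on a single core
    else:
        with Pool(processes) as pool:
            parts = pool.map(parse_chunk, chunks)  # map() returns results in input order
    return [record for part in parts for record in part]
```

A typical call would be `parse_json_lines_parallel(fh.read().splitlines())`. On platforms where workers are started with the spawn method (Windows, and macOS by default), the call must sit under an `if __name__ == "__main__":` guard. Decoded objects are pickled back to the parent, so the speedup depends on decoding cost outweighing that transfer.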

josiahseaman commented 4 years ago

The problem appears to be that Ray only ships wheels for Linux and macOS, and I'm testing on a Windows machine. The lack of Windows support is understandable, but we should at least note that this would be Pantograph's first departure from cross-platform compatibility. Would you mind writing an OS switch and a local import instead?

https://ray.readthedocs.io/en/latest/installation.html

lomereiter commented 4 years ago

Also consider using an OS-independent library (joblib?)

subwaystation commented 4 years ago

Do we really need Windows support? odgi, for example, doesn't support Windows either. We will have our Docker pipeline, so as long as Docker is available, the whole thing will work.

dimatr commented 4 years ago

Please check now. The current change works on Linux and Windows.

subwaystation commented 4 years ago

Reading in the data on a 28-core machine is much faster now! Thanks, @dimatr.

Just one minor thing: How about a parameter where users can specify the number of cores to use?

dimatr commented 4 years ago

I will add a new parameter, --parallel-cores, defaulting to os.cpu_count().
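Below is a minimal sketch of how such a flag could be declared, assuming the tool parses its command line with argparse. The `positive_int` validator and `build_parser` are hypothetical names. Because os.cpu_count() may return None, the default falls back to 1.

```python
import argparse
import os


def positive_int(text: str) -> int:
    """argparse type: accept integers >= 1 (int() raises ValueError on non-numbers)."""
    value = int(text)
    if value < 1:
        raise argparse.ArgumentTypeError(f"expected a positive integer, got {text!r}")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Parse ODGI bin output in parallel.")
    parser.add_argument(
        "--parallel-cores",
        type=positive_int,
        default=os.cpu_count() or 1,  # cpu_count() can be None on some systems
        help="number of worker processes for JSON parsing (default: all CPUs)",
    )
    return parser
```

argparse exposes the option as `args.parallel_cores`, which can then be passed as the worker count to whatever parsing routine does the parallel work.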

dimatr commented 4 years ago

Should be ready now; please check.