rdicosmo / parmap

Parmap is a minimalistic library for exploiting multicore architectures in OCaml programs with minimal modifications.
http://rdicosmo.github.io/parmap/

do you want parallel file operations? #58

Closed UnixJunkie closed 6 years ago

UnixJunkie commented 7 years ago

I have this one currently:

let parmap_on_file (ncores: int) (fn: string) (f: 'a -> 'b) (read_one: in_channel -> 'a): 'b list = ...

UnixJunkie commented 7 years ago

Or, I wonder if I should create a separate library depending on parmap ...

UnixJunkie commented 7 years ago

I will create a separate library if I gather enough interesting primitives.

rdicosmo commented 7 years ago

If the new functions just use the parmap library and do not require modifications to the parmap code, you can definitely create a separate library on top of parmap. If you find functions that are general enough that inclusion in parmap seems the best way to go, just create another PR and we can discuss that.



UnixJunkie commented 7 years ago

I think such a function is quite useful. I'd like to contribute it to parmap. Here is the current signature:

let parmap_on_file
    (ncores: int)
    (fn: filename)
    (f: 'a -> 'b)
    (read_one: in_channel -> 'a): 'b list
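A minimal sketch of how a function with this signature could behave, not the actual implementation. The optional `map` argument is a hypothetical hook added for illustration (it is not part of parmap); it defaults to a sequential `List.map` so the block runs with the standard library alone. `fn` is typed as `string` here, on the assumption that `filename` is an alias for it.

```ocaml
(* Sketch only: [map] is a hypothetical hook, not part of parmap's API.
   It defaults to a sequential List.map, so ncores is ignored by default. *)
let parmap_on_file
    ?(map = fun (_ncores: int) f l -> List.map f l)
    (ncores: int)
    (fn: string)
    (f: 'a -> 'b)
    (read_one: in_channel -> 'a): 'b list =
  let input = open_in fn in
  (* Read items until read_one raises End_of_file. *)
  let rec read_all acc =
    match read_one input with
    | x -> read_all (x :: acc)
    | exception End_of_file -> List.rev acc
  in
  (* Close the channel even if read_one raises something unexpected. *)
  let items =
    Fun.protect ~finally:(fun () -> close_in input) (fun () -> read_all [])
  in
  map ncores f items
```

With parmap installed, passing `~map:(fun ncores f l -> Parmap.parmap ~ncores f (Parmap.L l))` would hand the mapping step to parmap. Note that this sketch reads the whole file into memory before mapping, which a real version would probably want to avoid.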

If deemed useful, we can probably add companion functions later, such as pariter_on_file, parmap_fold_on_file, etc.

Let me know if you have a better interface to propose.

This is the second time I have needed such functionality in a project, so I guess it can be quite useful to other parmap users as well. I do chemoinformatics, but I guess bioinformatics people might have such needs as well.

Regards, Francois.

UnixJunkie commented 7 years ago

@smondet @agarwal

agarwal commented 7 years ago

I haven't been using parmap in a while, so my opinion is not useful at this time.

smondet commented 7 years ago

@UnixJunkie that sounds useful when we want to have only ncores items at once in memory. A more general version would use any stream-like input: unit -> 'a option.
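A small sketch of the stream-like input suggested here, assuming the reader signals exhaustion by raising `End_of_file` (as `input_line` does). The names `stream_of_channel` and `take_batch` are hypothetical and only illustrate how a `unit -> 'a option` source could feed a bounded number of items at a time.

```ocaml
(* Turn a channel plus an item reader into a pull-style stream.
   None marks the end of the stream. *)
let stream_of_channel (input: in_channel) (read_one: in_channel -> 'a)
  : unit -> 'a option =
  fun () ->
    match read_one input with
    | x -> Some x
    | exception End_of_file -> None

(* Pull at most n items from the stream.
   A short (or empty) batch means the stream has ended. *)
let take_batch (n: int) (next: unit -> 'a option): 'a list =
  let rec loop k acc =
    if k = 0 then List.rev acc
    else
      match next () with
      | Some x -> loop (k - 1) (x :: acc)
      | None -> List.rev acc
  in
  loop n []
```

Calling `take_batch ncores next` repeatedly until it returns an empty list would keep at most `ncores` unprocessed items in memory at once.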

PS: I haven't done any "analysis-level" bioinformatics in a long while though :)

UnixJunkie commented 7 years ago

@smondet Is the option just used to send the end of file info via a None?

UnixJunkie commented 7 years ago

Maybe the most generic construct is:

let parallelize (ncores: int) (demux: unit -> 'a) (work: 'a -> 'b) (mux: 'b -> unit): unit

but then that's so generic that it should reside outside of parmap.
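To make the demux/work/mux shape concrete, here is a sequential reference sketch. It is not parallel: `ncores` is accepted but unused, and a real version would distribute `work` over worker processes. The `End_of_input` exception is an assumption, since the thread does not say how `demux` signals that it has run out of items.

```ocaml
(* Assumption: demux raises this when no items are left. *)
exception End_of_input

(* Sequential reference for the demux/work/mux shape.
   _ncores is unused here; a parallel version would run work in that many workers. *)
let parallelize (_ncores: int)
    (demux: unit -> 'a) (work: 'a -> 'b) (mux: 'b -> unit): unit =
  let rec loop () =
    match demux () with
    | x -> mux (work x); loop ()
    | exception End_of_input -> ()
  in
  loop ()
```

Here `demux` produces the next input, `work` is the pure per-item computation, and `mux` consumes each result, for example by accumulating it or writing it out.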

smondet commented 7 years ago

@UnixJunkie Yes, "End of Stream" actually :+1:

UnixJunkie commented 6 years ago

parany can be used for that